Every document you create carries invisible data: author names, company names, edit timestamps, revision history, GPS coordinates, software fingerprints. Before sharing files externally, a thorough sanitization protects your privacy and your organization's security. Here's what to check.
1. Author and creator metadata
Every Office document and PDF stores the author's name, the organization name, and the username of the person who last saved it. This information is embedded in the file and not visible when you open it normally. A recipient can extract it with any metadata viewer.
Risk: Revealing employee names, internal usernames, or the company that originally created a template you're using.
Fix: DocInspector Sanitize removes Author, Creator, Producer, and all identity fields from PDF, DOCX, XLSX, CSV, TXT, Images, and PPTX files.
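To see what a recipient could extract, here is a minimal sketch that reads the identity fields from an Office file using only the Python standard library. It assumes the file is an OOXML package (DOCX, XLSX, or PPTX) that stores its core properties in docProps/core.xml. The function name read_identity_fields is made up for this example and is not part of DocInspector.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the OOXML core-properties part (docProps/core.xml).
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_identity_fields(package_bytes: bytes) -> dict:
    """Return the author and last-saved-by values found in an OOXML file."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        if "docProps/core.xml" not in package.namelist():
            return {}
        root = ET.fromstring(package.read("docProps/core.xml"))
    fields = {
        "creator": root.findtext("dc:creator", namespaces=NS),
        "lastModifiedBy": root.findtext("cp:lastModifiedBy", namespaces=NS),
    }
    # Keep only the fields that are present and non-empty.
    return {name: value for name, value in fields.items() if value}
```

Running it on a file before and after sanitization is a quick way to confirm the fields are gone: a sanitized file should produce an empty dictionary.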
2. Revision history and tracked changes
Microsoft Word documents keep every tracked change and comment, along with edit history, inside the file by default. Even after "Accept All Changes" is clicked, some versions keep older edit records in the file structure.
Risk: Revealing internal negotiations, legal strategy, pricing decisions, or HR discussions that were edited out of the final version.
Fix: DocInspector Sanitize strips revision history and tracked changes from DOCX files.
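As a rough self-check, the sketch below counts revision marks that are still present in a DOCX. It assumes the standard WordprocessingML layout, where tracked insertions and deletions appear as w:ins and w:del elements in word/document.xml and comments live in word/comments.xml. The helper names are hypothetical, and the check does not cover every place Word can store revision data.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's "{uri}tag" notation.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _count_elements(package: zipfile.ZipFile, part: str, tag: str) -> int:
    """Count elements named `tag` in one part of the package (0 if absent)."""
    if part not in package.namelist():
        return 0
    root = ET.fromstring(package.read(part))
    return sum(1 for _ in root.iter(W + tag))


def count_revision_marks(docx_bytes: bytes) -> dict:
    """Report tracked insertions, deletions, and comments left in a DOCX."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        return {
            "insertions": _count_elements(package, "word/document.xml", "ins"),
            "deletions": _count_elements(package, "word/document.xml", "del"),
            "comments": _count_elements(package, "word/comments.xml", "comment"),
        }
```

If any count is above zero, the file still contains text or notes that were meant to be edited out.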
3. Hidden comments and annotations
PDF annotations, Word comments, and Excel cell notes are often invisible in print view but fully readable in the file. A simple right-click or metadata tool reveals them all.
Risk: Internal reviewer comments, legal notes, pricing notes, or personal observations visible to external recipients.
Fix: DocInspector removes all annotations and comments during the Sanitize operation.
4. GPS coordinates in images
Photos taken with smartphones embed GPS coordinates (latitude, longitude, altitude) in the EXIF metadata. If you embed a phone photo in a contract, report, or PDF, the exact location where the photo was taken is visible to anyone who inspects the file.
Risk: Revealing the home address of an employee, the location of a private meeting, or sensitive site coordinates.
Fix: DocInspector Sanitize strips GPS and all EXIF data from JPG, PNG, TIFF, and embedded images in PDF/DOCX/XLSX/PPTX files.
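The check below is a simplified sketch of how a GPS block can be detected in a JPEG without third-party libraries. It walks the JPEG segments, finds the Exif APP1 segment, and looks in the first image directory (IFD0) for tag 0x8825, which points to the GPS data. It only answers yes or no, covers JPEG only (not PNG or TIFF), and does not handle every unusual file layout. The function names are invented for this example.

```python
import struct

GPS_IFD_POINTER = 0x8825  # Exif tag in IFD0 that points to the GPS directory


def _tiff_has_gps_ifd(tiff: bytes) -> bool:
    """Look for the GPS pointer tag in IFD0 of an Exif TIFF block."""
    if len(tiff) < 8:
        return False
    order = {b"II": "<", b"MM": ">"}.get(tiff[:2])  # little or big endian
    if order is None:
        return False
    magic, ifd0 = struct.unpack(order + "HI", tiff[2:8])
    if magic != 42 or ifd0 + 2 > len(tiff):
        return False
    (count,) = struct.unpack(order + "H", tiff[ifd0:ifd0 + 2])
    for index in range(count):
        start = ifd0 + 2 + 12 * index  # each IFD entry is 12 bytes
        entry = tiff[start:start + 2]
        if len(entry) < 2:
            break
        (tag,) = struct.unpack(order + "H", entry)
        if tag == GPS_IFD_POINTER:
            return True
    return False


def jpeg_has_gps(data: bytes) -> bool:
    """Return True if a JPEG's Exif segment declares a GPS directory."""
    if data[:2] != b"\xff\xd8":  # missing JPEG start-of-image marker
        return False
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return False
        marker = data[pos + 1]
        if marker == 0xDA:  # start of scan: metadata segments are over
            return False
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return _tiff_has_gps_ifd(payload[6:])
        pos += 2 + length
    return False
```

A typical use is to read a photo with open(path, "rb").read() and pass the bytes to jpeg_has_gps before embedding the photo in a document.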
5. Creation date and modification timestamps
Documents carry creation date, last modification date, and last print date. In legal and compliance contexts, revealing that a document was modified after a certain date can be significant evidence of tampering.
Risk: Exposing when a document was last edited, which may contradict stated timelines or reveal undisclosed modifications.
Fix: DocInspector sanitization removes all embedded date fields from document metadata.
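PDF files store dates such as CreationDate and ModDate as text in the form D:YYYYMMDDHHmmSS followed by an optional time zone offset. If you pull one of these values out with a metadata viewer, the sketch below turns it into a Python datetime so you can compare it against a stated timeline. The function name parse_pdf_date is an assumption for this example, and the parser is deliberately lenient about the trailing apostrophes that some PDF writers omit.

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYY is required; every later component is optional.
_PDF_DATE = re.compile(
    r"^D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:(Z)(?:00'?(?:00'?)?)?|([+-])(\d{2})'?(?:(\d{2})'?)?)?$"
)


def parse_pdf_date(value: str) -> datetime:
    """Parse a PDF date string into a datetime (naive if no zone is given)."""
    match = _PDF_DATE.match(value.strip())
    if match is None:
        raise ValueError(f"not a PDF date string: {value!r}")
    parts = match.groups()
    defaults = (1, 1, 1, 0, 0, 0)  # month and day default to 1, time to 0
    year, month, day, hour, minute, second = (
        int(text) if text else default
        for text, default in zip(parts[:6], defaults)
    )
    zulu, sign, tz_hours, tz_minutes = parts[6:]
    if zulu:
        tzinfo = timezone.utc
    elif sign:
        offset = timedelta(hours=int(tz_hours), minutes=int(tz_minutes or 0))
        tzinfo = timezone(-offset if sign == "-" else offset)
    else:
        tzinfo = None  # the file did not say which zone the time is in
    return datetime(year, month, day, hour, minute, second, tzinfo=tzinfo)
```

For example, parse_pdf_date("D:20240315093000+02'00'") yields a datetime for 15 March 2024 at 09:30 with a two-hour offset from UTC.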
6. Software and version fingerprints
The "Producer" and "Creator" fields in PDF metadata reveal what software was used: "Microsoft Word 16.0.14326", "Adobe Acrobat Pro 2020", "LibreOffice 7.2". This information can be used for targeted attacks (knowing which software version is in use) and can reveal internal tooling preferences.
Risk: Competitive intelligence leak, security vulnerability targeting.
Fix: DocInspector removes all software fingerprint fields during Sanitize.
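For a quick look at what a PDF gives away, the heuristic below scans the raw bytes for Producer and Creator entries written as plain literal strings. This is only a sketch under that assumption: it misses values stored as hex strings, inside compressed object streams, or in XMP metadata, so an empty result does not prove the file is clean. The function name scan_pdf_fingerprints is hypothetical.

```python
import re

# Matches "/Producer (...)" or "/Creator (...)" with backslash escapes allowed
# inside the parentheses. Nested unescaped parentheses are not handled.
_FIELD = re.compile(
    rb"/(Producer|Creator)\s*\(((?:\\.|[^\\()])*)\)",
    re.DOTALL,
)


def scan_pdf_fingerprints(pdf_bytes: bytes) -> dict:
    """Return Producer/Creator strings found as plain literals in a PDF."""
    found = {}
    for name, raw in _FIELD.findall(pdf_bytes):
        # Later matches overwrite earlier ones, so the last entry in the
        # file wins when a PDF has been saved incrementally.
        found[name.decode("ascii")] = raw.decode("latin-1")
    return found
```

Anything this returns is visible to every recipient, including exact version numbers.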
7. Embedded objects and linked content
Office documents can contain embedded objects (OLE objects), linked external files, or macros. These carry their own metadata and can reference internal network paths, share names, or usernames from the original machine.
Risk: Revealing internal network topology, share paths (\\FILESERVER\HR\Confidential\...), or hidden executable content.
Fix: DocInspector Sanitize removes embedded metadata references. For maximum security, use Flatten to Image PDF, which converts the entire document to a raster image and eliminates all structured content, including embedded objects.
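The sketch below shows one way to audit an Office file for these leftovers yourself. It assumes an OOXML package, treats any part named vbaProject.bin as a macro container, treats anything under an embeddings folder as an embedded object, and searches the XML and relationship parts for text that looks like a Windows share path. The function name audit_package and the path pattern are assumptions for illustration, and the pattern is a rough heuristic.

```python
import io
import re
import zipfile

# Two backslashes, a server name, one backslash, then the rest of the path.
_UNC = re.compile(rb'\\\\[A-Za-z0-9._-]+\\[^\s"<>|]+')


def audit_package(package_bytes: bytes) -> dict:
    """List macro parts, embedded objects, and share paths in an OOXML file."""
    report = {"macros": [], "embedded": [], "unc_paths": set()}
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        for name in package.namelist():
            if name.endswith("vbaProject.bin"):
                report["macros"].append(name)
            if "/embeddings/" in name:
                report["embedded"].append(name)
            if name.endswith((".xml", ".rels")):
                # Linked files usually show up as external relationship targets.
                for hit in _UNC.findall(package.read(name)):
                    report["unc_paths"].add(hit.decode("utf-8", "replace"))
    return report
```

An empty report after sanitization is a good sign. If anything remains, flattening to an image PDF is the safer route.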
Quick sanitization workflow with DocInspector
- Upload all files (PDF, DOCX, XLSX, images, PPTX), up to 100 at once
- Select Sanitize Metadata (removes all 7 items above)
- Optionally add Flatten to Image PDF for maximum security
- Click Run to process all files in one batch
- Download as ZIP
Frequently asked questions
Does sanitizing affect the visual content of the document?
No. Sanitization removes only hidden metadata; the visual content (text, images, formatting) remains completely unchanged.
Should I sanitize before or after adding a watermark?
Sanitize first, then watermark. This ensures the watermark itself doesn't carry residual metadata from the sanitized content.
Does this work on scanned PDFs?
Yes. Scanned PDFs still carry file-level metadata (author, dates, software), which DocInspector removes. Image-embedded EXIF in scans is also stripped.
Conclusion
Document sanitization is a critical step before any external sharing, especially in legal, compliance, HR, and financial contexts. DocInspector automates all 7 sanitization steps in a single batch operation: one click for 100 files.