Last updated 2026-05-29


The PDF Redaction Trap: Data Exposed

The DOJ Epstein files, the Manafort case, and NSA leaks all share the same failure: cosmetic redaction that leaves underlying text extractable.

May 29, 2026 · 8 minute read
Tags: PDF redaction, legal redaction, court filing, FOIA, document security

This guide was updated for 2026.

When a court filing says "REDACTED," people assume the hidden text is gone. Sometimes it is not. When the text is still in the file, anyone can copy and paste the blacked-out passage and read it in seconds. That gap has a name: cosmetic redaction. It has caused real damage.

Three cases prove the risk is not hypothetical.

DOJ Epstein files (December 2025). Court documents were filed with black bars over sensitive names. The text underneath was readable by copy-paste. Journalists found this within hours. The names that prosecutors argued should stay sealed were exposed.

Paul Manafort case (January 2019). Defense attorneys filed documents in the Mueller prosecution after blacking out passages with Microsoft Word's highlight function. That tool draws a black bar but leaves the words intact. A simple copy and paste revealed everything. The court was not pleased.

NSA leaks (multiple years). Decades of PDF releases have contained extractable text. Journalists and researchers caught this repeatedly. The Intelligence Community Oversight Board issued formal guidance on this exact failure mode.

The pattern is the same every time. Someone applies a visual bar. They submit the file. The hidden text surfaces. Sometimes within hours. Sometimes years later.

Why Black Bars Alone Fail

For redaction purposes, a PDF is easiest to picture as three distinct layers.

The content layer stores all the characters, coordinates, and fonts. Copy-paste and extraction tools read from here. The display layer holds visual instructions. This includes shapes, colors, images, and the black rectangles used as overlay bars. The metadata layer stores file properties like author name, timestamps, and revision history.

A cosmetic bar lives in the display layer only. The content layer underneath is untouched. Select All → Copy → Paste returns every word. That includes the words "hidden" by the bar.
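To make the layer split concrete, here is a minimal Python sketch. It is not a real PDF parser. It assumes a hand-written, uncompressed content stream and a toy regular expression for the `Tj` text-showing operator. The sample name and the `extract_text` helper are hypothetical. Real files usually compress their streams and encode text in several other ways, so treat this as an illustration only.

```python
import re

# Toy, uncompressed PDF content stream (an assumption for illustration).
# First the text is shown, then a black rectangle is painted over it.
CONTENT_STREAM = b"""
BT
/F1 12 Tf
72 720 Td
(Witness: Jane Doe) Tj
ET
0 0 0 rg
72 716 130 16 re
f
"""

# Matches a simple literal string followed by the Tj operator.
# Escaped or nested parentheses are deliberately not handled here.
TEXT_SHOW = re.compile(rb"\((.*?)\)\s*Tj", re.S)


def extract_text(stream: bytes) -> list[str]:
    """Return every string shown with Tj, ignoring all drawing operators."""
    return [match.decode("latin-1") for match in TEXT_SHOW.findall(stream)]


if __name__ == "__main__":
    # The rectangle ("re" then "f") changes what is painted on the page,
    # but the string passed to Tj is still present in the stream.
    print(extract_text(CONTENT_STREAM))
```

The extractor never looks at the `re` and `f` operators that paint the bar, which is why the covered string still comes back.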

Tools That Produce Only Visual Bars

Some common tools only paint over the text. They do not remove it.

Adobe Acrobat drawing tools. Drawing a rectangle is not the same as using the Redact function. The rectangle is visual only.

Microsoft Word track changes. Deleted passages persist in version history even after acceptance. The history is still readable.

Browser PDF annotators. These add a black highlight. They do not modify the underlying data.

Image overlays on scanned pages. Safe only if the original text layer was stripped first. Without that step, the stored text stays intact.

What Real Redaction Requires

Genuine redaction removes information from the content layer. The display layer then has nothing to show. You confirm success by extracting the text from the saved file. You check that the target passage is absent.

Court filing units and intelligence agencies follow this check:

  1. Use a tool that modifies the content layer. Do not use a tool that paints over it.
  2. Export to a new PDF.
  3. Open the new file in a clean viewer. Use a viewer with no link to the original.
  4. Select All → Copy → Paste into a plain text editor.
  5. Search for any fragment of the hidden passage.
  6. Found it? The file is not truly processed. Start over with the right tool.
  7. Not found? Proceed to the metadata check.

Step five is the critical test. Visual overlays fail it every time. A correctly processed file passes it.
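Steps four to six can be sketched in a few lines of Python. This is a minimal sketch, assuming you have already pasted or extracted the text of the exported file into a string. The `normalize` and `find_leaks` helpers and the sample names are hypothetical. Whitespace and case are normalized because pasted text often breaks lines in the middle of a phrase.

```python
import re


def normalize(text: str) -> str:
    """Collapse whitespace runs to single spaces and ignore case."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def find_leaks(extracted_text: str, sensitive_fragments: list[str]) -> list[str]:
    """Return the fragments that still appear in the extracted text."""
    haystack = normalize(extracted_text)
    leaks = []
    for fragment in sensitive_fragments:
        needle = normalize(fragment)
        # Skip empty fragments: an empty string would match everything.
        if needle and needle in haystack:
            leaks.append(fragment)
    return leaks


if __name__ == "__main__":
    pasted = "Witness:  Jane\nDoe testified on 3 March."
    leaks = find_leaks(pasted, ["Jane Doe", "John Roe"])
    if leaks:
        print("Not redacted, start over:", leaks)
    else:
        print("No fragment found, proceed to the metadata check.")
```

An empty result only says that these fragments were not found in this extraction. It does not cover scanned images or text the viewer failed to copy.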

The Metadata Problem

The content layer is not the only leak path. File metadata can expose a lot.

Author name. Often the attorney or case manager who made the document.

Organization. The law firm or agency name.

Earlier versions. These show the document before any changes were made.

Revision history. Tracked changes and comments are stored here.

Embedded thumbnails. These can show the document in its original, unprocessed state.

The NSA's guidance document states this directly. "Redacting with confidence requires that the metadata is also controlled."

For court filings, this is a real problem. A document filed on behalf of an anonymous party may carry metadata naming the real author. A blacked-out version may carry a thumbnail of the original. Proper tools sanitize metadata as part of the process. Visual overlay tools do not touch it.
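A rough first pass over metadata can be sketched with the Python standard library. This sketch assumes the document information dictionary is stored uncompressed with plain literal strings. The `scan_info_strings` helper and the sample values are hypothetical. Files that keep metadata in compressed object streams, hex strings, or XMP packets will not be caught, so a hit is meaningful but an empty result proves nothing.

```python
import re

# Standard document information keys that commonly identify people or tools.
INFO_KEYS = (b"Author", b"Creator", b"Producer", b"Subject", b"Keywords", b"Title")


def scan_info_strings(pdf_bytes: bytes) -> dict[str, list[str]]:
    """Find '/Key (value)' pairs in raw PDF bytes (uncompressed files only)."""
    found: dict[str, list[str]] = {}
    for key in INFO_KEYS:
        # Literal string: escaped characters or anything except \ ( ).
        pattern = re.compile(rb"/" + key + rb"\s*\(((?:\\.|[^\\()])*)\)")
        values = [value.decode("latin-1") for value in pattern.findall(pdf_bytes)]
        if values:
            found[key.decode("ascii")] = values
    return found


if __name__ == "__main__":
    sample = (
        b"1 0 obj\n"
        b"<< /Author (J. Smith) /Creator (Microsoft Word) "
        b"/CreationDate (D:20260529) >>\n"
        b"endobj\n"
    )
    for key, values in scan_info_strings(sample).items():
        print(f"{key}: {values}")
```

For real filings, a dedicated PDF library or the viewer's document properties dialog is the safer way to confirm what the file carries.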

The legal consequences of a failed redaction depend on context. In every context, the precedent is not good for anyone using visual-only overlays.

Federal courts. Rule 5.2(a) of the Federal Rules of Civil Procedure requires specific personal identifiers, such as Social Security numbers, birth dates, and financial account numbers, to be removed or truncated in filed documents. Courts have imposed fines, filing bans, and bar referrals for failures here.

FOIA disputes. Agencies that apply visual overlays over exempt information can still have that information extracted. Courts have ordered genuine disclosure in such cases.

National security. Personnel named through leaked files face documented security risks. The exposure goes beyond embarrassment.

GDPR and HIPAA. Personal data that can be extracted from a released file can amount to a reportable breach. GDPR Article 33 and the HIPAA Breach Notification Rule both cover this situation.

A Five-Minute Pre-Filing Check

This checklist removes visual-overlay risk entirely. It takes under five minutes per document.

  1. Use a content-layer tool. Do not use a drawing or annotation tool.
  2. Export to a new PDF. Do not overwrite the original.
  3. Open the new file in a fresh viewer.
  4. Select All → Copy → Paste into a plain text editor.
  5. Search for a known phrase from the hidden passage.
  6. Found it? Start over with the correct tool.
  7. Check PDF properties: Author, Creator, Subject, Keywords.
  8. Check for embedded thumbnails showing the document before processing.
  9. File the verified document.
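The checklist above can be wired into a simple fail-closed gate. This is a minimal sketch under stated assumptions: the file is uncompressed enough that key names are visible in its raw bytes, and the text has already been pasted out by hand. The `prefiling_problems` helper and the sample data are hypothetical. `/Thumb` is the page dictionary key the PDF format uses for an embedded thumbnail image.

```python
import re

# Key names from steps 7 and 8, matched in the raw bytes of the file.
INFO_KEY = re.compile(rb"/(Author|Creator|Subject|Keywords)\b")
THUMB_KEY = re.compile(rb"/Thumb\b")


def prefiling_problems(
    pdf_bytes: bytes, pasted_text: str, known_phrases: list[str]
) -> list[str]:
    """Return reasons not to file yet. An empty list means no problem was seen."""
    problems = []
    lowered = pasted_text.casefold()
    for phrase in known_phrases:
        if phrase.casefold() in lowered:
            problems.append(f"text layer still contains {phrase!r}")
    for key in sorted({m.decode("ascii") for m in INFO_KEY.findall(pdf_bytes)}):
        problems.append(f"document property present: {key}")
    if THUMB_KEY.search(pdf_bytes):
        problems.append("embedded page thumbnail (/Thumb) present")
    return problems


if __name__ == "__main__":
    dirty = (
        b"%PDF-1.7\n1 0 obj\n<< /Type /Page /Thumb 9 0 R >>\nendobj\n"
        b"2 0 obj\n<< /Author (J. Smith) >>\nendobj\n"
    )
    for problem in prefiling_problems(dirty, "Witness: Jane Doe", ["Jane Doe"]):
        print("DO NOT FILE:", problem)
```

The gate only reports what it can see. Files that compress their object dictionaries need a real PDF library before an empty result can be trusted.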

Five minutes here costs far less than defending a failed redaction before a federal judge.

Related: The Epstein Files Redaction Failure Explained — a full breakdown of the December 2025 incident.

See also: AI Coding Assistants and PII Leakage in Production — a different leak path, the same lesson.

anonym.legal provides automated text-layer verification for organizations that handle sensitive filings.


