Last updated 2026-03-15


Permanent Anonymization: Spoliation Risk

34.8% of ChatGPT inputs contain sensitive data (Cyberhaven). The fix — permanent anonymization — creates its own legal risk: spoliation. GDPR Art.

March 15, 2026 · 10 minute read
reversible encryption · spoliation risk · legal discovery compliance · GDPR pseudonymization · AES-256-GCM

Updated for 2026

One Fix, Two New Risks

Many firms now block AI leaks by stripping out names and IDs before text reaches an AI provider. One-way hashing, hard redaction, or full removal all seem safe. The AI gets clean text. Sensitive details stay in-house.

The logic holds on the security side. Cyberhaven's Q4 2025 study found that 34.8% of content sent to ChatGPT holds sensitive data. Ponemon's 2024 report put the average AI breach cost at $2.1 million. The risk is real and the cost is high.

But full removal trades one risk for another: spoliation of evidence.

For firms subject to lawsuits or audits, destroying the ability to restore raw records can count as spoliation under federal and state rules.

The AI Sharing Scale

Research from eSecurity Planet and Cyberhaven found that 77% of staff share sensitive data with AI tools each week. This spans legal, healthcare, finance, and tech.

Shared content often includes:

  • Client letters and case notes
  • Draft contracts and deal terms
  • Internal plans and business records
  • Financial models and projections
  • Legal memos
  • Patient records and clinical notes
  • HR files and staff messages

When full removal is the AI control, every document that passes through it may lose its legal value. If those documents surface in a lawsuit — very likely over any multi-year period for firms in regulated fields — the firm has potentially lost evidence.

See our legal alignment overview for how anonym.legal meets discovery duties. You can also review the token system guide to see how the masking pipeline works in practice.

GDPR: Reversibility Is Required

GDPR Article 4(5) defines pseudonymization as processing personal records in a way that means they "can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately."

The key point: the additional information that enables re-linking (a key or token map) still exists and is stored separately. Records that can be re-linked via stored keys count as pseudonymized under GDPR and remain personal data.

Records that cannot be re-linked at all are not pseudonymized. They are anonymized. The gap matters:

  • Token-masked records keep some GDPR duties but can be restored for legal use.
  • Fully wiped records may fall outside GDPR scope but cannot be restored at all.

The European Data Protection Board's guidelines on pseudonymization (Guidelines 01/2025) confirm that reversibility is a core part of the definition. Firms using one-way removal are not doing GDPR pseudonymization. They are cutting off the ability to recover records.

Learn more at our conformance hub and protection overview.

Federal Rules: The Spoliation Test

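In federal litigation, parties must preserve records that may be relevant to anticipated legal action. The duty comes from case law and is enforced through the Federal Rules of Civil Procedure. It starts when a lawsuit is reasonably foreseeable, not when it is filed.

Rule 37(e) lets courts impose penalties when a party fails to take reasonable steps to preserve electronically stored information and the loss cannot be cured through additional discovery. The harshest penalties require a finding that the party intended to deprive the other side of the information. Penalties can include: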

  • Adverse inference instructions
  • Evidence preclusion
  • Case-ending sanctions in serious cases

Here is how this plays out. A firm uses AI workflows that fully remove sensitive content in the normal course of business. Those records later become relevant to a lawsuit. The firm has altered them so the raw text cannot be restored. If that occurred after the duty to preserve attached, spoliation exposure follows.

This is not a fringe case. Firms in regulated fields with recurring legal exposure can reasonably foresee litigation touching many document types. Deploying full removal across all workflows, without carve-outs for at-risk records, creates large spoliation risk.
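A carve-out for at-risk records can be expressed as a small policy gate in front of the masking step. The sketch below is illustrative only: the `Method` names, the `Record` type, and the `litigation_foreseeable` flag are hypothetical, and whether a preservation duty has attached is a question for counsel, not for code.

```python
from dataclasses import dataclass
from enum import Enum


class Method(Enum):
    REDACT = "redact"      # irreversible: raw text is overwritten
    HASH = "hash"          # irreversible: digest cannot be inverted
    TOKENIZE = "tokenize"  # reversible while the token map is retained
    ENCRYPT = "encrypt"    # reversible while the key is retained


IRREVERSIBLE = {Method.REDACT, Method.HASH}


@dataclass
class Record:
    record_id: str
    # Hypothetical flag set by counsel when litigation is reasonably foreseeable.
    litigation_foreseeable: bool


def choose_method(record: Record, preferred: Method) -> Method:
    """Return the preferred method unless it would destroy an at-risk record."""
    if record.litigation_foreseeable and preferred in IRREVERSIBLE:
        # Fall back to a reversible method so the raw text stays recoverable.
        return Method.TOKENIZE
    return preferred


routine = Record("memo-001", litigation_foreseeable=False)
at_risk = Record("claim-417", litigation_foreseeable=True)

routine_choice = choose_method(routine, Method.REDACT)
at_risk_choice = choose_method(at_risk, Method.REDACT)
```

Here the routine memo keeps the requested hard redaction, while the at-risk record is routed to reversible tokenization instead.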

Reversible vs. Irreversible: Key Difference

The difference between reversible and one-way masking is whether the design keeps a path back to the raw text.

One-Way: no way back

SHA-256 hashing of a name produces a fixed hash. The name cannot be derived from it. Hard redaction removes text so the raw content is gone.
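A minimal sketch of the two one-way methods, using only Python's standard library. The function names and the `[REDACTED]` placeholder are assumptions made for illustration, not a description of any particular product.

```python
import hashlib


def hash_name(name: str) -> str:
    """One-way: return the SHA-256 hex digest of a name."""
    return hashlib.sha256(name.encode("utf-8")).hexdigest()


def redact(text: str, names: list[str]) -> str:
    """One-way: overwrite each name with a fixed placeholder."""
    for name in names:
        text = text.replace(name, "[REDACTED]")
    return text


digest = hash_name("Jane Doe")
redacted = redact("Jane Doe signed the lease.", ["Jane Doe"])
# Nothing is stored here that maps `digest` or `redacted` back to "Jane Doe".
```

The digest is stable for the same input, but neither output carries any stored information from which the original document could be rebuilt.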

Reversible: recovery is possible

Token substitution with key retention and AES-256-GCM encryption both transform records in ways that can be undone. A name replaced with a token can be restored via a lookup table. AES-256-GCM content can be decrypted with the right key. The raw text stays reachable.
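Token substitution with a retained lookup table might look like the sketch below. The token format, the function names, and the in-memory dictionary are all assumptions; a real deployment would detect entities automatically and keep the map in a separate, access-controlled store. AES-256-GCM encryption, the other reversible option, is not shown here.

```python
import secrets


def mask_names(text: str, names: list[str]) -> tuple[str, dict[str, str]]:
    """Replace each name with a random token and return the lookup table."""
    token_map: dict[str, str] = {}
    for name in names:
        token = f"[PERSON_{secrets.token_hex(4)}]"  # hypothetical token format
        token_map[token] = name
        text = text.replace(name, token)
    return text, token_map


def restore(text: str, token_map: dict[str, str]) -> str:
    """Undo the masking by swapping every token back to its original value."""
    for token, name in token_map.items():
        text = text.replace(token, name)
    return text


original = "Jane Doe emailed John Roe about the lease."
masked, token_map = mask_names(original, ["Jane Doe", "John Roe"])
recovered = restore(masked, token_map)
```

As long as `token_map` is retained, `recovered` matches the original text exactly. If the map is deleted, the masking becomes one-way in practice.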

For AI protection, one-way and reversible masking work the same way. The AI processes tokens and never sees the real records.

For legal duty, only the reversible methods work. One-way methods cut off recovery and create the spoliation risk noted above.

Read how our token system handles this end to end. For deeper context, see the glossary and FAQ.

The Dual-Compliant Design

A design that meets both AI security and legal disclosure duties uses reversible AES-256-GCM token masking:

  1. Records are processed before they reach any AI tool.
  2. Sensitive items — names, IDs, PHI, privileged content — are swapped for structured tokens.
  3. The token map is kept in a separate store with access controls that match the data type.
  4. AI processing runs on the token copy. The AI never sees the real records.
  5. Results are restored using the token map for normal business use.
  6. The token map is placed under legal hold when discovery duties attach.
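The six steps above can be sketched end to end as follows. Everything here is a simplified assumption: the `TokenVault` class, the email-only regular expression, the `<EMAIL_n>` token format, and the `fake_ai` stand-in are hypothetical, and the token values are left unencrypted for brevity.

```python
import re
from dataclasses import dataclass, field

# Toy detector: emails only. Real entity detection covers far more types.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass
class TokenVault:
    """Hypothetical separate store for the token map (step 3)."""
    mapping: dict[str, str] = field(default_factory=dict)
    legal_hold: bool = False

    def tokenize(self, value: str) -> str:
        # Reuse the existing token when the same value appears again.
        for token, original in self.mapping.items():
            if original == value:
                return token
        token = f"<EMAIL_{len(self.mapping) + 1}>"
        self.mapping[token] = value
        return token

    def purge(self) -> None:
        # Step 6: a legal hold blocks deletion of the token map.
        if self.legal_hold:
            raise PermissionError("token map is under legal hold")
        self.mapping.clear()


def mask(text: str, vault: TokenVault) -> str:
    """Steps 1-2: swap sensitive items for tokens before any AI call."""
    return EMAIL.sub(lambda m: vault.tokenize(m.group(0)), text)


def restore(text: str, vault: TokenVault) -> str:
    """Step 5: put the original values back into the AI output."""
    for token, original in vault.mapping.items():
        text = text.replace(token, original)
    return text


def fake_ai(prompt: str) -> str:
    """Step 4 stand-in for an external AI call; it only ever sees tokens."""
    return "Summary: " + prompt


vault = TokenVault()
masked = mask("Contact jane.doe@example.com about the filing.", vault)
answer = restore(fake_ai(masked), vault)
vault.legal_hold = True  # discovery duties attach
```

Once `legal_hold` is set, calling `vault.purge()` raises `PermissionError`, so the path back to the raw text stays intact for as long as the hold applies.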

Under this design, no raw content is ever lost. The AI provider never sees it in usable form. The token map keeps recovery possible when the law requires it. The masking step itself no longer creates spoliation risk, because no records are destroyed. They are only masked in a way that can be undone.

GDPR Article 4(5) is met: the extra key (token map) is kept apart with the right technical and process safeguards. The Federal Rules preservation duty is met: raw records can be restored when a legal hold applies.

Explore our entity detection approach, protection overview, and plans and rates for full details.

The Binary Choice

Firms face a clear fork:

  • Permanently remove data — solve the AI leak problem but create legal risk.
  • Use reversible token masking — meet both protection and conformance needs at once.

The $2.1 million average AI breach cost drives the security decision. But spoliation sanctions are not cheap either. In cases with large monetary stakes, costs can reach the same order of magnitude. Both risks deserve a place in the decision.

A sound AI policy covers both ends. It blocks sensitive records from leaving the firm in usable form. And it keeps those same records reachable when a court or regulator asks for them. Reversible token masking is the only method that does both at once.

For more background, see our founder statement and case studies.

Sources

  • Cyberhaven Q4 2025: Data Exposure in AI Tools
  • IBM / Ponemon Institute: Cost of a Data Breach Report 2024
  • EDPB Guidelines 01/2025 on Pseudonymisation
  • Federal Rules of Civil Procedure Rule 37(e)
  • E-Discovery LLC: Relevance Redactions and Legal Standards


