Updated for 2026
One Fix, Two New Risks
Many firms now block AI leaks by stripping out names and IDs before text reaches an AI provider. One-way hashing, hard redaction, or full removal all seem safe. The AI gets clean text. Sensitive details stay in-house.
The logic holds on the security side. Cyberhaven's Q4 2025 study found that 34.8% of content sent to ChatGPT holds sensitive data. Ponemon's 2024 report put the average AI breach cost at $2.1 million. The risk is real and the cost is high.
But full removal trades one risk for another: spoliation of evidence.
For firms subject to lawsuits or audits, destroying the ability to restore raw records can count as spoliation under federal and state rules.
The AI Sharing Scale
Research from eSecurity Planet and Cyberhaven found that 77% of staff share sensitive data with AI tools each week. This spans legal, healthcare, finance, and tech.
Shared content often includes:
- Client letters and case notes
- Draft contracts and deal terms
- Internal plans and business records
- Financial models and projections
- Legal memos
- Patient records and clinical notes
- HR files and staff messages
When full removal is the AI control, every document that passes through it may lose its legal value. If those documents surface in a lawsuit — very likely over any multi-year period for firms in regulated fields — the firm has potentially lost evidence.
See our legal alignment overview for how anonym.legal meets discovery duties. You can also review the token system guide to see how the masking pipeline works in practice.
GDPR: Reversibility Is Required
GDPR Article 4(5) defines pseudonymization as processing personal records in a way that means they "can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately."
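The central point: the additional information that enables re-linking (for example, a token map or a decryption key) must exist and must be kept separately. Records that can be re-linked via that stored information count as pseudonymized under GDPR.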
Records that cannot be re-linked at all are not pseudonymized. They are anonymized. The gap matters:
- Token-masked records keep some GDPR duties but can be restored for legal use.
- Fully wiped records may fall outside GDPR scope but cannot be restored at all.
The European Data Protection Board's Guidelines 01/2025 on Pseudonymisation take the same view: pseudonymized records remain attributable to a person through additional information that is held separately. Firms using one-way removal are not doing GDPR pseudonymization. They are cutting the ability to recover records.
Learn more at our conformance hub and protection overview.
Federal Rules: The Spoliation Test
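In federal litigation, parties must preserve records that may be relevant to anticipated legal action. This duty comes from case law and is enforced through the Federal Rules of Civil Procedure. It starts when a lawsuit is reasonably foreseeable, not when it is filed.
Rule 37(e) lets courts impose sanctions when a party fails to take reasonable steps to preserve electronically stored information and that information cannot be restored or replaced. The harshest sanctions require a finding that the party acted with intent to deprive the other side of the information. Penalties can include: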
- Adverse inference instructions
- Evidence preclusion
- Case-ending sanctions in serious cases
Here is how this plays out. A firm uses AI workflows that fully remove sensitive content in the normal course of business. Those records later become relevant to a lawsuit. The firm has altered them so the raw text cannot be restored. If that occurred after the duty to preserve attached, spoliation exposure follows.
This is not a fringe case. Firms in regulated fields with recurring legal exposure can reasonably foresee lawsuits touching a broad range of document types at almost any time. Deploying full removal across all workflows, without carve-outs for at-risk records, creates substantial spoliation risk.
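One way to build such a carve-out is a gate that picks the masking mode per record. The sketch below is illustrative only: `Record`, `LegalHold`, `choose_mode`, and the simple custodian-or-category match are hypothetical names and a simplified rule, not a description of any product or legal standard. Deciding which records fall under a hold remains a legal determination.

```python
from dataclasses import dataclass
from enum import Enum


class MaskingMode(Enum):
    REVERSIBLE = "reversible"      # token map retained, raw text recoverable
    IRREVERSIBLE = "irreversible"  # hashing or hard redaction, no way back


@dataclass(frozen=True)
class Record:
    record_id: str
    custodian: str
    category: str


@dataclass(frozen=True)
class LegalHold:
    matter: str
    custodians: frozenset
    categories: frozenset


def is_under_hold(record: Record, holds: list) -> bool:
    """True if any active hold names the record's custodian or category."""
    return any(
        record.custodian in hold.custodians or record.category in hold.categories
        for hold in holds
    )


def choose_mode(record: Record, holds: list,
                default: MaskingMode = MaskingMode.IRREVERSIBLE) -> MaskingMode:
    """Force reversible masking for held records; otherwise use the default."""
    return MaskingMode.REVERSIBLE if is_under_hold(record, holds) else default


# Hypothetical example data.
holds = [LegalHold("Matter-001", frozenset({"a.smith"}), frozenset({"hr_file"}))]
held = Record("r1", "a.smith", "contract")
free = Record("r2", "b.jones", "marketing")

print(choose_mode(held, holds).value)
print(choose_mode(free, holds).value)
```

A stricter policy would set the default to `MaskingMode.REVERSIBLE` so that records are recoverable even before a hold is formally issued.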
Reversible vs. Irreversible: Key Difference
The difference between reversible and one-way masking is in the design.
One-Way: no way back
SHA-256 hashing of a name produces a fixed-length digest. No function computes the name back from the digest; the most anyone can do is hash candidate names and compare. Hard redaction removes text so the raw content is gone.
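A minimal sketch of both one-way methods, using only Python's standard `hashlib`. The helper names (`hash_identifier`, `hard_redact`, `matches_candidate`) and the sample sentence are made up for illustration.

```python
import hashlib


def hash_identifier(value: str) -> str:
    """Return the SHA-256 hex digest of a UTF-8 string."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def hard_redact(text: str, sensitive: str, marker: str = "[REDACTED]") -> str:
    """Replace every occurrence of a sensitive string with a fixed marker."""
    return text.replace(sensitive, marker)


def matches_candidate(digest: str, candidate: str) -> bool:
    """The only check a digest supports: hash a guess and compare."""
    return hash_identifier(candidate) == digest


note = "Jane Doe signed the settlement."
digest = hash_identifier("Jane Doe")

hashed_note = note.replace("Jane Doe", digest)
redacted_note = hard_redact(note, "Jane Doe")

print(hashed_note)
print(redacted_note)
```

Neither output carries anything that restores the original sentence. Recovery would require a separate copy of the raw text, or a correct guess at the name in the hashed case.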
Reversible: recovery is possible
Token substitution with key retention and AES-256-GCM encryption both transform records in ways that can be undone. A name replaced with a token can be restored via a lookup table. AES-256-GCM content can be decrypted with the right key. The raw text stays reachable.
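Token substitution with a retained lookup table can be sketched as follows. This is an assumption-laden illustration, not the article's actual pipeline: the `mask` and `restore` functions and the `[[TKN_...]]` token format are hypothetical, the sensitive values are passed in by hand rather than detected, and the map lives in a plain dictionary. A real deployment would keep the map in a separate, access-controlled store, possibly encrypted at rest.

```python
import secrets


def mask(text: str, sensitive_values: list, token_map: dict) -> str:
    """Swap each sensitive value for a random token and record token -> value."""
    for value in sensitive_values:
        if value not in text:
            continue
        token = f"[[TKN_{secrets.token_hex(4)}]]"
        token_map[token] = value
        text = text.replace(value, token)
    return text


def restore(text: str, token_map: dict) -> str:
    """Undo masking by replacing every known token with its original value."""
    for token, value in token_map.items():
        text = text.replace(token, value)
    return text


token_map = {}
original = "Jane Doe (ID 4471-A) called about the contract."
masked = mask(original, ["Jane Doe", "4471-A"], token_map)

print(masked)
print(restore(masked, token_map))
```

The masked string is what would leave the firm. The round trip back to `original` works only while `token_map` is retained, which is why the map is the thing to preserve.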
For AI protection, one-way and reversible methods work the same way. The AI processes masked text and never sees the real records.
For legal duty, only reversible methods work. One-way methods cut off recovery and create the spoliation risk noted above.
Read how our token system handles this end to end. For deeper context, see the glossary and FAQ.
The Dual-Compliant Design
A design that meets both AI security and legal disclosure duties uses reversible AES-256-GCM token masking:
- Records are processed before they reach any AI tool.
- Sensitive items — names, IDs, PHI, privileged content — are swapped for structured tokens.
- The token map is kept in a separate store with access controls that match the data type.
- AI processing runs on the token copy. The AI never sees the real records.
- Results are restored using the token map for normal business use.
- The token map is placed under legal hold when discovery duties attach.
Under this design, no raw content is ever lost. The AI provider never sees it in usable form. The token map keeps recovery possible when the law requires it. The masking step itself no longer creates spoliation risk, because no records are destroyed. They are only masked in a way that can be undone.
GDPR Article 4(5) is met: the extra key (token map) is kept apart with the right technical and process safeguards. The Federal Rules preservation duty is met: raw records can be restored when a legal hold applies.
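The flow described in this section (mask, send to the AI, restore, hold) can be sketched end to end. Everything here is hypothetical and simplified: `TokenVault`, `run_pipeline`, `LegalHoldError`, and `fake_ai` are illustrative names, the AI call is a local stub, entities are supplied by hand, and the vault is an in-memory object rather than a separate secured store.

```python
import secrets
from dataclasses import dataclass, field


class LegalHoldError(Exception):
    """Raised when code tries to delete a token map that is under hold."""


@dataclass
class TokenVault:
    token_map: dict = field(default_factory=dict)
    legal_hold: bool = False

    def tokenize(self, value: str, kind: str) -> str:
        """Issue a structured token such as [[PERSON_ab12cd34]] for a value."""
        token = f"[[{kind}_{secrets.token_hex(4)}]]"
        self.token_map[token] = value
        return token

    def restore(self, text: str) -> str:
        """Replace every known token in the text with its original value."""
        for token, value in self.token_map.items():
            text = text.replace(token, value)
        return text

    def purge(self) -> None:
        """Delete the map, unless a legal hold forbids it."""
        if self.legal_hold:
            raise LegalHoldError("token map is under legal hold")
        self.token_map.clear()


def run_pipeline(text: str, entities: dict, vault: TokenVault, ai_call):
    """Mask entities, call the AI on the masked copy, restore the result."""
    masked = text
    for value, kind in entities.items():
        if value in masked:
            masked = masked.replace(value, vault.tokenize(value, kind))
    return masked, vault.restore(ai_call(masked))


def fake_ai(prompt: str) -> str:
    """Stand-in for a real provider call; it only echoes the prompt."""
    return "Summary: " + prompt


vault = TokenVault()
text = "Jane Doe (ID 4471-A) disputes the invoice."
masked, result = run_pipeline(text, {"Jane Doe": "PERSON", "4471-A": "ID"}, vault, fake_ai)

vault.legal_hold = True  # discovery duty attaches: the map can no longer be purged
print(masked)
print(result)
```

Putting the hold check inside `purge` means a routine retention job cannot silently delete the map once a hold is set.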
Explore our entity detection approach, protection overview, and plans and rates for full details.
The Binary Choice
Firms face a clear fork:
- Permanently remove data — solve the AI leak problem but create legal risk.
- Use reversible token masking — meet both protection and conformance needs at once.
The $2.1 million average AI breach cost drives the security decision. But spoliation sanctions are not cheap either. In cases with large monetary stakes, costs can reach the same order of magnitude. Both risks deserve a place in the decision.
A sound AI policy covers both ends. It blocks sensitive records from leaving the firm in usable form. And it keeps those same records reachable when a court or regulator asks for them. Reversible token masking is the only method that does both at once.
For more background, see our founder statement and case studies.