Last updated 2026-04-02

LLMs Miss 50% of Clinical PHI

A 2025 study found LLMs miss more than 50% of clinical PHI in multilingual documents. 34.8% of all ChatGPT inputs contain sensitive data.

April 2, 2026 · 9 minute read
LLM PHI detection · HIPAA de-identification · clinical NLP · Safe Harbor method · healthcare AI compliance

The 50% Miss Rate Problem

A 2025 survey (arXiv:2509.14464) tested LLM tools on clinical records. The results were bad. These tools missed more than 50% of clinical PHI in multilingual documents. The cause is simple. LLMs are built for text output. They are not built for the high-recall detection task that HIPAA demands.

HIPAA Safe Harbor lists 18 protected identifier types. They include names, dates, phone numbers, SSNs, MRNs, health plan IDs, device IDs, and IP addresses. Each needs its own detection logic.

Clinical notes make this harder. Take this example: "Pt. John D., DOB 4/12/67, MRN 1234567, admitted 03/15/24, Dr. Smith ordered ECG." One sentence. Five protected identifiers. Most use short forms. A model built for clinical meaning often fails the detection task.

What LLMs Miss and Why

LLM tools fail on clinical records in predictable ways.

Short-form identifiers: Clinical notes use shorthand. DOB, MRN, and Pt. are common forms. A model tuned for clinical meaning may not flag "Pt. John D." as a name. Detection has a different goal: flag every identifier, not interpret the note.

Context-dependent dates: Not all dates pose the same risk. "Age 67" is a soft marker. "DOB 4/12/67" is a direct protected identifier. "03/15/24" as an admit date is protected too. Pattern matching alone is not enough.

Non-US formats: Cyberhaven (Q4 2025) found that 34.8% of all ChatGPT inputs contain sensitive data, including multilingual PII. In healthcare, this means non-US record IDs, regional date formats, and local health ID types. US-trained tools miss these consistently.

Custom hospital identifiers: Hospitals use their own MRN formats, staff IDs, and site codes. These are not in standard NER training data. A tool with no custom entity support will not find them.
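The date point above can be sketched in a few lines of Python. This is a minimal illustration, not a production detector. The cue words in `CUES`, the label names, and the 20-character context window are all assumptions made for the example. Both dates in the sample note are protected. The label only records why.

```python
import re

# Date-like tokens such as 4/12/67 or 03/15/24 (US-style, illustrative only).
DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")

# Hypothetical cue words mapped to labels; real notes need a far larger lexicon.
CUES = {
    "dob": "BIRTH_DATE",
    "born": "BIRTH_DATE",
    "admitted": "ADMISSION_DATE",
    "discharged": "DISCHARGE_DATE",
}

def label_dates(note, window=20):
    """Return (date_text, label) pairs based on the text just before each date."""
    results = []
    for match in DATE_RE.finditer(note):
        context = note[max(0, match.start() - window):match.start()].lower()
        label = "DATE"  # fallback when no cue word is found
        best_pos = -1
        # Keep the cue that sits closest to the date.
        for cue, cue_label in CUES.items():
            pos = context.rfind(cue)
            if pos > best_pos:
                best_pos, label = pos, cue_label
        results.append((match.group(), label))
    return results

note = ("Pt. John D., DOB 4/12/67, MRN 1234567, admitted 03/15/24, "
        "Dr. Smith ordered ECG.")
print(label_dates(note))
# [('4/12/67', 'BIRTH_DATE'), ('03/15/24', 'ADMISSION_DATE')]
```

The regex finds both dates, but only the surrounding words say what each one is. Non-US formats such as 15.03.2024 would need their own patterns.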

The Research Dataset Risk

A hospital building a research dataset from 500,000 notes faces a real compliance problem. HIPAA offers two de-identification routes. Safe Harbor requires removing all 18 identifier types. Expert Determination requires that the re-identification risk be "very small". A tool missing half of all protected identifiers cannot meet either bar.

Research archives are not clean data. Notes span many departments, time periods, and sometimes languages. A tool that works on billing data may fail on narrative notes. Sensitive data in free text has no field label.

IRB approval adds more demands. Institutions must show the method used, the identifier types removed, and the checks done. A tool that misses half of all protected identifiers cannot meet those demands.
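One check an institution might document is recall per identifier type on a hand-labelled sample. The sketch below assumes spans are stored as `(start, end, label)` tuples and that only exact matches count, which is a strict choice. The spans and labels are made-up illustration data, not results from the survey.

```python
from collections import Counter

def recall_by_type(gold, predicted):
    """Recall per identifier type.

    gold and predicted are sets of (start, end, label) spans.
    A gold span counts as found only on an exact match.
    """
    total = Counter(label for _, _, label in gold)
    found = Counter(span[2] for span in gold if span in predicted)
    return {label: found[label] / total[label] for label in total}

# Hand-labelled spans for one note (illustrative values).
gold = {
    (4, 11, "NAME"), (17, 24, "DATE"), (30, 37, "MRN"),
    (48, 56, "DATE"), (58, 67, "NAME"),
}
# Output of a hypothetical tool that finds structured fields but no names.
predicted = {(17, 24, "DATE"), (30, 37, "MRN"), (48, 56, "DATE")}

per_type = recall_by_type(gold, predicted)
miss_rate = len(gold - predicted) / len(gold)

print(per_type["NAME"], per_type["DATE"], per_type["MRN"])  # 0.0 1.0 1.0
print(miss_rate)  # 0.4
```

A single overall number can hide a type that is missed every time, as names are here. Reporting recall per type makes that visible.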

See our compliance overview and security practices for how anonym.legal supports HIPAA work.

The Three-Layer Fix

The 2025 survey found one clear pattern. The tools with the lowest miss rates used three detection layers.

Layer one — regex: Finds structured identifiers. SSNs, MRNs, phone numbers, health plan IDs. Reliable on fixed formats.

Layer two — NER: Uses transformer models. Finds names, dates, and sensitive data in narrative text. Works where regex cannot.

Layer three — custom entities: Handles site-specific forms. Proprietary MRN patterns, staff IDs, facility codes. No standard model covers these.

Pure ML tools degrade on short forms and non-English text. Pure regex tools miss sensitive data with no field label. Neither alone is enough.

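Only the three-layer design reached sub-5% miss rates in the survey. We treat that as the practical bar for HIPAA Safe Harbor work. The rule itself sets no numeric threshold. It requires removing all 18 identifier types.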
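The three layers can be sketched as a small Python pipeline. This is an illustration under stated assumptions, not the design of any surveyed tool. The patterns are examples, the `MRN` and `STAFF_ID` formats are hypothetical, and `placeholder_ner` is a crude stand-in for a transformer NER model so the sketch runs with the standard library alone.

```python
import re
from typing import Callable, Dict, List, NamedTuple

class Span(NamedTuple):
    start: int
    end: int
    label: str
    layer: str

# Layer one: regex for fixed formats (illustrative, not exhaustive).
REGEX_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}

# Layer three: site-specific patterns (hypothetical hospital formats).
CUSTOM_PATTERNS = {
    "MRN": re.compile(r"\bMRN\s*(\d{7})\b"),
    "STAFF_ID": re.compile(r"\bSTF-\d{5}\b"),
}

def pattern_layer(text: str, patterns: Dict[str, re.Pattern], layer: str) -> List[Span]:
    spans = []
    for label, pattern in patterns.items():
        # Use the first capture group when the pattern defines one.
        group = 1 if pattern.groups else 0
        for m in pattern.finditer(text):
            spans.append(Span(m.start(group), m.end(group), label, layer))
    return spans

def placeholder_ner(text: str) -> List[Span]:
    """Stand-in for layer two: flags the name after 'Pt.' or 'Dr.'."""
    pattern = re.compile(r"\b(?:Pt|Dr)\.\s+([A-Z][a-z]+(?:\s[A-Z]\.)?)")
    return [Span(m.start(1), m.end(1), "NAME", "ner") for m in pattern.finditer(text)]

def detect(text: str, ner: Callable[[str], List[Span]] = placeholder_ner) -> List[Span]:
    candidates = (pattern_layer(text, REGEX_PATTERNS, "regex")
                  + ner(text)
                  + pattern_layer(text, CUSTOM_PATTERNS, "custom"))
    # Resolve overlaps: earliest start wins, longer span wins ties.
    candidates.sort(key=lambda s: (s.start, -(s.end - s.start)))
    kept: List[Span] = []
    for span in candidates:
        if not kept or span.start >= kept[-1].end:
            kept.append(span)
    return kept

def redact(text: str, spans: List[Span]) -> str:
    # Replace from the end so earlier offsets stay valid.
    for s in sorted(spans, key=lambda s: s.start, reverse=True):
        text = text[:s.start] + f"[{s.label}]" + text[s.end:]
    return text

note = ("Pt. John D., DOB 4/12/67, MRN 1234567, admitted 03/15/24, "
        "Dr. Smith ordered ECG.")
print(redact(note, detect(note)))
# Pt. [NAME], DOB [DATE], MRN [MRN], admitted [DATE], Dr. [NAME] ordered ECG.
```

In real use, a trained NER model would be passed as the `ner` argument. Keeping each layer's name on the span also helps with the audit trail: a reviewer can see which layer caught which identifier.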

See our guide on HIPAA Safe Harbor de-identification for research for next steps.
