Last updated 2026-05-29

Why Binary PII Detection Fails Compliance

Detected/not-detected is insufficient for compliance contexts that require human judgment. Here's why confidence scoring transforms PII anonymization from a binary guess into an auditable compliance control.

May 29, 2026 · 8 minute read
confidence scoring · PII detection · legal discovery · compliance · GDPR audit

Updated for 2026

Every PII tool faces one hard problem. The same string can be personal data in one place and not in another.

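"John" in a customer file is a data subject. "John" in a history paper about John F. Kennedy is not. A nine-digit number in a medical record may be a Social Security number or a medical record number, both of which are identifiers under HIPAA. The same nine digits in a product catalog are just a product code.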

A yes/no flag cannot handle this. It forces two bad choices: redact all strings that might be PII, or redact only sure matches. Both fail in law, where every decision must be clear and documented.

A per-entity score from 0 to 100 offers a third path. It drives tiered rules, human review queues, and full audit records.

The Limit of Yes/No Flags

Context changes the meaning of data. Two files can hold the same string. In one, it is personal data. In the other, it is not. A flag cannot show that. A number can.

With only a flag, your two options are bad. Over-redaction kills document value. Under-redaction creates legal risk. Neither holds up in court.

Legal discovery has rules that make scored detection a must.

The over-redaction problem. Redacting attorney names or court citations damages the evidence. Courts have fined attorneys for over-redaction. The same case law that covers under-redaction covers this too.

The under-redaction problem. Missing real PII creates risk. That includes client privacy breaches, bar complaints, and in some places, criminal charges.

The need to explain each call. When a court asks why an item was redacted, attorneys must explain it. "The tool flagged it" is not enough. "The tool scored this at 94% as a Social Security Number. Our rule auto-redacts above 85%." That is enough.

A yes/no flag cannot give that answer. A scored tool with set rules can. See also: Defending Redactions: AI Scores in Court.

A Three-Tier Review System

The most effective setup uses three tiers based on the entity score.

Tier 1 — Auto (above 85%):

  • Items that match high-certainty formats (SSN, IBAN, MRN)
  • Auto-redacted with no human step
  • Log records entity type, score, method, and time
  • Example: "571-44-9283" at 97% as SSN — auto-redacted

Tier 2 — Human review (50–85%):

  • Items that may be PII but need a judgment call
  • Sent to a reviewer to accept, reject, or reclassify
  • Log records entity type, score, reviewer ID, decision, and time
  • Example: "John Davis" in a tech doc at 67% — reviewer confirms it is a name — redacted

Tier 3 — Suggestion only (below 50%):

  • Low-certainty items shown as tips
  • Not auto-redacted; reviewer may act or skip
  • Log records entity type, score, and reviewer choice
  • Example: "Smith" in a product doc at 42% — reviewer finds it is a firm name — not redacted

Only Tier 2 requires human review. Tier 3 review is optional. All three tiers produce audit records.
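The three tiers above can be sketched as a small routing function. This is a minimal sketch, not any product's API: the constant names, the function name, and the action labels are assumptions made for illustration. Only the 85 and 50 cut-offs come from the tiers described above.

```python
# Thresholds from the tiers above. The constant and function names are
# illustrative assumptions, not part of any specific tool's API.
AUTO_REDACT_ABOVE = 85   # Tier 1: score strictly above this value
REVIEW_FROM = 50         # Tier 2: score from this value up to AUTO_REDACT_ABOVE

TIER_ACTIONS = {1: "auto-redact", 2: "human review", 3: "suggestion only"}


def assign_tier(score: float) -> int:
    """Map a 0-100 entity score to tier 1, 2, or 3."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score > AUTO_REDACT_ABOVE:
        return 1
    if score >= REVIEW_FROM:
        return 2
    return 3


# The three examples from the tier descriptions.
for text, score in [("571-44-9283", 97), ("John Davis", 67), ("Smith", 42)]:
    tier = assign_tier(score)
    print(f"{text!r}: score {score} -> Tier {tier} ({TIER_ACTIONS[tier]})")
```

In this sketch, a score of exactly 85 or exactly 50 lands in Tier 2, which matches "above 85%" and "50–85%" as written. A real policy should state its boundary handling just as explicitly.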

How Scores Are Built

PII tools combine signals to produce one number per entity.

Regex patterns. An exact SSN-format match gets a high base score. A partial match gets a lower one.

Model output. Named entity models assign a probability per class. A probability of 0.93 for PERSON, or 93 on the 0 to 100 scale, gives a high-certainty result.

Context signals. Text around the entity adjusts the score. "My SSN is 571-44-9283" raises it. "Product code 571-44-9283" lowers it.

Ensemble rules. Systems combine regex, model, and context signals with set weights. The final number reflects all the evidence.

That number drives every threshold decision in your workflow. For more on false positives from yes/no tools, see: The False Positive Tax on PII Tools.
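One way to picture the ensemble step is a weighted average of the three signals. The sketch below assumes each signal is already normalized to the range 0 to 1, and the weights are hypothetical values chosen for illustration. Real tools may combine signals differently, for example with learned weights or additive context boosts.

```python
# Hypothetical weights; they must sum to 1 so the result stays in 0-100.
WEIGHTS = {"regex": 0.4, "model": 0.4, "context": 0.2}


def combine_signals(regex: float, model: float, context: float) -> float:
    """Combine three signals (each 0.0-1.0) into one 0-100 score.

    regex:   how well the text matches a known format
    model:   the entity model's probability for the class
    context: how strongly nearby words support the class
    """
    signals = {"regex": regex, "model": model, "context": context}
    for name, value in signals.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} signal must be in [0, 1], got {value}")
    weighted = sum(WEIGHTS[name] * value for name, value in signals.items())
    return round(100 * weighted, 1)


# Same string, same format match, different surrounding text.
ssn_sentence = combine_signals(regex=1.0, model=0.9, context=1.0)
product_code = combine_signals(regex=1.0, model=0.9, context=0.0)
print(ssn_sentence, product_code)
```

With these assumed weights, the "My SSN is" sentence scores higher than the "Product code" sentence, even though the digits and the format match are identical.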

Insurance Claims: A Real Example

Insurance files mix clear PII — policyholder name, address, SSN — with context-dependent data: witness names, firm names, adjuster signatures.

A yes/no tool either redacts all names (wrong for firms) or misses witness names (a risk). A scored tool handles each item on its own:

  • SSN with label "policyholder SSN" at 96% — auto-redacted
  • Policyholder name tagged PERSON at 91% — auto-redacted
  • Contractor firm tagged ORG at 78% — reviewed — reviewer rejects redaction
  • Witness name tagged PERSON at 82% — reviewed — reviewer accepts
  • Adjuster name tagged PERSON at 71% — reviewed — reviewer accepts (third-party data)

Each decision has a numeric basis, and the audit trail is complete.

Building Compliance Records

For GDPR Article 5(1)(f) (integrity and confidentiality) and the HIPAA Security Rule, scored tools generate the supporting records automatically as part of normal operation.

Entity-level audit records capture entity type, score, decision type (auto or manual), reviewer ID, and time. These export as CSV for data authority inquiries.
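An entity-level record and its CSV export might look like the sketch below. The field names and the `to_csv` helper are assumptions for illustration. The fields mirror the list above: entity type, score, decision type, reviewer ID, and time. The matched text itself is left out of the record, since the list above does not include it.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class AuditRecord:
    entity_type: str     # e.g. "SSN", "PERSON"
    score: float         # 0-100 entity score
    decision_type: str   # "auto" or "manual"
    reviewer_id: str     # empty string for auto decisions
    decided_at: str      # ISO 8601 timestamp in UTC


def to_csv(records: list[AuditRecord]) -> str:
    """Serialize audit records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=[f.name for f in fields(AuditRecord)]
    )
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


records = [
    AuditRecord("SSN", 96.0, "auto", "", "2026-05-29T09:15:00+00:00"),
    AuditRecord("PERSON", 82.0, "manual", "reviewer-17", "2026-05-29T09:21:42+00:00"),
]
print(to_csv(records))
```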

Threshold records document current settings and every change. Each change includes who made it, when, and why. This shows a managed, deliberate policy.
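A threshold record can be kept as an append-only change log. The sketch below is one possible shape, with hypothetical class and field names. It refuses any change that lacks a reason, so every entry carries who, when, and why.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ThresholdChange:
    name: str            # which threshold changed: "auto" or "review"
    old_value: float
    new_value: float
    changed_by: str      # who made the change
    changed_at: datetime # when, in UTC
    reason: str          # why


class ThresholdPolicy:
    """Holds current thresholds and an append-only history of changes."""

    def __init__(self, auto: float = 85, review: float = 50) -> None:
        self._values = {"auto": auto, "review": review}
        self.history: list[ThresholdChange] = []

    def get(self, name: str) -> float:
        return self._values[name]

    def update(self, name: str, new_value: float,
               changed_by: str, reason: str) -> ThresholdChange:
        if name not in self._values:
            raise KeyError(name)
        if not reason.strip():
            raise ValueError("a reason is required for every change")
        candidate = {**self._values, name: new_value}
        # Keep the tiers ordered: 0 <= review <= auto <= 100.
        if not 0 <= candidate["review"] <= candidate["auto"] <= 100:
            raise ValueError("thresholds must satisfy 0 <= review <= auto <= 100")
        change = ThresholdChange(
            name, self._values[name], new_value,
            changed_by, datetime.now(timezone.utc), reason,
        )
        self._values[name] = new_value
        self.history.append(change)
        return change


policy = ThresholdPolicy()
policy.update("auto", 90, "compliance-lead", "Too many ORG names auto-redacted")
print(policy.get("auto"), len(policy.history))
```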

Stats reports cover detection rates by entity type, Tier 2 review rates, and override rates. They answer a data authority asking to "show us your controls."
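The report figures can be computed from the decision log. This sketch makes two assumptions that a real policy would need to define for itself: detections are reported as simple counts per entity type, and "override rate" means the share of Tier 2 items where the reviewer rejected the proposed redaction. The dictionary keys are hypothetical.

```python
from collections import Counter


def summarize(decisions: list[dict]) -> dict:
    """Summarize decisions that carry entity_type, tier, and reviewer_decision."""
    total = len(decisions)
    reviewed = [d for d in decisions if d["tier"] == 2]
    overridden = [d for d in reviewed if d["reviewer_decision"] == "reject"]
    return {
        "detections_by_type": dict(Counter(d["entity_type"] for d in decisions)),
        "tier2_review_rate": len(reviewed) / total if total else 0.0,
        "override_rate": len(overridden) / len(reviewed) if reviewed else 0.0,
    }


# The five items from the insurance example above.
claims_file = [
    {"entity_type": "SSN", "tier": 1, "reviewer_decision": None},
    {"entity_type": "PERSON", "tier": 1, "reviewer_decision": None},
    {"entity_type": "ORG", "tier": 2, "reviewer_decision": "reject"},
    {"entity_type": "PERSON", "tier": 2, "reviewer_decision": "accept"},
    {"entity_type": "PERSON", "tier": 2, "reviewer_decision": "accept"},
]
print(summarize(claims_file))
```

For the insurance example, three of the five items go to review, and one of those three is overridden.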

For HIPAA audit trail guidance, see: Explainable Redaction: HIPAA Audits.

A yes/no flag is a guess. A score is evidence.
