Last updated 2026-05-29

Legal PII: Privilege Detection

Case reference numbers, bar admission numbers, court docket numbers, and client matter IDs are legally sensitive identifiers that standard PII tools miss.

May 29, 2026 · 7 minute read
attorney-client privilege · legal document review · case numbers · law firm privacy · legal tech

title: "Legal PII: Privilege Detection"
description: "Case reference numbers, bar admission numbers, court docket numbers, and client matter IDs are legally sensitive identifiers that standard PII tools miss."
category: legal-tech
publishedAt: 2026-06-03
tags:
  • attorney-client privilege
  • legal document review
  • case numbers
  • law firm privacy
  • legal tech
readingTime: 7

Standard PII tools catch names, emails, and SSNs. They miss case reference IDs, bar admission numbers, and client matter tags, all of which carry serious privilege risks. Generic tools leave that gap open.

Law firms send files to AI tools every day. When a firm routes files through an AI assistant, those files contain legal IDs alongside standard PII:

  • Client matter tags: Link to the full matter file and name the client
  • Case reference IDs: Court-assigned codes that tie to public records with private detail
  • Bar admission numbers: Attorney IDs searchable in public state directories
  • Court docket codes: Connect to public filing systems with full case history
  • Judicial assignment codes: Identify the presiding judge in sensitive situations

Any of these, sent to an external AI vendor, creates a potential privilege problem.

Why These IDs Need Custom Detection

Court docket formats follow district-level patterns. No single pattern covers all federal and state courts.

Federal civil cases use a two-digit year, then "cv," then a case number. Criminal cases use "cr" in the same spot. State courts vary by region with no shared standard.

Bar admission numbers are state-specific. California uses a numeric format. New York uses a registry format. Texas uses its own bar ID format. No national format exists.

Client matter tags are firm-specific. Each firm builds its own format. Common schemes include year-client-matter codes, practice group codes, and sequential IDs.

Standard PII tools cannot know any of these without custom setup.

The gap is real. A document tool receives full matter context, including docket codes that link to public records and client matter tags. It removes names and emails and reports the PII as removed. The privilege-sensitive IDs are still in the file.

Consider a legal AI startup that builds a document tool for law firms. The product scans discovery files, spots relevant clauses, and flags potentially privileged content. Its enterprise clients require redaction of client matter tags alongside standard PII before processing.

The compliance blocker: the AI tool processes file data containing client matter tags. Combined with public court filings, those tags could allow matter identification. Enterprise legal ops teams flag this as unacceptable.

Before custom entity detection:

  • Deal review finds the compliance gap
  • 3+ month engineering queue for a custom NLP model
  • Enterprise contract on hold

With a custom entity API:

  • Compliance officer defines the matter tag format at onboarding
  • Pattern tested against sample files: 2 days
  • Custom entity added to the pipeline: 1 more day
  • Enterprise contract proceeds

The difference is 3 days versus 3+ months. The work is pattern setup and API integration. No NLP model training is required.

Common Formats by Category

Federal court dockets:

Federal civil cases use: two-digit year + "cv" + a 4–6 digit case number. Example: 24-cv-12345. Criminal cases use "cr" in the same spot. Bankruptcy cases use "bk." Appeals use a two-digit year and a 4–5 digit number that varies by circuit.
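The district-level formats described above can be approximated with one regular expression. This is a minimal sketch, not a complete court-format catalog: the names `FEDERAL_DOCKET` and `find_federal_dockets` are hypothetical, and the pattern covers only the year, case type, and number shape for civil, criminal, and bankruptcy dockets. Appeals numbers vary by circuit and would need separate patterns.

```python
import re

# Assumed shape: two-digit year, case type (cv, cr, or bk), 4-6 digit number.
# The word boundaries stop the pattern from matching inside longer digit runs.
FEDERAL_DOCKET = re.compile(r"\b\d{2}-(?:cv|cr|bk)-\d{4,6}\b", re.IGNORECASE)


def find_federal_dockets(text: str) -> list[str]:
    """Return every docket-shaped string found in the text, in order."""
    return [m.group(0) for m in FEDERAL_DOCKET.finditer(text)]
```

Matching is case-insensitive because filings write the case type in both upper and lower case. A number longer than six digits is rejected rather than truncated, which avoids partial matches.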

State court formats (examples):

California Superior Court uses a six-digit prefix system. New York uses an index format with year and sequence. Texas uses a cause format with year, sequence, and court code.

Client matter tags (typical firm formats):

Three common patterns appear across most firms:

  • Two-digit year, client ID, matter sequence (e.g., 24-ACME-001)
  • Practice group initials, year, then a four-digit sequence (e.g., LIT240042)
  • Client prefix with a six-digit ID (e.g., SMITHCO-000123)
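The three example formats above can each be expressed as a regular expression. The sketch below assumes uppercase client codes of 2 to 12 characters and practice group initials of 2 to 4 letters; those limits, the dictionary `MATTER_TAG_PATTERNS`, and the function `find_matter_tags` are illustrative assumptions. A real firm's format would be confirmed at onboarding.

```python
import re

# Hypothetical patterns for the three example formats; real firms will differ.
MATTER_TAG_PATTERNS = {
    # e.g. 24-ACME-001
    "year_client_matter": re.compile(r"\b\d{2}-[A-Z][A-Z0-9]{1,11}-\d{3}\b"),
    # e.g. LIT240042 (initials, two-digit year, four-digit sequence)
    "group_year_sequence": re.compile(r"\b[A-Z]{2,4}\d{2}\d{4}\b"),
    # e.g. SMITHCO-000123
    "client_prefix_id": re.compile(r"\b[A-Z][A-Z0-9]{1,11}-\d{6}\b"),
}


def find_matter_tags(text: str) -> list[tuple[str, str]]:
    """Return (format_name, matched_tag) pairs, grouped by pattern order."""
    hits = []
    for name, pattern in MATTER_TAG_PATTERNS.items():
        hits.extend((name, m.group(0)) for m in pattern.finditer(text))
    return hits
```

Keeping one named pattern per format makes it easy to report which format produced a hit, which helps when tuning patterns against sample files.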

US bar admission IDs:

Most states use 4–8 digit numbers, sometimes with a state-level prefix. USDC admission IDs vary by district and do not follow a shared format.
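A bare 4 to 8 digit number matches far too much text to be useful on its own. One possible approach, shown here as an assumption rather than an established method, is to require a nearby cue such as "Bar No." before the digits. The cue words, the optional two-letter prefix, and the names `BAR_NUMBER` and `find_bar_numbers` are all hypothetical.

```python
import re

# Assumed cue: the word "Bar" followed by "No.", "Number", or "ID".
# The captured group is an optional two-letter prefix plus 4-8 digits.
BAR_NUMBER = re.compile(
    r"\bBar\s+(?:No\.|Number|ID)[:\s#]*((?:[A-Z]{2}-?)?\d{4,8})\b",
    re.IGNORECASE,
)


def find_bar_numbers(text: str) -> list[str]:
    """Return bar-number candidates that appear after a recognised cue."""
    return [m.group(1) for m in BAR_NUMBER.finditer(text)]
```

Requiring the cue trades recall for precision: numbers that appear without a label are missed, so this kind of pattern suits signature blocks and captions more than free text.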

Privilege-Aware Processing Pipeline

For document review AI, a layered pipeline handles the full scope.

Layer 1 — Standard PII detection

Names, emails, phone numbers, addresses, SSNs. High accuracy. Well-established tooling handles this layer well.

Layer 2 — Custom code detection

Matter codes, docket IDs, bar IDs. Firm-specific patterns set at onboarding. This layer fills the gap that standard tools miss.

Layer 3 — Privilege review (human)

After automated detection, an attorney reviews flagged markers such as ATTORNEY-CLIENT headers, WORK PRODUCT labels, and CONFIDENTIAL markings. Human review at this layer is not optional.

Layer 4 — Context exception review

Some detected IDs, such as public record dockets, pose no privilege risk, while client matter tags do. Telling them apart needs attorney judgment. It cannot be automated.

Layers 1 and 2 handle high-volume work. Layers 3 and 4 keep attorney judgment where privilege decisions belong. For what happens when privilege is already waived by AI tool use, see attorney-client privilege and AI.
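The split between automated and human layers can be sketched as follows. This is an illustration under stated assumptions: the regex detectors are stand-ins for real Layer 1 and Layer 2 tooling, and the `needs_attorney_review` flag is a hypothetical way to hand Layer 2 hits to the attorney review in Layers 3 and 4.

```python
import re
from dataclasses import dataclass


@dataclass
class Detection:
    layer: int                   # 1 = standard PII, 2 = custom legal ID
    entity: str
    text: str
    start: int
    needs_attorney_review: bool  # True hands the hit to Layers 3 and 4


# Stand-in detectors; both pattern sets are illustrative assumptions.
STANDARD_PII = {"EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")}
CUSTOM_LEGAL = {
    "FEDERAL_DOCKET": re.compile(r"\b\d{2}-(?:cv|cr|bk)-\d{4,6}\b"),
    "MATTER_TAG": re.compile(r"\b\d{2}-[A-Z]{2,12}-\d{3}\b"),
}


def detect(text: str) -> list[Detection]:
    """Run Layers 1 and 2 and return hits sorted by position in the text."""
    found = []
    for layer, patterns in ((1, STANDARD_PII), (2, CUSTOM_LEGAL)):
        for entity, pattern in patterns.items():
            for m in pattern.finditer(text):
                # Legal IDs are queued for an attorney, who decides whether
                # each one is a public record or a privilege-sensitive tag.
                found.append(
                    Detection(layer, entity, m.group(0), m.start(), layer == 2)
                )
    return sorted(found, key=lambda d: d.start)
```

The code only detects and flags. It makes no privilege decision, which keeps that judgment with the attorney.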

Setup for Developers

Onboarding configuration

Collect client matter tag formats during enterprise onboarding. Each firm uses a different format. Store them as firm-specific custom entities. Apply to all processing for that account.
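One way to store firm-specific entities is a per-account registry that validates each pattern against sample tags before accepting it. The sketch below is hypothetical throughout: `FirmConfig`, `register_entity`, `scan`, and the in-memory `FIRM_CONFIGS` store are illustrative names, and a production system would persist the configuration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class FirmConfig:
    account_id: str
    entities: dict[str, str] = field(default_factory=dict)  # name -> regex


# Hypothetical in-memory store keyed by account ID.
FIRM_CONFIGS: dict[str, FirmConfig] = {}


def register_entity(account_id: str, name: str, pattern: str,
                    samples: list[str]) -> None:
    """Check the pattern against sample tags, then store it for the account."""
    regex = re.compile(pattern)  # raises re.error if the pattern is invalid
    missed = [s for s in samples if not regex.fullmatch(s)]
    if missed:
        raise ValueError(f"pattern does not match samples: {missed}")
    config = FIRM_CONFIGS.setdefault(account_id, FirmConfig(account_id))
    config.entities[name] = pattern


def scan(account_id: str, text: str) -> list[tuple[str, str]]:
    """Apply every custom entity stored for the account to the text."""
    config = FIRM_CONFIGS.get(account_id)
    if config is None:
        return []
    return [
        (name, m.group(0))
        for name, pattern in config.entities.items()
        for m in re.finditer(pattern, text)
    ]
```

Rejecting a pattern that fails its own samples catches format mistakes at onboarding, before any documents are processed.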

Default presets

Pre-built presets cover common contexts without custom work:

  • "Federal Court Documents" — federal docket patterns for civil, criminal, and bankruptcy
  • "State Court Documents (CA/NY/TX)" — state-specific formats for three major jurisdictions
  • "Internal Operations" — matter tag plus standard PII
  • "Outside Counsel Portal" — bill reference, matter tag, and standard PII
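Presets like these can be modeled as named bundles of entity types. In this sketch the preset names come from the list above, while the entity identifiers and the `entities_for` helper are assumptions made for illustration.

```python
# Preset names come from the article; the entity identifiers are assumed.
PRESETS: dict[str, frozenset[str]] = {
    "Federal Court Documents": frozenset(
        {"FEDERAL_DOCKET_CIVIL", "FEDERAL_DOCKET_CRIMINAL", "FEDERAL_DOCKET_BANKRUPTCY"}
    ),
    "State Court Documents (CA/NY/TX)": frozenset(
        {"CA_CASE_NUMBER", "NY_INDEX_NUMBER", "TX_CAUSE_NUMBER"}
    ),
    "Internal Operations": frozenset({"MATTER_TAG", "STANDARD_PII"}),
    "Outside Counsel Portal": frozenset(
        {"BILL_REFERENCE", "MATTER_TAG", "STANDARD_PII"}
    ),
}


def entities_for(preset_names: list[str],
                 extra: frozenset[str] = frozenset()) -> frozenset[str]:
    """Union the entity types of the chosen presets with any firm extras."""
    selected = set(extra)
    for name in preset_names:
        selected |= PRESETS[name]  # KeyError signals an unknown preset name
    return frozenset(selected)
```

Combining presets as a set union means overlapping entity types, such as the matter tag, are applied once.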

Audit documentation

Processing records should show that custom codes were included in each detection pass. This supports work product protection for the analysis method.
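A processing record of this kind might look like the sketch below. The field names and the `audit_record` function are hypothetical. The record stores a hash of the document rather than its text, so the log itself does not hold sensitive content.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(document: bytes, account_id: str,
                 entity_types: list[str], hit_counts: dict[str, int]) -> str:
    """Build one JSON line describing a detection pass."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "account_id": account_id,
        # SHA-256 identifies the document without storing its contents.
        "document_sha256": hashlib.sha256(document).hexdigest(),
        # Lists every entity type applied, including custom codes.
        "entity_types_applied": sorted(entity_types),
        "hit_counts": hit_counts,
    }
    return json.dumps(record, sort_keys=True)
```

Listing the applied entity types on every record, even when a type produced no hits, is what shows that custom codes were part of the pass.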

For a broader look at how redaction costs scale in litigation, see e-discovery PII automation and legal review cost reduction.

Conclusion

Privilege-sensitive IDs are as risky as standard PII — often more so. Tools that miss docket codes and matter tags leave a real gap in document workflows.

The fix is not an NLP model. It is pattern setup. For developers building law firm tools, that is the difference between a 3-day fix and a 3-month project. For law firms, it is the difference between defensible AI-assisted review and a privilege waiver risk.
