title: "Legal PII: Privilege Detection"
description: "Case reference numbers, bar admission numbers, court docket numbers, and client matter IDs are legally sensitive identifiers that standard PII tools miss."
category: legal-tech
publishedAt: 2026-06-03
tags:
- attorney-client privilege
- legal document review
- case numbers
- law firm privacy
- legal tech
readingTime: 7
Attorney-Client Privilege in the AI Era: Legal PII Your Anonymization Tool Must Detect
Standard PII tools catch names, emails, and SSNs. They miss case reference IDs, bar admission numbers, and client matter tags. These carry serious privilege risks. Generic tools leave that gap open.
Law firms route files through AI assistants every day. Alongside standard PII, those files carry legal identifiers that generic tools do not catch:
- Client matter tags: Link to the full matter file and name the client
- Case reference IDs: Court-assigned codes that tie to public records with private detail
- Bar admission numbers: Attorney IDs searchable in public state directories
- Court docket codes: Connect to public filing systems with full case history
- Judicial assignment codes: Identify the presiding judge in sensitive situations
Any of these, sent to an external AI vendor, creates a potential privilege problem.
Why These IDs Need Custom Detection
Court docket formats follow district-level patterns. No single pattern covers all federal and state courts.
Federal civil cases use a two-digit year, then "cv," then a case number. Criminal cases use "cr" in the same spot. State courts vary by region with no shared standard.
Bar admission numbers are state-specific. California issues numeric State Bar numbers. New York issues attorney registration numbers. Texas issues its own bar card numbers. No national format exists.
Client matter tags are firm-specific. Each firm builds its own format, typically from some mix of year-client-matter segments, practice group codes, and sequential IDs.
Standard PII tools cannot know any of these without custom setup.
The gap is real. A document tool receives full matter context, including docket codes that link to public records and client matter tags. The tool reports that PII was removed, and the names and emails were. The privilege-sensitive IDs were not.
The Legal AI Startup Case
A legal AI startup builds a document tool for law firms. The product scans discovery files, spots relevant clauses, and flags potentially privileged content. Enterprise clients require redaction of client matter tags alongside standard PII before processing.
The compliance blocker: the AI tool processes file data containing client matter tags. Combined with public court filings, those tags could allow matter identification. Enterprise legal ops teams flag this as unacceptable.
Before custom entity detection:
- Deal review finds the compliance gap
- 3+ month engineering queue for a custom NLP model
- Enterprise contract on hold
With a custom entity API:
- Compliance officer defines the matter tag format at onboarding
- Pattern tested against sample files: 2 days
- Custom entity added to the pipeline: 1 more day
- Enterprise contract proceeds
The difference is 3 days versus 3+ months. The work is pattern setup and API integration, with no NLP model training required.
Common Formats by Category
Federal court dockets:
Federal civil cases use: two-digit year + "cv" + a 4–6 digit case number. Example: 24-cv-12345. Criminal cases use "cr" in the same spot. Bankruptcy cases use "bk." Appeals use a two-digit year and a 4–5 digit number that varies by circuit.
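The federal district formats above can be sketched as one regular expression. This is a minimal sketch, not an exhaustive matcher: the names `FEDERAL_DOCKET` and `find_federal_dockets` are illustrative, the optional office prefix (such as `1:`) is an assumption about how some districts write docket numbers, and appellate numbers are left out because they vary by circuit.

```python
import re

# Two-digit year, case-type code, then a 4-6 digit case number.
FEDERAL_DOCKET = re.compile(
    r"""
    \b
    (?:\d{1,2}:)?          # optional office/division prefix, e.g. "1:" (assumption)
    (?P<year>\d{2})-       # two-digit year
    (?P<type>cv|cr|bk)-    # civil, criminal, or bankruptcy
    (?P<seq>\d{4,6})       # case number
    \b
    """,
    re.VERBOSE | re.IGNORECASE,
)


def find_federal_dockets(text: str) -> list[dict[str, str]]:
    """Return each docket-like match with its parsed components."""
    return [
        {
            "match": m.group(0),
            "year": m.group("year"),
            "type": m.group("type").lower(),
            "seq": m.group("seq"),
        }
        for m in FEDERAL_DOCKET.finditer(text)
    ]
```

The word boundaries keep a four-digit year such as `2024-cv-1` from matching. A production pattern would still need testing against real filings from each district it is meant to cover.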
State court formats (examples):
California Superior Court formats vary by county, and some use a six-digit prefix system. New York uses an index number with year and sequence. Texas uses a cause number with year, sequence, and court code.
Client matter tags (typical firm formats):
Three common patterns appear across most firms:
- Two-digit year, client ID, matter sequence (e.g., 24-ACME-001)
- Practice group initials, year, then a four-digit sequence (e.g., LIT240042)
- Client prefix with a six-digit ID (e.g., SMITHCO-000123)
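The three patterns above can be sketched as regular expressions. Real formats are firm-specific, so the expressions below are assumptions derived only from the three example tags; the labels and function names are illustrative.

```python
import re

# One hypothetical pattern per example format in the list above.
MATTER_TAG_PATTERNS = {
    "year_client_seq": re.compile(r"\b\d{2}-[A-Z]{2,10}-\d{3}\b"),   # 24-ACME-001
    "group_year_seq": re.compile(r"\b[A-Z]{2,4}\d{2}\d{4}\b"),       # LIT240042
    "client_prefix_id": re.compile(r"\b[A-Z]{3,12}-\d{6}\b"),        # SMITHCO-000123
}


def find_matter_tags(text: str) -> list[tuple[str, str]]:
    """Return (pattern label, matched text) pairs, grouped by pattern."""
    hits = []
    for label, pattern in MATTER_TAG_PATTERNS.items():
        hits += [(label, m.group(0)) for m in pattern.finditer(text)]
    return hits


def redact_matter_tags(text: str, placeholder: str = "[MATTER_TAG]") -> str:
    """Replace every matched tag with a fixed placeholder."""
    for pattern in MATTER_TAG_PATTERNS.values():
        text = pattern.sub(placeholder, text)
    return text
```

Short uppercase-plus-digits patterns can collide with part numbers or invoice codes, so each firm's pattern should be checked against sample files before it goes into a pipeline.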
US bar admission IDs:
Most states use 4–8 digit numbers, sometimes with a state-level prefix. U.S. District Court (USDC) admission IDs vary by district and do not follow a shared format.
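A bare 4–8 digit number is ambiguous on its own, so one option is to match it only when a keyword cue sits directly in front of it. The sketch below assumes cues such as "Bar No." or "State Bar Number" and an optional two-letter prefix; those cues and the names `BAR_NUMBER` and `find_bar_numbers` are illustrative assumptions, not a description of any state's official format.

```python
import re

BAR_NUMBER = re.compile(
    r"""
    \b(?:State\s+)?Bar\s+           # "Bar" or "State Bar"
    (?:No\.?|Number|\#)\s*:?\s*     # cue word, optional colon
    (?P<prefix>[A-Z]{2}[-\s]?)?     # optional state-level prefix (assumption)
    (?P<number>\d{4,8})\b           # the 4-8 digit ID itself
    """,
    re.VERBOSE | re.IGNORECASE,
)


def find_bar_numbers(text: str) -> list[str]:
    """Return only the numeric IDs that follow a bar-number cue."""
    return [m.group("number") for m in BAR_NUMBER.finditer(text)]
```

Anchoring on the cue trades recall for precision: an ID printed without a label is missed, which is one reason the later human review layers still matter.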
Privilege-Aware Processing Pipeline
For document review AI, a layered pipeline handles the full scope.
Layer 1 — Standard PII detection
Names, emails, phone numbers, addresses, SSNs. High accuracy. Well-established tooling handles this layer well.
Layer 2 — Custom code detection
Matter codes, docket IDs, bar IDs. Firm-specific patterns set at onboarding. This layer fills the gap that standard tools leave.
Layer 3 — Privilege review (human)
After automated detection, an attorney reviews flagged markers: ATTORNEY-CLIENT headers, WORK PRODUCT labels, and CONFIDENTIAL markings. Human review at this layer is not optional.
Layer 4 — Context exception review
Some identifiers, such as public-record docket numbers, pose no privilege risk, while client matter tags do. Telling them apart needs attorney judgment. It cannot be automated.
Layers 1 and 2 handle high-volume work. Layers 3 and 4 keep attorney judgment where privilege decisions belong. For what happens when privilege is already waived by AI tool use, see attorney-client privilege and AI.
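The four layers can be sketched as a single pass that redacts automatically for Layers 1 and 2 and only flags items for Layers 3 and 4. This is a minimal sketch under stated assumptions: every name is hypothetical, Layer 1 is stubbed with an email pattern where a real system would call a full standard PII detector, and the matter tag format is an example rather than any firm's actual scheme.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Finding:
    layer: int
    label: str
    text: str


@dataclass
class ReviewResult:
    redacted: str
    findings: list = field(default_factory=list)
    needs_attorney_review: bool = False


# Layers 1 and 2: redacted automatically.
AUTOMATED = [
    (1, "EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")),    # stand-in for standard PII
    (2, "MATTER_TAG", re.compile(r"\b\d{2}-[A-Z]{2,10}-\d{3}\b")),  # hypothetical firm format
]
# Layers 3 and 4: flagged for an attorney, never resolved by code.
HUMAN_REVIEW = [
    (3, "PRIVILEGE_MARKER", re.compile(r"ATTORNEY-CLIENT|WORK PRODUCT|CONFIDENTIAL")),
    (4, "PUBLIC_DOCKET", re.compile(r"\b\d{2}-(?:cv|cr|bk)-\d{4,6}\b")),
]


def process(text: str) -> ReviewResult:
    findings = []
    redacted = text
    for layer, label, pattern in AUTOMATED:
        findings += [Finding(layer, label, m.group(0)) for m in pattern.finditer(redacted)]
        redacted = pattern.sub(f"[{label}]", redacted)
    for layer, label, pattern in HUMAN_REVIEW:
        findings += [Finding(layer, label, m.group(0)) for m in pattern.finditer(text)]
    needs_review = any(f.layer >= 3 for f in findings)
    return ReviewResult(redacted, findings, needs_review)
```

The docket number is deliberately left in the redacted text here. Whether a public docket reference may stay is the Layer 4 decision, so the sketch records it as a finding and routes the document to an attorney.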
Setup for Developers
Onboarding configuration
Collect client matter tag formats during enterprise onboarding, because each firm uses a different format. Store them as firm-specific custom entities and apply them to all processing for that account.
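One way to model this is a per-account registry that refuses a pattern unless it matches the sample tags the firm supplied at onboarding. The sketch below is illustrative: `CustomEntity`, `FirmEntityRegistry`, and the in-memory storage are assumptions, not the interface of any particular product.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomEntity:
    name: str        # label used in the redacted output
    pattern: str     # regular expression supplied at onboarding
    samples: tuple   # example tags the pattern must match

    def compiled(self):
        return re.compile(self.pattern)


class FirmEntityRegistry:
    """Keeps custom entities separate for each firm account."""

    def __init__(self):
        self._by_account = {}

    def register(self, account_id, entity):
        rx = entity.compiled()  # raises re.error if the pattern is invalid
        missed = [s for s in entity.samples if not rx.fullmatch(s)]
        if missed:
            raise ValueError(f"pattern does not match samples: {missed}")
        self._by_account.setdefault(account_id, []).append(entity)

    def redact(self, account_id, text):
        # Only the entities registered for this account are applied.
        for entity in self._by_account.get(account_id, []):
            text = entity.compiled().sub(f"[{entity.name}]", text)
        return text
```

Validating against samples at registration time catches a mistyped pattern during onboarding rather than after documents have already been processed.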
Default presets
Pre-built presets cover common contexts without custom work:
- "Federal Court Documents" — federal docket patterns for civil, criminal, and bankruptcy
- "State Court Documents (CA/NY/TX)" — state-specific formats for three major jurisdictions
- "Internal Operations" — matter tag plus standard PII
- "Outside Counsel Portal" — bill reference, matter tag, and standard PII
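The presets above could be represented as plain configuration that resolves to a list of entity labels. The preset names come from the list; every entity label and the `resolve_entities` helper are hypothetical, chosen only to illustrate the idea.

```python
# Standard PII categories, reused by presets that include them.
STANDARD_PII = ("NAME", "EMAIL", "PHONE", "ADDRESS", "SSN")

# Entity labels are illustrative placeholders, not a real product's identifiers.
PRESETS = {
    "Federal Court Documents": (
        "FEDERAL_DOCKET_CIVIL",
        "FEDERAL_DOCKET_CRIMINAL",
        "FEDERAL_DOCKET_BANKRUPTCY",
    ),
    "State Court Documents (CA/NY/TX)": ("STATE_CASE_CA", "STATE_CASE_NY", "STATE_CASE_TX"),
    "Internal Operations": ("MATTER_TAG",) + STANDARD_PII,
    "Outside Counsel Portal": ("BILL_REFERENCE", "MATTER_TAG") + STANDARD_PII,
}


def resolve_entities(preset_names, custom=()):
    """Merge presets and firm-specific extras into one de-duplicated, ordered list."""
    seen = {}
    for name in preset_names:
        if name not in PRESETS:
            raise KeyError(f"unknown preset: {name}")
        for label in PRESETS[name]:
            seen.setdefault(label, None)
    for label in custom:
        seen.setdefault(label, None)
    return list(seen)
```

Keeping presets as data makes it straightforward to combine them with the firm-specific entities collected at onboarding.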
Audit documentation
Processing records should show that custom codes were included in each detection pass. This supports work product protection for the analysis method.
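A processing record along these lines might look like the sketch below. The field names and the `build_audit_record` helper are assumptions; the point is that the record lists which entities were applied and how many matches each produced, without storing the matched identifiers themselves.

```python
import hashlib
from datetime import datetime, timezone


def build_audit_record(account_id, document_bytes, entity_labels, match_counts):
    """Describe one detection pass without copying any sensitive text."""
    labels = sorted(entity_labels)
    return {
        "account_id": account_id,
        # A hash identifies the document without retaining its content.
        "document_sha256": hashlib.sha256(document_bytes).hexdigest(),
        "processed_at": datetime.now(timezone.utc).isoformat(),
        "entities_applied": labels,
        # Entities with no matches are recorded as 0 to show they still ran.
        "match_counts": {label: match_counts.get(label, 0) for label in labels},
    }
```

Recording a zero count for an entity that found nothing is deliberate: it shows the custom code was part of the pass even when the document contained no matches.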
For a broader look at how redaction costs scale in litigation, see e-discovery PII automation and legal review cost reduction.
Conclusion
Privilege-sensitive IDs are as risky as standard PII — often more so. Tools that miss docket codes and matter tags leave a real gap in document workflows.
The fix is not an NLP model. It is pattern setup. For developers building law firm tools, that is the difference between a 3-day fix and a 3-month project. For law firms, it is the difference between defensible AI-assisted review and a privilege waiver risk.