18 HIPAA Identifiers Your Tool Misses
Updated for 2026.
HIPAA's Safe Harbor method lists 18 PHI identifier categories. Most anonymization tools reliably detect about six. The other twelve slip through, and each one is a compliance gap.
The Safe Harbor Rule
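HIPAA's Privacy Rule defines Safe Harbor de-identification at 45 CFR § 164.514(b)(2). All 18 identifier categories must be removed, and the covered entity must have no actual knowledge that the remaining information could identify an individual. Meet both conditions and the data counts as de-identified under the rule. This is why Safe Harbor is popular: it is a checklist, not a statistical judgment call.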
The 18 categories are:
1. Names
2. Geographic data smaller than state: street address, city, county, ZIP code (the first three ZIP digits may be kept for areas with more than 20,000 people)
3. Dates more specific than year: birth, admission, discharge, death (and all ages over 89)
4. Phone numbers
5. Fax numbers
6. Email addresses
7. Social Security numbers
8. Medical record numbers (MRNs)
9. Health plan beneficiary numbers
10. Account numbers
11. Certificate and license numbers
12. Vehicle identifiers and serial numbers, including license plates
13. Device identifiers and serial numbers
14. Web URLs
15. IP addresses
16. Biometric identifiers: fingerprints, voiceprints
17. Full-face photos and similar images
18. Any other unique identifying number, characteristic, or code
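Most tools handle names, phone numbers, email addresses, and Social Security numbers well (categories 1, 4, 6, and 7). They routinely miss medical record numbers, health plan beneficiary numbers, account numbers, certificate and license numbers, device identifiers, and the catch-all category (8, 9, 10, 11, 13, and 18).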
The MRN Gap
Medical record numbers (MRNs) are category 8. Each hospital sets its own MRN format. There is no US national standard.
Hospital A uses a 7-digit integer. Hospital B uses "PT-YYYY-NNNN". Hospital C uses an 8-character alphanumeric string. Hospital D writes "MRN: " before a 9-digit number.
A generic tool will not flag "PT-2024-8847" as PHI. The document passes de-identification checks. But it is not de-identified. No alert fires. The team thinks the job is done. It is not.
This is the worst kind of gap: a silent one.
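A minimal sketch of the silent gap, using Python's standard re module. The three generic patterns below are simplified stand-ins, not the rules of any real tool, and the clinical note and the find_phi helper are invented for illustration.

```python
import re

# Simplified stand-ins for the detectors a generic tool ships with.
GENERIC_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def find_phi(text, patterns):
    """Return (label, matched_text) pairs for every pattern hit."""
    return [
        (label, match.group())
        for label, rx in patterns.items()
        for match in rx.finditer(text)
    ]


# Invented note containing a phone number and a hospital-specific MRN.
note = "Patient PT-2024-8847 called from 555-201-7788 about a refill."

# The generic set flags the phone number and says nothing about the MRN.
generic_hits = find_phi(note, GENERIC_PATTERNS)

# Adding one site-specific pattern (assumed format: PT-YYYY-NNNN) closes the gap.
site_patterns = {**GENERIC_PATTERNS, "MRN": re.compile(r"\bPT-\d{4}-\d{4}\b")}
site_hits = find_phi(note, site_patterns)

print("generic:", generic_hits)
print("site-specific:", site_hits)
```

The first run reports a finding and raises no error, so nothing signals that the MRN was left in the text. That is what makes the gap silent.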
Three Ways to Fix It
Code it in Presidio. This needs Python skills and ongoing upkeep. It works but costs time.
Add manual review. A person checks each document for MRNs. This does not scale.
Use AI-assisted custom entity creation. No code needed. The team gives sample values. The AI builds the pattern.
Here is how it works. A team gives five sample MRN values: SVHS-0012345, SVHS-0987654, SVHS-1122334, SVHS-4455667, SVHS-8899001. The AI returns SVHS-\d{7} and checks it against the samples. The team saves it to their HIPAA preset. All future sessions detect the format. The same approach works for beneficiary numbers and device serial numbers.
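The sample-to-pattern step can be sketched in a few lines. This is a hypothetical stand-in heuristic, not the AI workflow itself: infer_pattern and unmatched_samples are invented names, and the heuristic only handles the "shared prefix plus fixed-length digits" shape seen in the five samples.

```python
import os
import re


def infer_pattern(samples):
    """Infer a 'literal prefix + fixed-length digit run' regex from samples.

    Illustrative heuristic only; it rejects any other shape.
    """
    prefix = os.path.commonprefix(samples)
    # Drop trailing digits the samples happen to share, so they are
    # treated as part of the variable digit run, not the fixed prefix.
    prefix = prefix.rstrip("0123456789")
    tails = [s[len(prefix):] for s in samples]
    lengths = {len(t) for t in tails}
    if len(lengths) != 1 or not all(t.isdigit() for t in tails):
        raise ValueError("samples are not a shared prefix plus fixed-length digits")
    return re.escape(prefix) + r"\d{%d}" % lengths.pop()


def unmatched_samples(pattern, samples):
    """Return the samples the pattern fails to match in full."""
    rx = re.compile(pattern)
    return [s for s in samples if not rx.fullmatch(s)]


samples = [
    "SVHS-0012345",
    "SVHS-0987654",
    "SVHS-1122334",
    "SVHS-4455667",
    "SVHS-8899001",
]

pattern = infer_pattern(samples)
failures = unmatched_samples(pattern, samples)  # empty list means every sample matched
print(pattern, failures)
```

Python's re.escape writes the hyphen as \-, so the printed pattern is an equivalent spelling of SVHS-\d{7}. Checking the pattern against the samples before saving it mirrors the validation step described above. Testing it against a few values that should not match is a sensible extra check.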
See how presets work in the HIPAA MRN detection guide. Learn about the AI pattern workflow.
The Hidden Assumption
Many teams test on a sample document with a name and a phone number. The tool passes. They assume full coverage. But samples rarely include institution-specific identifiers. MRNs and beneficiary numbers look like random strings to a generic tool. They pass with no flag.
A true Safe Harbor audit maps all 18 categories to a detection method. For category 8, verify with real MRN samples from your own hospital. Do not assume the tool knows your format.
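One way to keep that mapping honest is to write it down as data and list what is missing. The sketch below is an assumption about how a team might record it: the detection_methods entries are hypothetical, and the category names follow the regulation's wording.

```python
# The 18 Safe Harbor categories, in the order used in this article.
SAFE_HARBOR_CATEGORIES = [
    "Names",
    "Geographic subdivisions smaller than a state",
    "Dates more specific than year",
    "Telephone numbers",
    "Fax numbers",
    "Email addresses",
    "Social Security numbers",
    "Medical record numbers",
    "Health plan beneficiary numbers",
    "Account numbers",
    "Certificate and license numbers",
    "Vehicle identifiers and serial numbers",
    "Device identifiers and serial numbers",
    "Web URLs",
    "IP addresses",
    "Biometric identifiers",
    "Full-face photographs and comparable images",
    "Any other unique identifying number, characteristic, or code",
]

# Hypothetical audit record: category number -> verified detection method.
detection_methods = {
    1: "built-in name recognizer",
    4: "built-in phone pattern",
    6: "built-in email pattern",
    7: "built-in SSN pattern",
}


def coverage_gaps(categories, methods):
    """Return (number, name) for each category with no recorded method."""
    return [
        (number, name)
        for number, name in enumerate(categories, start=1)
        if not methods.get(number)
    ]


gaps = coverage_gaps(SAFE_HARBOR_CATEGORIES, detection_methods)
for number, name in gaps:
    print(f"Category {number}: {name} has no verified detection method")
```

An entry should only be added to detection_methods after the method has been tested against real samples from your own institution, such as a custom MRN pattern for category 8.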
Review the full framework in our HIPAA compliance overview.
Conclusion
Safe Harbor requires all 18 identifier categories to be removed. Generic tools cover far fewer. The gaps (MRNs, beneficiary numbers, device serials) have no standard format, so generic tools miss them. AI-assisted custom entities close the gap without code or manual review.
Sources
- HHS: HIPAA Safe Harbor, 45 CFR § 164.514 (hhs.gov)
- Shaip: PHI identifier types in healthcare de-identification (shaip.com)
- HHS OCR: De-identification guidance updated 2024 (hhs.gov)