Beyond SSNs: Anonymizing Your Organization's Internal IDs
Your GDPR tool removes email addresses. It removes phone numbers. It removes names. You run support exports through it. Then you share the output with your analytics team.
Your customer account numbers are still in every ticket. Your order IDs are still there. Your internal user IDs are still there too.
These IDs look harmless on their own. Without a lookup table, they do not name a person. But your analytics team has that table. Your CRM has it. Your support database has it. Anyone with access can find the person in seconds.
This is a GDPR failure. The tool did not break. It was never told to look for your IDs.
What Standard PII Tools Detect
Standard PII tools cover universal formats. They catch what every organization uses.
Standard tools detect:
- National identification numbers (US SSNs, UK NINOs, EU national ID formats)
- Email addresses
- Phone numbers
- Credit card numbers
- Names
- Passport and driver's license numbers
Standard tools do not detect:
- Employee IDs in your EMP-XXXXX format
- Customer account numbers in your ACC-XXXXXXXX-XX format
- Order IDs in your ORD-XXXXXXX format
- Internal user IDs in UUID or custom formats
- Partner-specific reference codes
Standard tools find universal patterns. Your internal IDs are not universal. They need custom setup to be found.
The Re-Identification Risk
A firm exports support tickets for quality review. Standard PII removal strips names, emails, and phone numbers. Account numbers in ACC-XXXXXXXX-XX format are not touched.
The export goes to the analytics team. An analyst joins the ticket table with the customer database on account number. The person is found at once. No special trick is needed. It is a routine SQL join.
GDPR Article 4(5) defines pseudonymization as processing after which data "can no longer be attributed to a specific data subject without the use of additional information," provided that information is kept separately and protected. Raw account numbers fail that test. The additional information, your customer database, sits inside the same organization and is open to the same analysts.
The "anonymized" export was not anonymous.
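The join described above can be sketched in a few lines. This is an illustration only: the table names, column names, and sample rows are invented for the example, and an in-memory SQLite database stands in for the warehouse and the CRM.

```python
import sqlite3

# In-memory stand-ins for the customer database and the "anonymized" ticket export.
# All table names, column names, and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (account_no TEXT PRIMARY KEY, full_name TEXT, email TEXT);
    CREATE TABLE tickets (ticket_id INTEGER PRIMARY KEY, account_no TEXT, body TEXT);
    INSERT INTO customers VALUES ('ACC-00123456-AB', 'Jane Example', 'jane@example.com');
    INSERT INTO tickets VALUES (1, 'ACC-00123456-AB', 'Customer [NAME] reports a failed payment.');
""")

# The name was stripped from the ticket body, but the account number
# still links the ticket to the customer record.
rows = conn.execute("""
    SELECT t.ticket_id, c.full_name, c.email
    FROM tickets AS t
    JOIN customers AS c ON c.account_no = t.account_no
""").fetchall()

print(rows)  # [(1, 'Jane Example', 'jane@example.com')]
```

The ticket body contains no name, yet the query returns one. That is the whole risk in a single statement.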
Building Custom Entity Patterns
Custom entity setup is fast. Compliance teams can do it with no engineering help.
Step 1: List your ID formats.
Write down each one. For example: account ACC-XXXXXXXX-XX, order ID ORD-XXXXXXX, employee ID EMP-XXXXX.
Step 2: Describe the format in plain language.
"Account numbers start with ACC, then a dash, then 8 digits, then a dash, then 2 uppercase letters."
AI-assisted pattern generation returns: ACC-\d{8}-[A-Z]{2}
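As a quick check, the generated pattern can be tried in Python's standard `re` module. The word boundaries (`\b`) and the `re.ASCII` flag are additions that are not part of the pattern above. They are assumed here to avoid matching inside longer strings and to limit `\d` to the digits 0 to 9.

```python
import re

# Pattern from the text, wrapped in word boundaries (an added assumption).
ACCOUNT_RE = re.compile(r"\bACC-\d{8}-[A-Z]{2}\b", re.ASCII)

text = "Refund issued on ACC-00123456-AB; see also ACC-1234-AB and acc-00123456-ab."

# Only the well-formed, upper-case account number matches.
found = ACCOUNT_RE.findall(text)
print(found)  # ['ACC-00123456-AB']
```

Whether lower-case variants should also match depends on how your systems write the ID. If they do appear in practice, the plain-language description in Step 2 should say so.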
Step 3: Test on sample data.
Upload 20 to 30 documents. Confirm all instances are found. Confirm no false hits appear.
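The same check can be scripted. The sketch below assumes a small set of hand-labelled samples, which are invented here, and reports missed IDs and false hits separately. The helper name `evaluate` is illustrative and not part of any tool.

```python
import re

ACCOUNT_RE = re.compile(r"\bACC-\d{8}-[A-Z]{2}\b", re.ASCII)

# Hand-labelled samples: (document text, account numbers a reviewer expects to be found).
SAMPLES = [
    ("Ticket for ACC-00123456-AB about a late delivery.", {"ACC-00123456-AB"}),
    ("Merged ACC-11112222-CD into ACC-33334444-EF.", {"ACC-11112222-CD", "ACC-33334444-EF"}),
    ("Order ORD-1234567 shipped; no account mentioned.", set()),
    ("Malformed reference ACC-123-AB should be ignored.", set()),
]

def evaluate(pattern, samples):
    """Return (missed, false_hits) across all labelled samples."""
    missed, false_hits = [], []
    for text, expected in samples:
        found = set(pattern.findall(text))
        missed.extend(sorted(expected - found))      # expected but not detected
        false_hits.extend(sorted(found - expected))  # detected but not expected
    return missed, false_hits

missed, false_hits = evaluate(ACCOUNT_RE, SAMPLES)
print("missed:", missed)
print("false hits:", false_hits)
```

Both lists should be empty before the pattern goes into production. A non-empty `missed` list is the more serious result, because each entry is an ID that would leak.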
Step 4: Choose a method.
For IDs used as join keys, where analysis needs to link records:
- Pseudonymize. Replace ACC-00123456-AB with the same stand-in, such as ACC-99876543-XY, every time it appears. The same input always gives the same output, so joins still work between datasets processed with the same key. The original value cannot be recovered without that key.
For IDs not needed in analysis:
- Redact. Replace with [REDACTED]. Simple. Permanent.
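The two methods above can be sketched as follows. This is one possible approach, not a description of how any particular tool works: it assumes a keyed hash (HMAC-SHA256) to derive a stand-in that keeps the ACC format. The function names and the demo key are hypothetical.

```python
import hashlib
import hmac
import re

ACCOUNT_RE = re.compile(r"\bACC-\d{8}-[A-Z]{2}\b", re.ASCII)

def pseudonymize(account_no: str, key: bytes) -> str:
    """Derive a stable, format-preserving stand-in from a secret key."""
    digest = hmac.new(key, account_no.encode("utf-8"), hashlib.sha256).digest()
    digits = int.from_bytes(digest[:8], "big") % 10**8                 # 8 digits
    letters = "".join(chr(ord("A") + b % 26) for b in digest[8:10])    # 2 letters
    return f"ACC-{digits:08d}-{letters}"

def pseudonymize_text(text: str, key: bytes) -> str:
    """Replace every account number with its keyed stand-in."""
    return ACCOUNT_RE.sub(lambda m: pseudonymize(m.group(0), key), text)

def redact_text(text: str) -> str:
    """Replace every account number with a fixed marker."""
    return ACCOUNT_RE.sub("[REDACTED]", text)

key = b"demo-key-store-the-real-one-in-a-secret-manager"  # hypothetical key
ticket = "ACC-00123456-AB called twice about ACC-00123456-AB."
print(pseudonymize_text(ticket, key))  # same stand-in appears twice
print(redact_text(ticket))             # [REDACTED] called twice about [REDACTED].
```

Squeezing the hash into eight digits and two letters means two different accounts can occasionally map to the same stand-in. At large volumes, a longer token or a collision check would be the safer design.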
Step 5: Save as a shared preset.
Save the custom entity, or a set of them, to a shared preset. The setup then applies everywhere the tool is used: batch uploads, API calls, and the browser interface. New team members get the full config at once.
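To make the idea concrete, a preset can be thought of as a small, serializable list of entity definitions. The structure below is purely illustrative and is not the format of any specific product. The ORD and EMP patterns also assume that each X in those formats is a digit, which the formats alone do not confirm.

```python
import json
import re

# Hypothetical preset: field names and layout are assumptions for illustration.
PRESET = {
    "name": "internal-ids",
    "entities": [
        {"label": "ACCOUNT_NO", "pattern": r"\bACC-\d{8}-[A-Z]{2}\b", "method": "pseudonymize"},
        {"label": "ORDER_ID", "pattern": r"\bORD-\d{7}\b", "method": "redact"},     # assumes digits
        {"label": "EMPLOYEE_ID", "pattern": r"\bEMP-\d{5}\b", "method": "redact"},  # assumes digits
    ],
}

def load_preset(serialized: str):
    """Parse a serialized preset and compile each entity's pattern."""
    preset = json.loads(serialized)
    return [(e["label"], re.compile(e["pattern"], re.ASCII), e["method"])
            for e in preset["entities"]]

def detect(text, entities):
    """List (label, matched text, method) for every hit, in preset order."""
    hits = []
    for label, pattern, method in entities:
        for match in pattern.finditer(text):
            hits.append((label, match.group(0), method))
    return hits

serialized = json.dumps(PRESET, indent=2)  # what would be shared between teams
entities = load_preset(serialized)
hits = detect("EMP-00042 escalated ORD-1234567 for ACC-00123456-AB.", entities)
for hit in hits:
    print(hit)
```

Keeping the definitions in one shared artifact is what lets every entry point apply the same rules.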
Case Study: 180,000 Support Tickets
A firm found that 180,000 support tickets in its analytics warehouse had been only partly cleaned. Names and emails had been removed. Account numbers had not. Each ticket still held a live ACC-XXXXXXXX-XX value.
Resolution timeline:
- Compliance officer defines the ACC pattern — 15 minutes
- Tests it on 30 sample tickets — 20 minutes
- Confirms accuracy — 10 minutes
- Processes 180,000 tickets in an overnight batch
- Replaces warehouse tables with the clean versions
Total hands-on time for the compliance officer: 45 minutes, plus the unattended overnight batch. Without custom entity support, the fix would need an engineering ticket, code review, and a deploy. That route usually takes weeks, not hours.
For a closer look at how custom IDs create risk in AI support tools, see the GDPR and support AI guide.
Where Custom IDs Spread
Internal IDs appear in more places than most teams expect.
Internal documents:
- Meeting notes with account or order ID references
- Email threads about customer cases
- Presentations with case study data
Shared with third parties:
- Reports to regulators with case reference numbers
- Audit files with customer references
- Vendor files that carry customer IDs
Research and analytics:
- Customer journey datasets
- Support quality review exports
- Training data for internal ML models
Each context needs the same custom entity setup. Without it, the IDs pass through untouched.
Pseudonymization vs. Anonymization
GDPR draws a clear line.
Pseudonymization replaces IDs with stand-ins. The original person can be found again if someone has the lookup table. This data is still personal data. It reduces risk. It does not remove your GDPR duties.
Anonymization removes the ability to re-identify. Anonymous data is not personal data. GDPR does not apply to it.
Account numbers and order IDs are pseudonymous as long as lookup tables exist. Replacing them with keyed, consistent stand-ins lowers risk, but GDPR still applies. Replacing them with random tokens and keeping no key can take the data outside GDPR, provided nothing else in the record identifies the person, but it breaks join-based analysis.
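The random-token option can be sketched as below. It assumes a fresh token for every occurrence, so no mapping is ever stored and there is no key to delete. The function name and the `TOK-` prefix are hypothetical.

```python
import re
import secrets

ACCOUNT_RE = re.compile(r"\bACC-\d{8}-[A-Z]{2}\b", re.ASCII)

def replace_with_random_tokens(text: str) -> str:
    """Swap each account number for an unrelated random token.

    A new token is drawn per occurrence and nothing is recorded,
    so the same account cannot be linked across tickets afterwards.
    """
    return ACCOUNT_RE.sub(lambda m: "TOK-" + secrets.token_hex(8), text)

cleaned = replace_with_random_tokens("ACC-00123456-AB wrote in; ACC-00123456-AB wrote again.")
print(cleaned)  # two different TOK-... values, no ACC value left
```

Because the two tokens differ, a join or even a count per account is no longer possible. That loss is the trade-off described above.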
For sharing with third parties who lack your lookup tables, pseudonymization may be enough. For internal analytics, full anonymization or strict access controls are needed. The legal compliance guide covers how to document each approach for your ROPA.
Conclusion
The gap is not a tool failure. It is a setup gap. No tool can know your account number format unless you tell it.
Custom entity setup closes the gap in hours. Compliance teams define the formats, test them on sample data, and apply them across batch uploads, API calls, and the browser interface. No engineering help is needed.
The 180,000 unredacted account numbers were not there because the tool failed. They were there because the tool was never told to look for them.