Last updated 2026-05-29


Beyond SSNs: Internal ID Anonymization

Every organization has internal identifiers, such as employee IDs, account numbers, and order IDs, that are personally identifiable in context but missed by standard PII tools.

May 29, 2026 · 7 minute read
custom PII detection, organizational identifiers, re-identification risk, GDPR pseudonymization, custom entity

Beyond SSNs: Anonymizing Your Organization's Internal IDs

Your GDPR tool removes email addresses. It removes phone numbers. It removes names. You run support exports through it. Then you share the output with your analytics team.

Your customer account numbers are still in every ticket. Your order IDs are still there. Your internal user IDs are still there too.

These IDs look harmless on their own. Without a lookup table, they do not name a person. But your analytics team has that table. Your CRM has it. Your support database has it. Anyone with access can find the person in seconds.

This is a GDPR failure. The tool did not break. It was never told to look for your IDs.

What Standard PII Tools Detect

Standard PII tools cover universal formats. They catch what every organization uses.

Standard tools detect:

  • National identification numbers (US SSNs, UK NINOs, EU national ID formats)
  • Email addresses
  • Phone numbers
  • Credit card numbers
  • Names
  • Passport and driver's license numbers

Standard tools do not detect:

  • Employee IDs in your EMP-XXXXX format
  • Customer account numbers in your ACC-XXXXXXXX-XX format
  • Order IDs in your ORD-XXXXXXX format
  • Internal user IDs in UUID or custom formats
  • Partner-specific reference codes

Standard tools find universal patterns. Your internal IDs are not universal. They need custom setup to be found.

The Re-Identification Risk

A firm exports support tickets for quality review. Standard PII removal strips names, emails, and phone numbers. Account numbers in ACC-XXXXXXXX-XX format are not touched.

The export goes to the analytics team. An analyst joins the ticket table with the customer database on account number. The person is found at once. No special trick is needed. It is a routine SQL join.
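To make the join concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and rows are invented for illustration and do not come from any real system.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tickets (ticket_id INTEGER, account_no TEXT, body TEXT);
    CREATE TABLE customers (account_no TEXT, full_name TEXT, email TEXT);
    INSERT INTO tickets VALUES
        (1, 'ACC-00123456-AB', 'Customer [NAME] reports a billing error.');
    INSERT INTO customers VALUES
        ('ACC-00123456-AB', 'Jane Example', 'jane@example.com');
""")

# The name was stripped from the ticket, but one join on the
# untouched account number brings it straight back.
reidentified = conn.execute("""
    SELECT t.ticket_id, c.full_name, c.email
    FROM tickets AS t
    JOIN customers AS c ON c.account_no = t.account_no
""").fetchall()

print(reidentified)  # [(1, 'Jane Example', 'jane@example.com')]
```

Nothing here is an attack. It is the kind of query an analyst writes every day.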

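GDPR Article 4(5) defines pseudonymization as processing after which data "can no longer be attributed to a specific data subject without the use of additional information," provided that the additional information is kept separately and protected by technical and organizational measures. Untouched account numbers fail that test. The additional information, your customer database, sits in the same organization and is open to the same analysts.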

The "anonymized" export was not anonymous.

Building Custom Entity Patterns

Custom entity setup is fast. Compliance teams can do it with no engineering help.

Step 1: List your ID formats.

Write down each one. For example: account ACC-XXXXXXXX-XX, order ID ORD-XXXXXXX, employee ID EMP-XXXXX.

Step 2: Describe the format in plain language.

"Account numbers start with ACC, then a dash, then 8 digits, then a dash, then 2 uppercase letters."

AI-assisted pattern generation returns: ACC-\d{8}-[A-Z]{2}
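As a rough sketch, the pattern can be tried out in Python before it goes into any tool. The ACC pattern is the one shown above. The ORD and EMP patterns assume that each X in those formats is a digit, which the formats alone do not confirm. The word boundaries (\b) and the entity names are also additions for this example.

```python
import re

# ACC is the pattern from the text. ORD and EMP assume digits only,
# which is a guess: adjust them to your real formats.
ID_PATTERNS = {
    "ACCOUNT_NUMBER": re.compile(r"\bACC-\d{8}-[A-Z]{2}\b"),
    "ORDER_ID": re.compile(r"\bORD-\d{7}\b"),
    "EMPLOYEE_ID": re.compile(r"\bEMP-\d{5}\b"),
}

def find_internal_ids(text):
    """Return (entity_type, matched_value) pairs in order of appearance."""
    hits = []
    for entity_type, pattern in ID_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), entity_type, match.group()))
    return [(entity_type, value) for _, entity_type, value in sorted(hits)]

sample = "EMP-00042 refunded order ORD-1234567 on account ACC-00123456-AB."
print(find_internal_ids(sample))
# [('EMPLOYEE_ID', 'EMP-00042'), ('ORDER_ID', 'ORD-1234567'),
#  ('ACCOUNT_NUMBER', 'ACC-00123456-AB')]
```

The word boundaries keep the pattern from firing inside a longer string, such as a nine-digit value that happens to contain a valid ID.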

Step 3: Test on sample data.

Upload 20 to 30 documents. Confirm all instances are found. Confirm no false hits appear.
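The same check can be scripted as a sketch. It compares what the pattern finds with IDs a person has marked by hand. The three samples and the evaluate helper are made up for this example. A real test set would be your own 20 to 30 documents.

```python
import re

ACC_PATTERN = re.compile(r"\bACC-\d{8}-[A-Z]{2}\b")

def evaluate(pattern, labelled_samples):
    """Compare regex hits with hand-labelled IDs; return (precision, recall)."""
    tp = fp = fn = 0
    for text, expected in labelled_samples:
        found = set(pattern.findall(text))
        expected = set(expected)
        tp += len(found & expected)   # correctly detected
        fp += len(found - expected)   # false hits
        fn += len(expected - found)   # missed IDs
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall

# Invented samples: (document text, IDs a reviewer marked by hand).
samples = [
    ("Refund issued on ACC-00123456-AB.", ["ACC-00123456-AB"]),
    ("Customer typed acc-00123456-ab in lower case.", ["acc-00123456-ab"]),
    ("No account mentioned here.", []),
]

precision, recall = evaluate(ACC_PATTERN, samples)
print(f"precision={precision:.2f} recall={recall:.2f}")
# precision=1.00 recall=0.50
```

Here the pattern gives no false hits but misses the lower-case variant. That is the kind of gap a sample test is meant to surface before the full run.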

Step 4: Choose a method.

For IDs used as join keys, where analysis needs to link records:

  • Pseudonymize. Replace ACC-00123456-AB with ACC-99876543-XY each time. The same input always gives the same output. Joins still work. The original value cannot be found without the key.

For IDs not needed in analysis:

  • Redact. Replace with [REDACTED]. Simple. Permanent.
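One way to get the "same input, same output" behavior of the pseudonymize option is a keyed hash. The sketch below uses HMAC-SHA256 from Python's standard library and reshapes the digest into the ACC format. This is an illustrative technique, not a description of how any particular product does it. The key value is a placeholder, and the function names are made up.

```python
import hashlib
import hmac
import re
import string

ACC_PATTERN = re.compile(r"\bACC-(\d{8})-([A-Z]{2})\b")

def pseudonymize_account(account_no, key):
    """Map an account number to a stable stand-in in the same format."""
    digest = hmac.new(key, account_no.encode("utf-8"), hashlib.sha256).digest()
    # First 8 bytes become the 8-digit part, next 2 bytes the letter suffix.
    number = int.from_bytes(digest[:8], "big") % 10**8
    letters = "".join(string.ascii_uppercase[b % 26] for b in digest[8:10])
    return f"ACC-{number:08d}-{letters}"

def pseudonymize_text(text, key):
    """Replace every account number in the text with its stand-in."""
    return ACC_PATTERN.sub(lambda m: pseudonymize_account(m.group(0), key), text)

# Placeholder key for the demo. A real key belongs in a secret store.
key = b"demo-key-load-from-a-secret-store"

first = pseudonymize_text("Ticket 1: ACC-00123456-AB", key)
second = pseudonymize_text("Ticket 2: ACC-00123456-AB again", key)
print(first)
print(second)  # same stand-in as in the first ticket, so joins still work
```

Squeezing the digest into 8 digits and 2 letters keeps the format but makes collisions possible on large datasets. A longer stand-in, or a stored mapping table, avoids that at the cost of the familiar format. Redaction, by contrast, is a plain substitution of each match with [REDACTED].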

Step 5: Save as a shared preset.

Save the custom entity, or a set of them, to a shared preset. The setup then applies everywhere the tool is used: batch uploads, API calls, and the browser interface. New team members get the full config at once.
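To picture what a shared preset holds, here is a sketch that stores two custom entities as JSON. The field names (preset_name, entities, name, pattern, method) are a hypothetical layout for illustration, not the real preset format of any tool. The ORD pattern again assumes digits.

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class CustomEntity:
    name: str
    pattern: str
    method: str  # "pseudonymize" or "redact"

# Hypothetical preset layout, invented for this example.
preset = {
    "preset_name": "internal-ids-v1",
    "entities": [
        asdict(CustomEntity("ACCOUNT_NUMBER", r"ACC-\d{8}-[A-Z]{2}", "pseudonymize")),
        asdict(CustomEntity("ORDER_ID", r"ORD-\d{7}", "redact")),
    ],
}

serialized = json.dumps(preset, indent=2)  # what gets saved and shared
restored = json.loads(serialized)          # what a teammate loads
print(serialized)
```

The point of the preset is that the pattern and the chosen method travel together, so every channel applies the same rule.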

Case Study: 180,000 Support Tickets

A firm found 180,000 support tickets in their analytics warehouse. Names and emails had been removed. Account numbers had not. Each ticket still held a live ACC-XXXXXXXX-XX value.

Resolution timeline:

  1. Compliance officer defines the ACC pattern — 15 minutes
  2. Tests it on 30 sample tickets — 20 minutes
  3. Confirms accuracy — 10 minutes
  4. Processes 180,000 tickets in an overnight batch
  5. Replaces warehouse tables with the clean versions

Total time for the compliance officer: 45 minutes. Without custom entity support, the fix would need an engineering ticket, code review, and a deploy. That takes weeks, not hours.
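The overnight batch itself can be pictured as a simple streaming pass. The sketch below reads tickets row by row from CSV and redacts account numbers. The column names and sample rows are invented, and redaction is used only because it is the shortest method to show.

```python
import csv
import io
import re

ACC_PATTERN = re.compile(r"\bACC-\d{8}-[A-Z]{2}\b")

def clean_tickets(reader, writer, text_field="body"):
    """Stream rows from reader to writer, redacting account numbers.

    Returns how many IDs were replaced, which is useful for the audit trail.
    """
    replaced = 0
    for row in reader:
        row[text_field], n = ACC_PATTERN.subn("[REDACTED]", row[text_field])
        replaced += n
        writer.writerow(row)
    return replaced

# In-memory stand-ins for the export file and the cleaned file.
source = io.StringIO(
    "ticket_id,body\n"
    "1,Billing error on ACC-00123456-AB\n"
    "2,Password reset requested\n"
)
target = io.StringIO()

reader = csv.DictReader(source)
writer = csv.DictWriter(target, fieldnames=reader.fieldnames)
writer.writeheader()
count = clean_tickets(reader, writer)

print(count)  # 1
print(target.getvalue())
```

Because rows are handled one at a time, memory use stays flat whether the file holds 30 tickets or 180,000.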

For a closer look at how custom IDs create risk in AI support tools, see the GDPR and support AI guide.

Where Custom IDs Spread

Internal IDs appear in more places than most teams expect.

Internal documents:

  • Meeting notes with account or order ID references
  • Email threads about customer cases
  • Presentations with case study data

Shared with third parties:

  • Reports to regulators with case reference numbers
  • Audit files with customer references
  • Vendor files that carry customer IDs

Research and analytics:

  • Customer journey datasets
  • Support quality review exports
  • Training data for internal ML models

Each context needs the same custom entity setup before its output can be treated as de-identified.

Pseudonymization vs. Anonymization

GDPR draws a clear line.

Pseudonymization replaces IDs with stand-ins. The original person can be found again if someone has the lookup table. This data is still personal data. It reduces risk. It does not remove your GDPR duties.

Anonymization removes the ability to re-identify. Anonymous data is not personal data. GDPR does not apply to it.

Account numbers and order IDs are pseudonymous when lookup tables exist. Replacing them with consistent keyed stand-ins lowers risk, but GDPR still applies. Replacing them with random tokens and deleting the key can take the data outside GDPR, provided nothing else in the record identifies the person, but it breaks join-based analysis.
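The trade-off shows up clearly in a small sketch of the random-token approach. Each run builds its own mapping and throws it away on return. The TOK- prefix and the function name are made up for this example. Whether the result counts as anonymous in law still depends on what else remains in the record.

```python
import re
import secrets

ACC_PATTERN = re.compile(r"\bACC-\d{8}-[A-Z]{2}\b")

def tokenize_export(texts):
    """Swap each account number for a random token.

    The mapping lives only inside this call and is discarded on return,
    so nobody can reverse the tokens afterwards.
    """
    mapping = {}

    def swap(match):
        original = match.group(0)
        if original not in mapping:
            mapping[original] = "TOK-" + secrets.token_hex(8)
        return mapping[original]

    return [ACC_PATTERN.sub(swap, text) for text in texts]

export_a = tokenize_export(["ACC-00123456-AB opened", "ACC-00123456-AB closed"])
export_b = tokenize_export(["ACC-00123456-AB reopened"])

print(export_a)  # both tickets share one token within this export
print(export_b)  # a separate run yields an unrelated token
```

Within one export the records still group together. Across exports, or against the customer database, the link is gone. That is the join-based analysis you give up.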

For sharing with third parties who lack your lookup tables, pseudonymization may be enough. For internal analytics, where the lookup tables are within reach, full anonymization or strict access controls are needed. The legal compliance guide covers how to document each approach for your record of processing activities (ROPA).

Conclusion

The gap is not a tool failure. It is a setup gap. No tool can know your account number format unless you tell it.

Custom entity setup closes the gap in hours. Compliance teams define the formats, test them on sample data, and apply them across all use modes. No engineering help is needed.

The 180,000 unredacted account numbers were not there because the tool failed. They were there because the tool was never told to look for them.

