Updated for 2026

The Enforcement Reality

GDPR is about results, not effort. A company can use a PII detection tool in good faith. But if that tool misses French, German, or Polish IDs, the company has still fallen short of Article 32. The article requires "appropriate technical and organizational measures." A tool that cannot find the IDs in your records does not meet that standard. Good intentions do not change that.

The "we used a tool" defense does not hold. Supervisory bodies look at the specific tools used. When an English-only tool has processed multilingual records, whether those measures were appropriate under Article 32 becomes the key question.

This is a real enforcement pattern. It has been seen in GDPR cases across the EU.

What Supervisory Authorities Find

GDPR enforcement data from 2024 shows that Article 32 violations (insufficient technical and organizational measures) rank among the top grounds for fines. Companies cite automated anonymization tools as proof of technical measures. Supervisory bodies then check whether those tools work.

For global employers, the risk is systemic. Take an HR platform. It strips personal data before analytics. It may remove English email addresses and phone numbers. But it leaves French NIR numbers, German Steuer-IDs, and Polish PESEL numbers intact. Swedish personnummer IDs stay too.

The company thinks the records are clean. In a case like this, a supervisory body could find that 40% of the IDs in the "anonymized" dataset are still there: national IDs the tool never covered.
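A minimal Python sketch of the scenario above. The pattern set `ENGLISH_ONLY_PATTERNS` and the `redact` helper are hypothetical stand-ins for an English-only tool, not any real product's rules, and the sample record uses illustrative values chosen to pass their national checksums. The email address and US-style phone number are replaced; the PESEL and Steuer-ID pass through untouched because no pattern describes them.

```python
import re

# Hypothetical English-only rule set: email, US-style phone, US SSN.
ENGLISH_ONLY_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "US_PHONE": re.compile(r"\(?\b\d{3}\)?[ .-]\d{3}[ .-]\d{4}\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace every match of the English-only patterns with a label."""
    for label, pattern in ENGLISH_ONLY_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


# Illustrative HR record mixing English-format and EU national identifiers.
record = (
    "Contact: anna.kowalska@example.com, (555) 010-4477. "
    "PESEL 44051401359, Steuer-ID 86095742719."
)
print(redact(record))
# Contact: <EMAIL>, <US_PHONE>. PESEL 44051401359, Steuer-ID 86095742719.
```

The output looks "clean" to anyone checking only for emails and phone numbers, which is exactly how the residual national IDs go unnoticed.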

Identifier Formats English-Only Tools Miss

EU national IDs differ from US and generic formats. English-only tools fail to detect them:

German Steuer-Identifikationsnummer: 11-digit format with a checksum. Tools built for US SSN (9-digit) patterns do not catch it.

French NIR (numéro de sécurité sociale): 15 digits, a 13-digit number followed by a 2-digit control key. It encodes sex, year and month of birth, and department of birth. Generic ID patterns do not match it.

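Swedish Personnummer: 10 digits (YYMMDD plus four more), or 12 with the century prefix, ending in a Luhn check digit. The hyphen separator becomes a plus sign once the holder turns 100, and for people born before 1990 the serial digits also encode county of birth. Generic patterns handle none of this.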

Polish PESEL: 11 digits encoding birth date and sex, ending in a weighted check digit. Without checksum validation, any 11-digit number matches, and false positive rates get too high.
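The checksum rules above can be sketched in Python as follows. The function names are hypothetical, and each one checks only length and check digit: not birth dates, sex digits, Corsican `2A`/`2B` departments, or other structural rules. The Steuer-ID check follows ISO 7064 MOD 11,10; the NIR key is 97 minus the first 13 digits modulo 97; the Swedish check digit is Luhn over the first nine digits of the 10-digit form; PESEL uses the weights 1, 3, 7, 9 repeated. The sample values are illustrative numbers chosen to pass their checksums.

```python
def _is_ascii_digits(s: str, length: int) -> bool:
    """True if s is exactly `length` ASCII digits."""
    return len(s) == length and s.isascii() and s.isdigit()


def valid_steuer_id(s: str) -> bool:
    """German Steuer-IdNr: 11 digits, no leading zero, ISO 7064 MOD 11,10 check digit."""
    if not _is_ascii_digits(s, 11) or s[0] == "0":
        return False
    product = 10
    for ch in s[:10]:
        total = (int(ch) + product) % 10 or 10
        product = (total * 2) % 11
    return (11 - product) % 10 == int(s[10])


def valid_nir(s: str) -> bool:
    """French NIR: 13-digit number plus 2-digit key = 97 - (number mod 97)."""
    if not _is_ascii_digits(s, 15):
        return False  # Corsican "2A"/"2B" departments are not handled here
    return 97 - int(s[:13]) % 97 == int(s[13:])


def valid_personnummer(s: str) -> bool:
    """Swedish personnummer: Luhn check digit on the 10-digit form."""
    digits = s.replace("-", "").replace("+", "")
    if len(digits) == 12:
        digits = digits[2:]  # drop the century digits of the 12-digit form
    if not _is_ascii_digits(digits, 10):
        return False
    total = 0
    for i, ch in enumerate(digits[:9]):
        prod = int(ch) * (2 if i % 2 == 0 else 1)
        total += prod // 10 + prod % 10  # sum the digits of each product
    return (10 - total % 10) % 10 == int(digits[9])


def valid_pesel(s: str) -> bool:
    """Polish PESEL: 11 digits, weights 1-3-7-9 repeated, mod-10 check digit."""
    if not _is_ascii_digits(s, 11):
        return False
    weights = (1, 3, 7, 9, 1, 3, 7, 9, 1, 3)
    total = sum(int(d) * w for d, w in zip(s, weights))
    return (10 - total % 10) % 10 == int(s[10])


print(valid_steuer_id("86095742719"), valid_steuer_id("86095742718"))  # True False
print(valid_nir("185057800608491"), valid_pesel("44051401359"))        # True True
print(valid_personnummer("811228-9874"), valid_personnummer("198112289874"))  # True True
```

For a uniformly random 11-digit string, only about one in ten passes the PESEL check, so checksum validation removes most accidental matches before any further rules apply.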

These are common identifiers, not rare edge cases. Any EU employer, healthcare provider, or financial firm handling German, French, Swedish, or Polish records will see them. See our entities reference for a full list of supported ID types.

GDPR Article 32 calls for "appropriate technical and organizational measures." The bar is set by results. Did the organization use a tool? That is not the right question. Did the tool protect the personal records it processed? That is the right question.

For organizations with multilingual EU records, "appropriate" means detecting German Steuer-IDs in the same pass as English email addresses. An organization that catches 95% of English content but 0% of German national IDs has not met the bar. The gap leaves its German records exposed.
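One way to read "in the same pass" is a single scan over the text that applies every recognizer, format-specific ones included. This Python sketch assumes a hypothetical two-entry registry: an email pattern, and an 11-digit Steuer-ID candidate pattern that only counts as a finding when its ISO 7064 MOD 11,10 check digit is valid. A production recognizer set would be far larger; the names `RECOGNIZERS`, `scan`, and `steuer_id_checksum_ok` are illustrative.

```python
import re


def steuer_id_checksum_ok(s: str) -> bool:
    """ISO 7064 MOD 11,10 check digit, as used by the 11-digit Steuer-IdNr."""
    product = 10
    for ch in s[:10]:
        total = (int(ch) + product) % 10 or 10
        product = (total * 2) % 11
    return (11 - product) % 10 == int(s[10])


# Hypothetical recognizer registry: (label, candidate pattern, optional validator).
RECOGNIZERS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), None),
    ("DE_STEUER_ID", re.compile(r"\b[1-9][0-9]{10}\b"), steuer_id_checksum_ok),
]


def scan(text: str) -> list:
    """Apply every recognizer to the same text; return (label, match) in text order."""
    hits = []
    for label, pattern, validator in RECOGNIZERS:
        for m in pattern.finditer(text):
            if validator is None or validator(m.group()):
                hits.append((m.start(), label, m.group()))
    return [(label, value) for _, label, value in sorted(hits)]


text = "Mail max.mustermann@example.de, Steuer-ID 86095742719, Ref 86095742718."
print(scan(text))
# [('EMAIL', 'max.mustermann@example.de'), ('DE_STEUER_ID', '86095742719')]
```

The second 11-digit number fails its checksum and is not reported, which is how a checksum-aware recognizer keeps German IDs in scope without flagging every 11-digit reference number.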

Multilingual coverage is not optional. It is part of what Article 32 requires. Full stop. Our GDPR compliance guide covers the full framework.

How to Evaluate Your Tool

The test for your tool is simple. Whether it can find email addresses in any language matters less. Can it find the national ID formats in your actual records? That is the real test.

For EU operations serving Germany, France, Poland, or Sweden, this means locale-specific recognizer coverage. If your tool cannot show solid detection rates for those formats, treat the gap as a live compliance risk. Our security and compliance page explains how we handle multilingual coverage.
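A per-format check like the one described above can be sketched in Python. `per_format_recall` and `failing_formats` are hypothetical helpers, the 0.95 threshold is an arbitrary placeholder rather than a regulatory figure, and `english_only_detect` is a toy detector standing in for the tool under test. Wrapping each known ID in a one-line record is a simplification; a real evaluation would seed IDs into documents that look like your own.

```python
import re
from typing import Callable, Dict, List


def per_format_recall(
    detect: Callable[[str], List[str]],
    samples: Dict[str, List[str]],
) -> Dict[str, float]:
    """Fraction of known identifiers, per format, that `detect` reports verbatim."""
    recall: Dict[str, float] = {}
    for fmt, values in samples.items():
        if not values:
            continue  # no samples for this format: nothing to measure
        found = sum(1 for v in values if v in detect(f"Record: {v}"))
        recall[fmt] = found / len(values)
    return recall


def failing_formats(recall: Dict[str, float], threshold: float = 0.95) -> List[str]:
    """Formats whose recall falls below the (placeholder) threshold."""
    return sorted(fmt for fmt, r in recall.items() if r < threshold)


# Toy detector standing in for an English-only tool: it only finds emails.
def english_only_detect(text: str) -> List[str]:
    return re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", text)


# Illustrative labeled samples (ID values chosen to pass their checksums).
samples = {
    "EMAIL": ["a.b@example.com", "c@example.org"],
    "PL_PESEL": ["44051401359"],
    "DE_STEUER_ID": ["86095742719"],
}
recall = per_format_recall(english_only_detect, samples)
print(recall)                   # {'EMAIL': 1.0, 'PL_PESEL': 0.0, 'DE_STEUER_ID': 0.0}
print(failing_formats(recall))  # ['DE_STEUER_ID', 'PL_PESEL']
```

Reporting recall per format, instead of one blended score, is the point: a detector can score well overall while finding none of a format that appears in your records.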

anonym.legal detects German Steuer-ID, French NIR, Swedish Personnummer, Polish PESEL, and national IDs for all EU states. Each recognizer uses checksum-aware validation for accurate results.

When This Approach Has Limits

Switching from an English-only tool to native multilingual coverage removes a clear liability, but it does not remove liability altogether. Three points keep the claim honest.

Closing the language gap does not make detection complete. Checksum-aware recognizers for national IDs are highly reliable on those specific formats, but they say nothing about the personal data that has no fixed format: names, free-text addresses, health details, and contextual identifiers in each language. A breach can still occur through undetected free-text PII even when every national-ID format is covered. Multilingual coverage shrinks the exposure; it does not certify that nothing was missed.

Detection failures are a controller's risk, not the tool's. Under GDPR, the controller is accountable for the breach regardless of which tool failed. A vendor's strong detection rates are evidence that you exercised diligence, but they do not transfer liability. You remain responsible for validating the tool against your own data and for the human review step that catches what automation misses. Vendor test sets are a starting point, not a defense on their own.

Per-format coverage and language accuracy both vary. "All EU states" means the formats the recognizers were built and tested for, and accuracy differs across languages with different scripts, orthography, and training data. A tool may handle German and French well while performing worse on a low-resource language or a transliterated document. Verify the specific formats and languages in your operations rather than relying on a coverage headline.


Ready to protect your data?

Start anonymizing PII with 267+ entity types across 48 languages.

English-Only PII Tools: A GDPR Liability