Global PII: SSN, CPF, Aadhaar & More
The US-Centric PII Tool Problem
Most PII tools were built in the United States, and they target US data formats. The Social Security Number has nine digits in AAA-GG-SSSS format, and its area, group, and serial segments follow documented rules, so US-focused tools catch it well. They also detect US phone numbers, email addresses, and driver's license numbers. What they miss is nearly every national ID issued outside the US.
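Those segment rules are simple enough to check in a few lines. The sketch below is illustrative only: `is_plausible_ssn` is a hypothetical helper, and it encodes the ranges the SSA has stated are never assigned (area 000, 666, and 900-999; group 00; serial 0000). SSNs carry no check digit, so passing these rules means a number is well-formed, not that it was ever issued.

```python
import re

# Hypothetical helper: structural SSN check in AAA-GG-SSSS form.
SSN_PATTERN = re.compile(r"^(\d{3})-(\d{2})-(\d{4})$")


def is_plausible_ssn(text: str) -> bool:
    """Return True if the string is a well-formed SSN under the never-assigned rules."""
    match = SSN_PATTERN.match(text)
    if match is None:
        return False
    area, group, serial = match.groups()
    if area in ("000", "666") or area.startswith("9"):
        return False  # areas 000, 666, and 900-999 are never assigned
    if group == "00":
        return False  # group 00 is never assigned
    if serial == "0000":
        return False  # serial 0000 is never assigned
    return True
```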
GDPR makes no exception for identifiers a US-built tool does not recognize. Take the German Steuer-ID, an 11-digit tax ID issued by the Bundeszentralamt für Steuern whose final digit is a check digit. It identifies a German resident just as an SSN identifies an American. GDPR Article 4(1) defines personal data as "any information relating to an identified or identifiable natural person." A Steuer-ID fits that definition, so it is personal data whether or not your tool knows the format.
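The check digit is what lets a scanner tell a Steuer-ID apart from any other 11-digit number. It is commonly documented as an ISO 7064 MOD 11,10 check, which a minimal sketch can compute as shown below (the function name `is_valid_steuer_id` is hypothetical). It checks only the length and the check digit and skips any further structural rules on the first ten digits.

```python
def is_valid_steuer_id(text: str) -> bool:
    """Check the length and ISO 7064 MOD 11,10 check digit of a Steuer-ID (hypothetical helper)."""
    digits = text.replace(" ", "")
    if len(digits) != 11 or not digits.isdigit():
        return False
    product = 10
    for ch in digits[:10]:
        total = (int(ch) + product) % 10
        if total == 0:
            total = 10
        product = (total * 2) % 11  # never 0, so the check below is 1..10
    check = 11 - product
    if check == 10:
        check = 0
    return check == int(digits[10])
```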
Organizations running US-only tools have been fined under GDPR after EU-specific PII went undetected in their systems. The compliance gap is real, and it has already produced enforcement actions. See our GDPR compliance guide for context.
The European Identifier Landscape
The coverage gap is large. Here is a country-by-country breakdown.
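Germany: the Steuer-ID has 11 digits and is checksum-validated. The Sozialversicherungsnummer has 12 characters in a fixed structure: area number, birth date, the initial of the birth surname, a serial number, and a check digit. The Reisepass (passport) number has 10 characters and includes an issuing-authority code.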
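France: the NIR is the national social security number. Its 15 digits encode sex, birth year, birth month, department, commune, a serial number, and a two-digit check key. SIRET (14 digits) and SIREN (9 digits) identify businesses; a SIRET is the SIREN followed by a five-digit establishment code.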
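Sweden: the personnummer uses the format YYMMDD-XXXX, and its last digit is a Luhn check digit. The samordningsnummer (coordination number), issued to people who are not registered as residents, uses the same format with 60 added to the day value.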
Norway: the fødselsnummer has 11 digits in the format DDMMYYNNNKK: the birth date, a three-digit individual number whose last digit encodes gender, and two check digits. The D-nummer adds 40 to the day value.
Brazil: the CPF (Cadastro de Pessoas Físicas) has 11 digits, the last two of which are check digits. The CNPJ is the 14-digit business ID.
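India: Aadhaar is a 12-digit identity number backed by biometric enrollment, and its last digit is a Verhoeff check digit. The PAN is a 10-character alphanumeric tax ID in the pattern AAAAA9999A (five letters, four digits, one letter).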
UAE: the Emirates ID has 15 digits in the format 784-YYYY-NNNNNNN-C: the country code 784, the holder's birth year, a seven-digit sequence, and a check digit.
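Almost every format above carries its own checksum, and the schemes differ from country to country. The sketch below illustrates four of them: France's mod-97 NIR key, Sweden's Luhn digit with the samordningsnummer day offset, Norway's two weighted mod-11 digits with the D-nummer offset, and Brazil's two CPF mod-11 digits. The function names are hypothetical, and each function checks structure and check digits only, without confirming that an encoded birth date is real. Corsican NIRs, whose department codes are 2A or 2B, are not handled.

```python
import re
from typing import Optional


def luhn_ok(digits: str) -> bool:
    """Luhn check over a digit string whose last digit is the check digit."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


def check_nir(nir: str) -> bool:
    """France NIR: 13 digits plus a key equal to 97 - (first 13 digits mod 97)."""
    if not re.fullmatch(r"\d{15}", nir):
        return False  # also rejects Corsican 2A/2B numbers, which this sketch skips
    return 97 - int(nir[:13]) % 97 == int(nir[13:])


def check_personnummer(pnr: str) -> Optional[str]:
    """Sweden: YYMMDD-XXXX with a Luhn check digit; a day above 60 marks a samordningsnummer."""
    m = re.fullmatch(r"(\d{6})[-+](\d{4})", pnr)  # '+' is used for people aged 100 or more
    if m is None:
        return None
    digits = m.group(1) + m.group(2)
    if not luhn_ok(digits):
        return None
    return "samordningsnummer" if int(digits[4:6]) > 60 else "personnummer"


def check_fodselsnummer(fnr: str) -> Optional[str]:
    """Norway: DDMMYYNNNKK with two weighted mod-11 check digits; first digit 4-7 marks a D-nummer."""
    if not re.fullmatch(r"\d{11}", fnr):
        return None
    d = [int(c) for c in fnr]

    def mod11(weights: list) -> int:
        # zip stops at len(weights): 9 digits for K1, 10 digits (including K1) for K2.
        k = 11 - sum(w * x for w, x in zip(weights, d)) % 11
        return 0 if k == 11 else k  # k == 10 never matches a single digit, so it fails below

    if mod11([3, 7, 6, 1, 8, 9, 4, 5, 2]) != d[9]:
        return None
    if mod11([5, 4, 3, 2, 7, 6, 5, 4, 3, 2]) != d[10]:
        return None
    return "D-nummer" if d[0] >= 4 else "fodselsnummer"


def check_cpf(cpf: str) -> bool:
    """Brazil CPF: 11 digits with two mod-11 check digits; all-same-digit numbers are rejected."""
    digits = re.sub(r"\D", "", cpf)  # drop the usual dots and dash
    if len(digits) != 11 or len(set(digits)) == 1:
        return False
    nums = [int(c) for c in digits]
    for pos in (9, 10):  # first check digit, then second
        weights = range(pos + 1, 1, -1)  # 10..2, then 11..2
        total = sum(n * w for n, w in zip(nums[:pos], weights))
        if total * 10 % 11 % 10 != nums[pos]:
            return False
    return True
```

Four countries already need four different checksum routines, which is why per-country maintenance grows quickly.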
A global HR team covering 12 countries needs one tool that handles all 12 national ID formats in a single pass. Maintaining a separate regex library for each country does not scale.
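One way to avoid per-country libraries is a single registry of detectors, each pairing a pattern with a validator, that one scan call runs over the text. This is a minimal sketch under our own assumptions, not how any particular product works: the `Detector` registry, the entity labels, and `scan` are hypothetical; Aadhaar is validated with the Verhoeff algorithm; and PAN and Emirates ID are matched on format only, without verifying the Emirates ID check digit.

```python
import re
from typing import Callable, List, NamedTuple, Tuple

# Verhoeff tables: _D is multiplication in the dihedral group D5,
# _P is the position-dependent permutation (it repeats every 8 positions).
_D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6], [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8], [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2], [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4], [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
_P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2], [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0], [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5], [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]


def verhoeff_ok(digits: str) -> bool:
    """True if the last digit is a correct Verhoeff check digit for the rest."""
    c = 0
    for i, ch in enumerate(reversed(digits)):
        c = _D[c][_P[i % 8][int(ch)]]
    return c == 0


class Detector(NamedTuple):
    entity: str                      # hypothetical entity label
    pattern: re.Pattern              # finds candidate strings
    validate: Callable[[str], bool]  # confirms or rejects each candidate


# Hypothetical registry: supporting a new format means adding one entry here.
DETECTORS = [
    Detector("IN_AADHAAR", re.compile(r"\b\d{4} ?\d{4} ?\d{4}\b"),
             lambda s: verhoeff_ok(s.replace(" ", ""))),
    Detector("IN_PAN", re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b"),
             lambda s: True),  # format only
    Detector("AE_EMIRATES_ID", re.compile(r"\b784-\d{4}-\d{7}-\d\b"),
             lambda s: True),  # format only; check digit not verified
]


def scan(text: str) -> List[Tuple[int, str, str]]:
    """Run every registered detector over one text; return (offset, entity, value) sorted by offset."""
    hits = []
    for det in DETECTORS:
        for m in det.pattern.finditer(text):
            if det.validate(m.group()):
                hits.append((m.start(), det.entity, m.group()))
    return sorted(hits)
```

With this design, adding a country means appending one `Detector` rather than maintaining another library. A production scanner would also need context cues, because many formats overlap: the Steuer-ID, the Norwegian fødselsnummer, and the CPF all have 11 digits, so a bare digit run can be ambiguous.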
The 285+ Entity Type Architecture
The 285+ entity type library covers every EU member state format. It also covers major identifiers from Asia-Pacific, Latin America, and the Middle East, including Aadhaar, PAN, Thai citizen ID, CPF, CNPJ, and Emirates ID, as well as US formats such as SSN, EIN, and state driver's licenses. A single engine handles them all, and the library is updated as formats change.
This is the gap most tools leave open. See the entities reference for the full list of covered types, and the pricing page for volume-based API pricing.