Global PII Compliance: Three Laws, Three ID Formats

A UK marketplace handles seller documents from 80 countries. Three laws apply at once: GDPR for EU sellers, LGPD for Brazilian sellers, and India's DPDP Act for Indian sellers. Each law names different national IDs as protected. Each format has its own check logic.

Brazilian CPF: Format and LGPD Status

The CPF (Cadastro de Pessoas Físicas) is Brazil's taxpayer number. It has 11 digits in the format XXX.XXX.XXX-XX. The last two digits are check digits. Each comes from a weighted modulo-11 sum: the first is computed from the nine base digits, and the second from those nine plus the first check digit.
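The check-digit rule can be sketched in a few lines. This is a minimal illustration, assuming the widely documented weighted modulo-11 scheme (weights counting down to 2, remainders below 2 mapping to 0). The names CPF_PATTERN, cpf_check_digit, and is_valid_cpf are hypothetical and do not come from any particular library.

```python
import re

# Assumption: accept both the punctuated form XXX.XXX.XXX-XX and bare 11 digits.
CPF_PATTERN = re.compile(r"(?<!\d)\d{3}\.?\d{3}\.?\d{3}-?\d{2}(?!\d)")


def cpf_check_digit(digits: list[int]) -> int:
    """Weighted modulo-11 check digit over the given digits."""
    # Weights count down to 2: 10..2 for nine digits, 11..2 for ten digits.
    start = len(digits) + 1
    total = sum(d * w for d, w in zip(digits, range(start, 1, -1)))
    remainder = total % 11
    return 0 if remainder < 2 else 11 - remainder


def is_valid_cpf(text: str) -> bool:
    """Return True if the text holds 11 digits whose check digits match."""
    digits = [int(c) for c in text if c.isdigit()]
    if len(digits) != 11:
        return False
    # Repeated-digit strings such as 111.111.111-11 satisfy the arithmetic
    # but are conventionally rejected as not being real CPFs.
    if len(set(digits)) == 1:
        return False
    first = cpf_check_digit(digits[:9])
    second = cpf_check_digit(digits[:9] + [first])
    return digits[9] == first and digits[10] == second
```

A detector would typically run CPF_PATTERN over the text first and pass each candidate to is_valid_cpf, so that random 11-digit strings are dropped before they are flagged.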

Brazil's LGPD treats CPF as a protected personal identifier, similar in sensitivity to a US SSN. A tool that does not know the CPF format cannot find it. One that skips the checksum will flag false matches.

Indian Aadhaar: Format and DPDP Rules

Aadhaar is a 12-digit number issued by India's UIDAI. Numbers are assigned at random. The last digit is a Verhoeff check digit.

India's DPDP Act creates duties for any organization that processes Aadhaar-linked data. Detection needs two steps. First, match the 12-digit format and check the Verhoeff digit. Second, filter by context. Not every 12-digit string is an Aadhaar.
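The two steps can be sketched as follows. The Verhoeff tables are the standard published ones. The rest is assumption for illustration: the 4-4-4 grouping in the pattern, the commonly cited rule that the first digit is 2 to 9, the keyword list, and the 40-character window are all hypothetical choices, as are the function names.

```python
import re

# Standard Verhoeff tables: dihedral-group multiplication, permutation, inverse.
D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6], [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8], [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2], [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4], [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2], [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0], [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5], [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]
INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]


def verhoeff_valid(number: str) -> bool:
    """True if the digit string, check digit included, passes Verhoeff."""
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = D[c][P[i % 8][int(ch)]]
    return c == 0


def verhoeff_check_digit(payload: str) -> int:
    """Check digit to append to payload (handy for building test data)."""
    c = 0
    for i, ch in enumerate(reversed(payload)):
        c = D[c][P[(i + 1) % 8][int(ch)]]
    return INV[c]


# Assumptions: optional 4-4-4 grouping with spaces or hyphens; first digit 2-9.
AADHAAR_PATTERN = re.compile(r"(?<!\d)[2-9]\d{3}[ -]?\d{4}[ -]?\d{4}(?!\d)")
CONTEXT_WORDS = ("aadhaar", "aadhar", "uidai")  # hypothetical keyword list
WINDOW = 40  # hypothetical number of characters inspected on each side


def find_aadhaar(text: str) -> list[str]:
    """Step 1: format plus Verhoeff. Step 2: require a nearby context word."""
    hits = []
    for m in AADHAAR_PATTERN.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if not verhoeff_valid(digits):
            continue
        nearby = text[max(0, m.start() - WINDOW): m.end() + WINDOW].lower()
        if any(word in nearby for word in CONTEXT_WORDS):
            hits.append(digits)
    return hits
```

The context step trades recall for precision: a genuine Aadhaar number with no label nearby is skipped, so the keyword list and window size are worth tuning against real documents.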

US SSN: A Known Structure

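The SSN is nine digits. The first three are the area number. The next two are the group number. The last four are the serial number. Each segment has set rules: the area cannot be 000, 666, or 900-999, the group cannot be 00, and the serial cannot be 0000. Unlike CPF and Aadhaar, the SSN has no check digit, so these structural rules are the whole of validation. They are well documented.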
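A structural check for the three segments might look like the sketch below. It assumes the commonly documented exclusions (area 000, 666, and 900-999; group 00; serial 0000) and accepts the number with or without hyphens. The names SSN_SHAPE and is_plausible_ssn are hypothetical.

```python
import re

# Assumption: nine digits, optionally hyphenated as AAA-GG-SSSS.
SSN_SHAPE = re.compile(r"(\d{3})-?(\d{2})-?(\d{4})")


def is_plausible_ssn(text: str) -> bool:
    """True if the text has SSN shape and no segment uses an excluded value."""
    m = SSN_SHAPE.fullmatch(text.strip())
    if not m:
        return False
    area, group, serial = m.groups()
    if area in ("000", "666") or area >= "900":
        return False  # excluded area numbers
    if group == "00":
        return False  # excluded group number
    if serial == "0000":
        return False  # excluded serial number
    return True
```

Because there is no checksum, a passing result only means the string is structurally possible, which is why SSN detection tends to lean on surrounding context more heavily than CPF detection does.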

The Gap Between Single-Country Tools and Global Rules

These three IDs share no format and no check rule. A tool built for US use will catch SSNs. It may miss CPF and Aadhaar entirely.

Most teams discover this gap when a regulator asks, not before. The gap creates real risk under each law:

GDPR Article 28 requires a written Data Processing Agreement with each processor. A DPIA (the impact assessment required under Article 35) that lists "SSN detection" as the main control, when the dataset also holds CPF numbers, has a documented gap. An auditor can find it.
LGPD fines can reach 2% of a company's revenue in Brazil, capped at R$50M per infraction. A CPF that goes undetected is personal data processed without the intended safeguard, which is direct LGPD exposure.
DPDP enforcement is still new. Teams that log their coverage now will be better placed when early rulings set the standard.
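The LGPD ceiling in the list above is simple arithmetic. The sketch below uses only the two figures stated there (2% and R$50M); the function name and the sample revenue figures are hypothetical.

```python
LGPD_RATE_PERCENT = 2           # 2% of revenue in Brazil
LGPD_CAP_BRL = 50_000_000       # R$50M ceiling per infraction


def lgpd_fine_ceiling(brazil_revenue_brl: float) -> float:
    """Upper bound on a single LGPD fine, in BRL, for a given Brazilian revenue."""
    return min(brazil_revenue_brl * LGPD_RATE_PERCENT / 100, LGPD_CAP_BRL)
```

For a seller platform with R$1 billion in Brazilian revenue the bound is R$20M. Above R$2.5 billion the R$50M cap applies, and it applies per infraction.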

Three fine regimes at once create layered risk. Single-country tools leave global teams exposed.

What Full Coverage Requires

A tool needs each ID's format, check algorithm, and legal context. CPF needs a modular checksum. Aadhaar needs the Verhoeff check plus context filtering. SSN needs area and group rules. These are three separate problems. No single search pattern covers all of them.
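One way to keep the three problems separate is a small registry in which each ID type carries its own pattern and its own validator. The sketch below is an assumption about structure, not a description of any existing tool. IdSpec, SPECS, and scan are hypothetical names, and the CPF and SSN checks are compact restatements of the commonly documented rules so that the block stands alone. An Aadhaar entry with a Verhoeff validator and a context filter would be registered the same way.

```python
import re
from dataclasses import dataclass
from typing import Callable


def cpf_ok(s: str) -> bool:
    """Weighted modulo-11 check on both CPF check digits."""
    d = [int(c) for c in s if c.isdigit()]
    if len(d) != 11 or len(set(d)) == 1:
        return False
    for n in (9, 10):  # position of each check digit
        total = sum(x * w for x, w in zip(d[:n], range(n + 1, 1, -1)))
        r = total % 11
        if d[n] != (0 if r < 2 else 11 - r):
            return False
    return True


def ssn_ok(s: str) -> bool:
    """Structural SSN rules only; the SSN has no checksum."""
    d = "".join(c for c in s if c.isdigit())
    area, group, serial = d[:3], d[3:5], d[5:]
    return (len(d) == 9 and area not in ("000", "666") and area < "900"
            and group != "00" and serial != "0000")


@dataclass(frozen=True)
class IdSpec:
    name: str                        # label reported for a match
    pattern: re.Pattern              # format: finds candidates
    validate: Callable[[str], bool]  # check logic: confirms or rejects them


SPECS = [
    IdSpec("BR_CPF", re.compile(r"(?<!\d)\d{3}\.?\d{3}\.?\d{3}-?\d{2}(?!\d)"), cpf_ok),
    IdSpec("US_SSN", re.compile(r"(?<!\d)\d{3}-?\d{2}-?\d{4}(?!\d)"), ssn_ok),
]


def scan(text: str, specs: list[IdSpec] = SPECS) -> list[tuple[str, str]]:
    """Run every spec over the text and keep candidates that validate."""
    found = []
    for spec in specs:
        for m in spec.pattern.finditer(text):
            if spec.validate(m.group()):
                found.append((spec.name, m.group()))
    return found
```

Keeping pattern and validator side by side makes coverage auditable: the list of registered specs is the list of IDs the tool can actually find.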

When This Approach Has Limits

Treating CPF, Aadhaar, and SSN as three separate detection problems, each with its own format and check rule, is the correct framing. A tool that does this will catch far more than a single-country one. Its limits are still worth stating plainly.

Checksums confirm shape, not context. The CPF modular check and the Aadhaar Verhoeff digit reduce false matches, but a valid-looking 12-digit number is not always an Aadhaar, and a string that fails a checksum may still be a real identifier typed with an error. Detection accuracy has a residual false-negative rate: malformed entries, OCR noise from scanned seller documents, and numbers split across fields can defeat both the pattern and the check. Validate against your own messy source files rather than assuming the algorithm catches every instance.
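Validating against messy source files can start with a small labeled sample and a measured miss rate. The sketch below is illustrative only: the sample strings are invented, and strict_detect is a deliberately strict stand-in detector, not a recommended one. The function name false_negative_rate is hypothetical.

```python
import re
from typing import Callable, Iterable


def false_negative_rate(samples: Iterable[tuple[str, bool]],
                        detect: Callable[[str], bool]) -> float:
    """Share of labeled positives that the detector fails to flag."""
    positives = [text for text, has_id in samples if has_id]
    if not positives:
        raise ValueError("need at least one labeled positive")
    missed = sum(1 for text in positives if not detect(text))
    return missed / len(positives)


# Stand-in detector: matches only the fully punctuated CPF form.
STRICT_CPF = re.compile(r"(?<!\d)\d{3}\.\d{3}\.\d{3}-\d{2}(?!\d)")


def strict_detect(text: str) -> bool:
    return bool(STRICT_CPF.search(text))


# Invented sample, labeled by hand: (text, contains a real identifier?)
SAMPLES = [
    ("CPF: 529.982.247-25", True),       # clean
    ("CPF: 529.982.247-2S", True),       # OCR read the final 5 as S
    ("CPF: 529 982 247 25", True),       # separators lost in extraction
    ("Invoice total: 1.250,00", False),  # no identifier present
]

rate = false_negative_rate(SAMPLES, strict_detect)
```

On this invented sample the strict detector misses two of the three positives. The same harness can be run on a real labeled sample after each change to patterns or normalization.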

The list of laws and IDs keeps growing. GDPR, LGPD, and DPDP are three regimes, but a marketplace handling 80 countries will encounter national IDs none of the three name explicitly, plus passport numbers, tax IDs, and health identifiers with their own formats. Coverage that is complete today goes stale as you enter new markets or as DPDP enforcement matures and sets fresh expectations. Re-confirm your detected entity set against the jurisdictions you actually serve, and log the gaps you have chosen to accept.

Detection supports compliance; it does not establish it. Finding a CPF is a technical step. Whether your processing satisfies GDPR Article 28, LGPD, or the DPDP Act is a legal judgment that also turns on your DPA, your lawful basis, and human review of each output. A clean detection log helps an auditor but does not replace the expert assessment a regulator expects. Keep a person accountable for confirming that the controls map to each law before you rely on them.
