LGPD Brazil: CPF, CNPJ, and Data Protection
Brazil's Lei Geral de Proteção de Dados (LGPD) covers about 215 million people, which makes it one of the world's largest data protection laws by population. That is roughly as many people as Germany, France, and the UK combined. The Autoridade Nacional de Proteção de Dados (ANPD) issued its first major fines in 2024. The grace period that followed LGPD's entry into force in 2020 is over.
LGPD compliance also poses a technical challenge. The documents in scope are written in Brazilian Portuguese, and Brazil's national IDs differ from those of Portugal and of every other country.
Why Brazilian PII Is Different
Brazil's federal and state ID systems developed independently of European digital identity systems, producing a distinct set of identifiers. Most NLP tools are trained on English or European data and fail to detect these local IDs.
CPF (Cadastro de Pessoas Físicas): The 11-digit taxpayer number, formatted XXX.XXX.XXX-XX. Its last two digits are check digits. Each one comes from a separate weighted mod-11 calculation, and both must match for the CPF to be valid.
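A minimal Python sketch of the two-step check, assuming the commonly published scheme: the first check digit weights the nine base digits 10 down to 2, the second weights those nine digits plus the first check digit 11 down to 2, and a remainder below 2 yields 0. Rejecting repeated-digit strings such as 111.111.111-11, which pass the arithmetic, is a common convention rather than part of the formula. The function names are illustrative.

```python
import re


def cpf_check_digit(digits: list[int], first_weight: int) -> int:
    """Weighted mod-11 check digit; weights run from first_weight down to 2."""
    total = sum(d * w for d, w in zip(digits, range(first_weight, 1, -1)))
    remainder = total % 11
    return 0 if remainder < 2 else 11 - remainder


def is_valid_cpf(value: str) -> bool:
    """Validate a CPF given with or without its XXX.XXX.XXX-XX punctuation."""
    digits = [int(c) for c in re.sub(r"[^0-9]", "", value)]
    # Wrong length, or all-same digits (they pass the math but are
    # conventionally rejected).
    if len(digits) != 11 or len(set(digits)) == 1:
        return False
    first = cpf_check_digit(digits[:9], 10)             # step 1: 9 base digits
    second = cpf_check_digit(digits[:9] + [first], 11)  # step 2: base + first
    return digits[9:] == [first, second]
```

For example, `is_valid_cpf("529.982.247-25")` returns `True`, while changing the last digit to `4` makes it return `False`.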
The detection gap is large. English-trained NLP tools detect CPF with only 45% accuracy (ANPD, 2024), for two reasons. First, a tool that matches any 11-digit number without the two-step check digit logic cannot tell a valid CPF from a random digit sequence. Second, CPFs do not always carry the XXX.XXX.XXX-XX punctuation; bare 11-digit strings appear in OCR output and plain-text forms.
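Both failure modes can be addressed in one pass: match CPF-shaped candidates with optional punctuation, then keep only those that pass the check digits. A sketch under those assumptions; the pattern, `find_cpfs`, and the sample text are illustrative.

```python
import re

# Formatted (529.982.247-25), bare (52998224725), or partly punctuated
# candidates. The lookarounds keep it from matching a slice of a longer
# digit run; re.ASCII restricts \d to 0-9.
CPF_CANDIDATE = re.compile(r"(?<!\d)\d{3}\.?\d{3}\.?\d{3}-?\d{2}(?!\d)", re.ASCII)


def _check_digit(digits: list[int], first_weight: int) -> int:
    total = sum(d * w for d, w in zip(digits, range(first_weight, 1, -1)))
    remainder = total % 11
    return 0 if remainder < 2 else 11 - remainder


def is_valid_cpf(value: str) -> bool:
    digits = [int(c) for c in re.sub(r"[^0-9]", "", value)]
    if len(digits) != 11 or len(set(digits)) == 1:
        return False
    first = _check_digit(digits[:9], 10)
    return digits[9:] == [first, _check_digit(digits[:9] + [first], 11)]


def find_cpfs(text: str) -> list[str]:
    """Return CPF-shaped substrings that also pass both check digits."""
    return [m.group() for m in CPF_CANDIDATE.finditer(text) if is_valid_cpf(m.group())]
```

On `"Cliente 529.982.247-25, ref. 52998224725, pedido 52998224724, tel. 11987654321"` this keeps the first two numbers and drops the other two, which are CPF-shaped but fail the check digits. A random 11-digit string passes both checks only about once in 100 tries, but other 11-digit identifiers such as PIS/PASEP and CNH numbers can still collide, so surrounding context may be needed to label a match.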
CNPJ (Cadastro Nacional da Pessoa Jurídica): The 14-digit company registration number, formatted XX.XXX.XXX/XXXX-XX. It also ends in two check digits, computed with a mod-11 calculation similar to CPF's but with a different weight sequence.
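A sketch of the CNPJ check, assuming the commonly published scheme: the structure mirrors CPF, but the weights for the first step are 5, 4, 3, 2, 9, 8, 7, 6, 5, 4, 3, 2 over the first 12 digits, and the second step prepends a 6 to cover 13 digits. Names are illustrative.

```python
import re

# Unlike CPF's straight countdown, the CNPJ weights restart at 9 after 2.
FIRST_WEIGHTS = [5, 4, 3, 2, 9, 8, 7, 6, 5, 4, 3, 2]
SECOND_WEIGHTS = [6] + FIRST_WEIGHTS


def cnpj_check_digit(digits: list[int], weights: list[int]) -> int:
    remainder = sum(d * w for d, w in zip(digits, weights)) % 11
    return 0 if remainder < 2 else 11 - remainder


def is_valid_cnpj(value: str) -> bool:
    """Validate a CNPJ given with or without XX.XXX.XXX/XXXX-XX punctuation."""
    digits = [int(c) for c in re.sub(r"[^0-9]", "", value)]
    if len(digits) != 14 or len(set(digits)) == 1:
        return False
    first = cnpj_check_digit(digits[:12], FIRST_WEIGHTS)
    second = cnpj_check_digit(digits[:12] + [first], SECOND_WEIGHTS)
    return digits[12:] == [first, second]
```

For example, `is_valid_cnpj("11.222.333/0001-81")` returns `True`. Reusing the CPF weights on a CNPJ would reject valid numbers, which is why the two identifiers need separate validators.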
RG (Registro Geral): The state civil ID card. The format varies by state. São Paulo uses 2 letters and 5–9 digits. Rio de Janeiro uses 7–8 digits with a dash. Minas Gerais uses 7–9 digits. Other states have their own formats. A tool that knows only one state's RG format will miss most RG numbers.
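The per-state problem can be sketched as a registry of patterns that a detector runs in full. The three regular expressions below are rough transcriptions of the formats described above, not authoritative specifications; placing the Rio de Janeiro dash before the final digit is an assumption, and a real table would need verified patterns for every state.

```python
import re

# Illustrative, unverified patterns transcribed from the format notes above.
# Verify each against the issuing state agency and add the remaining states.
RG_PATTERNS = {
    "SP": re.compile(r"\b[A-Z]{2}\d{5,9}\b"),  # 2 letters + 5-9 digits
    "RJ": re.compile(r"\b\d{6,7}-\d\b"),       # 7-8 digits, dash assumed before the last
    "MG": re.compile(r"\b\d{7,9}\b"),          # 7-9 digits
}


def find_rg_candidates(text: str) -> list[tuple[str, str]]:
    """Return (state, match) pairs for every state pattern that fires."""
    hits = []
    for state, pattern in RG_PATTERNS.items():
        hits.extend((state, m.group()) for m in pattern.finditer(text))
    return hits
```

On `"RG AB1234567 e RG 1234567-8"` this returns the São Paulo and Rio de Janeiro matches, plus a Minas Gerais match on the digits of the Rio-style number. That overlap suggests why a format match alone rarely identifies the issuing state; a field label or a nearby state name has to break the tie.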
CNH (Carteira Nacional de Habilitação): The 11-digit driver's license registration number. Its last two digits are check digits, and the format includes a district code.
Título de Eleitor: The 12-digit voter ID number. It has three parts: an 8-digit ID code, a 2-digit state code, and 2 check digits.
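A sketch that splits a Título de Eleitor into the three parts above and checks it. The arithmetic follows a commonly published mod-11 scheme, which is an assumption here: the first check digit weights the 8-digit ID code 2 through 9, the second weights the two state-code digits 7 and 8 and the first check digit 9, and a remainder of 10 becomes 0. The special rule for São Paulo and Minas Gerais and the 01 to 28 state-code range are also assumptions, flagged in the comments. `TituloParts` and the function names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class TituloParts(NamedTuple):
    id_code: str     # 8-digit ID code
    state_code: str  # 2-digit state code
    check: str       # 2 check digits


def _titulo_digit(total: int, state_code: str) -> int:
    remainder = total % 11
    if remainder == 10:
        return 0
    # Assumed special case: SP ("01") and MG ("02") map a remainder of 0 to 1.
    if remainder == 0 and state_code in ("01", "02"):
        return 1
    return remainder


def parse_titulo(value: str) -> Optional[TituloParts]:
    """Split a 12-digit Título de Eleitor into its parts, or return None."""
    digits = re.sub(r"[^0-9]", "", value)
    if len(digits) != 12:
        return None
    return TituloParts(digits[:8], digits[8:10], digits[10:])


def is_valid_titulo(value: str) -> bool:
    parts = parse_titulo(value)
    # Assumed state-code range: 01-27 for the states, 28 for voters abroad.
    if parts is None or not 1 <= int(parts.state_code) <= 28:
        return False
    code = [int(c) for c in parts.id_code]
    first = _titulo_digit(sum(d * w for d, w in zip(code, range(2, 10))), parts.state_code)
    s1, s2 = (int(c) for c in parts.state_code)
    second = _titulo_digit(s1 * 7 + s2 * 8 + first * 9, parts.state_code)
    return parts.check == f"{first}{second}"
```

For example, `parse_titulo("1020 3040 0353")` yields ID code `10203040`, state code `03`, and check digits `53`, and `is_valid_titulo("102030400353")` returns `True`.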
SUS number (Cartão SUS): The 15-digit public health ID. Every person in the country gets one. It appears in all hospital and clinic records.
PIS/PASEP: The 11-digit social program number. It appears in every employment record.
LGPD Anonymization Standard
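LGPD Article 5(III) defines anonymized data as data that "cannot be identified, considering reasonable technical means at the time of processing." Article 12 then places such data outside the law's scope unless the anonymization can be reversed with reasonable effort. This is a technology-relative standard: data that counts as anonymous today may not stay that way as re-identification methods improve.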
ANPD guidance goes further. Removing direct identifiers such as CPF and name is not enough, because combinations of quasi-identifiers can still allow re-identification. Age range, city, gender, and job together may single out one person, so these fields must be generalized into coarser groups or perturbed with noise.
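One common way to measure whether grouping is enough is k-anonymity: after generalizing the quasi-identifiers, every combination should be shared by at least k records. The guidance summarized above names grouping and noise addition, not k-anonymity specifically, so treat this as an illustrative technique; the field names and 10-year age bands are assumptions.

```python
from collections import Counter

QUASI_IDENTIFIERS = ("city", "gender", "job")  # hypothetical field names


def age_band(age: int, width: int = 10) -> str:
    """Generalize an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalized_key(record: dict) -> tuple:
    """Quasi-identifier combination after replacing exact age with a band."""
    return (age_band(record["age"]),) + tuple(record[f] for f in QUASI_IDENTIFIERS)


def smallest_group_size(records: list[dict]) -> int:
    """The k in k-anonymity: size of the rarest quasi-identifier combination."""
    if not records:
        return 0
    counts = Counter(generalized_key(r) for r in records)
    return min(counts.values())
```

With two 30-something nurses in Recife and one 52-year-old engineer there, the nurses share the key `("30-39", "Recife", "F", "nurse")` and form a group of 2, but the engineer is unique, so `smallest_group_size` returns 1. That record would need coarser bands, suppression, or noise before the set could be called anonymous. Choosing k is a policy decision.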
For AI training data, ANPD requires one of three conditions. First: data meets the Article 12 standard. Second: each data subject gave explicit consent for the specific training use. Third: there is a valid documented purpose.
Portuguese Language Requirements
Brazilian Portuguese differs from European Portuguese in vocabulary, spelling, and document conventions. According to the ANPD technical assessment, NLP models trained on European Portuguese text reach about 71% of the accuracy of models trained on Brazilian text.
Key differences for PII detection:
- Names: Double-surname usage and name order differ from Portuguese conventions.
- Addresses: CEP postal codes use the format XXXXX-XXX. This format is unique to Brazil and needs its own detection logic.
- Document terms: "Carteira de Identidade" in Brazil vs. "Bilhete de Identidade" in Portugal. Agency names also differ.
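A minimal CEP detector for the XXXXX-XXX format noted above. This sketch requires the hyphen; bare 8-digit CEPs, if your data contains them, would need an extra context-aware rule, since many other numbers are 8 digits long. The pattern and function name are illustrative.

```python
import re

# CEP: five digits, a hyphen, three digits (e.g. 01310-100). The lookarounds
# reject slices of longer digit runs.
CEP_PATTERN = re.compile(r"(?<![0-9])[0-9]{5}-[0-9]{3}(?![0-9])")


def find_ceps(text: str) -> list[str]:
    """Return every hyphenated CEP in text."""
    return CEP_PATTERN.findall(text)
```

On `"Rua Exemplo, 100, CEP 01310-100; Lisboa 1000-001"` this returns only `01310-100`: the Portuguese postal code has four digits before the hyphen, so it does not match.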
What ANPD Compliance Needs
Four technical requirements cover ANPD compliance:
- CPF and CNPJ detection with two-step check digit validation.
- RG detection that covers every state's format.
- SUS number and Título de Eleitor detection.
- NLP models trained on Brazilian Portuguese text.