Japan My Number: APPI and the Verhoeff Check

Japan's Personal Information Protection Commission (PPC) issued 45 enforcement decisions in 2024. It also published Japan's first AI privacy guidance. A PPC study found that 63% of generic NLP tools fail to detect My Number (マイナンバー) in Japanese files. If your team handles data of Japanese residents, that gap means direct APPI risk.

What My Number Is

Japan gives every resident a unique 12-digit identifier. This is My Number, part of the Individual Number System (マイナンバー制度). It covers tax, pension, health insurance, and disaster response. This identifier is sensitive data under APPI. You need a legal reason to collect or share it.

The Verhoeff Check Problem

My Number uses the Verhoeff algorithm for its check digit. Verhoeff is a checksum scheme that catches all single-digit errors. It also catches all errors where two adjacent digits swap. It works from three lookup tables: a multiplication table, a permutation table, and an inverse table. Working through those tables by hand is slow and error-prone, so in practice the check is done in code.

This matters for two reasons. First, a bare 12-digit string looks like many other values. Invoice references, document IDs, and timestamps such as 202401151230 can all be 12 digits long. Without a check-digit test, a tool flags those values too. Second, most tools do not use Verhoeff. They use simpler modulo-10 or modulo-11 checks. A checksum that differs from the one the identifier carries cannot validate it.

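The PPC study found that 63% of tools either skip the check or use a simpler method. Skipping the check produces false positives, because every 12-digit string gets flagged. Applying the wrong check produces false negatives, because genuine numbers fail it. A deployment can suffer both problems at once.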

The Luhn algorithm, used for credit cards, is simpler. My Number does not use Luhn. Tools built for Luhn will not work.
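The check described in this section can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the three tables are the standard published Verhoeff tables, and the function names and the 12-digit scanner are hypothetical. It implements the generic algorithm only. Whether it matches the check digit on real My Number values is an assumption taken from this section and should be confirmed against the official check-digit specification before production use.

```python
import re

# Standard Verhoeff tables. D is the multiplication table of the dihedral
# group D5, P is the position-dependent permutation, INV holds inverses.
D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]
INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]

DIGITS_ONLY = re.compile(r"[0-9]+")
# A run of exactly 12 ASCII digits that is not part of a longer digit run.
TWELVE_DIGITS = re.compile(r"(?<![0-9])[0-9]{12}(?![0-9])")


def verhoeff_check_digit(payload: str) -> str:
    """Return the Verhoeff check digit to append to a digit string."""
    c = 0
    # Position 0 is reserved for the check digit, so the payload starts at 1.
    for i, ch in enumerate(reversed(payload), start=1):
        c = D[c][P[i % 8][int(ch)]]
    return str(INV[c])


def verhoeff_valid(number: str) -> bool:
    """Return True if the last digit is a correct Verhoeff check digit."""
    if not DIGITS_ONLY.fullmatch(number):
        return False
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = D[c][P[i % 8][int(ch)]]
    return c == 0


def find_candidates(text: str) -> list:
    """Return 12-digit runs in text that also pass the Verhoeff check."""
    return [m.group() for m in TWELVE_DIGITS.finditer(text)
            if verhoeff_valid(m.group())]
```

For example, verhoeff_check_digit("236") returns "3", so verhoeff_valid("2363") is True, while the single-digit error "2364" and the adjacent swap "3263" both fail. Running the scanner before any redaction step drops 12-digit strings whose final digit does not match the checksum.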

Three Scripts, One Name

Japanese text uses three writing systems at once. A tool must handle all three.

Hiragana (ひらがな): Used for grammar and native words. 46 base characters.

Katakana (カタカナ): Used for foreign words and names. 46 base characters. Foreign names in Japan appear in this script.

Kanji (漢字): Symbols for nouns and names. About 2,000 are in common use.

One person's name can appear in four forms: Kanji (田中太郎), Hiragana (たなかたろう), Katakana (タナカタロウ), and Romaji (Tanaka Taro). A tool must match all four. Every form it cannot match is a set of that person's records it will miss.
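Part of this matching can be done with the standard library alone. The sketch below folds Katakana, including half-width Katakana, onto Hiragana so that the two kana spellings of a name compare equal. The function names are hypothetical. It deliberately does not cover Kanji or Romaji: mapping 田中太郎 or Tanaka Taro to a kana reading needs a reading dictionary or a transliteration step, which is outside this example.

```python
import unicodedata

# Katakana U+30A1..U+30F6 sits exactly 0x60 above Hiragana U+3041..U+3096.
KATAKANA_START = 0x30A1
KATAKANA_END = 0x30F6
KANA_OFFSET = 0x60


def fold_kana(text: str) -> str:
    """Normalize width with NFKC, then map Katakana to Hiragana."""
    text = unicodedata.normalize("NFKC", text)  # half-width kana -> full-width
    return "".join(
        chr(ord(ch) - KANA_OFFSET)
        if KATAKANA_START <= ord(ch) <= KATAKANA_END
        else ch
        for ch in text
    )


def same_kana_name(a: str, b: str) -> bool:
    """Compare two names, ignoring kana script, character width, and spaces."""
    def key(s: str) -> str:
        return "".join(fold_kana(s).split())
    return key(a) == key(b)
```

With this, same_kana_name("タナカ タロウ", "たなかたろう") is True, while the Kanji form still compares unequal until a reading lookup is added.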

Other Japanese IDs to Detect

Driver's license (運転免許証番号): 12 digits. The first two digits show the issuing prefecture. Tokyo is 30. Osaka is 62. This lets a tool check whether the prefix is a valid regional code.

Passport (旅券番号): Two letters plus seven digits. ICAO format. Japan uses specific letter pairs.

Health insurance card (健康保険証記号番号): A symbol plus a number. The format depends on the insurer. National Health Insurance (国民健康保険) and the Japan Health Insurance Association (協会けんぽ) use different formats.

Residence card (在留カード番号): For foreign residents. Two letters, eight digits, two letters. The Ministry of Justice issues this card.
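The fixed-shape formats above lend themselves to a first-pass pattern check. The sketch below is illustrative only: the function name, labels, and patterns are assumptions built from the shapes described in this section, not official specifications. The prefix table is deliberately partial and holds only the Osaka code quoted above. A real deployment would load the full list of issuing codes and the permitted passport letter pairs. Health insurance cards are left out because their format depends on the insurer.

```python
import re
import unicodedata

# Shapes taken from the descriptions in this section (assumed, not official).
PASSPORT = re.compile(r"[A-Z]{2}[0-9]{7}")
RESIDENCE_CARD = re.compile(r"[A-Z]{2}[0-9]{8}[A-Z]{2}")
TWELVE_DIGITS = re.compile(r"[0-9]{12}")

# Partial, illustrative table of driver's license prefixes.
LICENSE_PREFIXES = {"62": "Osaka"}


def candidate_types(value: str) -> list:
    """Return the identifier shapes a single token is consistent with."""
    # NFKC folds full-width letters and digits onto their ASCII forms.
    v = unicodedata.normalize("NFKC", value).strip().upper()
    hits = []
    if PASSPORT.fullmatch(v):
        hits.append("passport")
    if RESIDENCE_CARD.fullmatch(v):
        hits.append("residence_card")
    if TWELVE_DIGITS.fullmatch(v):
        # My Number and driver's licenses share this shape, so report the
        # shape first and add the regional hint only when the prefix is known.
        hits.append("twelve_digit_id")
        region = LICENSE_PREFIXES.get(v[:2])
        if region:
            hits.append("drivers_license_prefix:" + region)
    return hits
```

A shape match is only a candidate. The two 12-digit identifiers still need a checksum or contextual cue to tell them apart.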

APPI's Anonymization Rule

APPI defines a strict standard for anonymized data, called anonymized information (匿名加工情報). It goes further than GDPR in one key area: anonymization must be third-party verifiable and technically irreversible.

To comply, an organization must:

Remove all direct identifiers, including My Number.
Handle all quasi-identifier combinations.
Use k-anonymity or a similar method.
Publish a general description of the steps taken.
Never try to re-identify the data.
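The k-anonymity step in this list can be checked mechanically. The sketch below is a minimal illustration with hypothetical function names and field names. It groups records by their quasi-identifier values and reports the smallest group. A dataset is k-anonymous for a given set of quasi-identifiers when every group holds at least k records.

```python
from collections import Counter


def smallest_group(records, quasi_identifiers):
    """Size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(groups.values()) if groups else 0


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    return smallest_group(records, quasi_identifiers) >= k


# Hypothetical records after direct identifiers have been removed.
rows = [
    {"prefecture": "Osaka", "birth_decade": "1980s"},
    {"prefecture": "Osaka", "birth_decade": "1980s"},
    {"prefecture": "Tokyo", "birth_decade": "1990s"},
    {"prefecture": "Tokyo", "birth_decade": "1970s"},
]
```

Here the rows are 2-anonymous on prefecture alone, but not on prefecture combined with birth decade, because each Tokyo row is then unique. That is why the list asks for every quasi-identifier combination to be handled, not each field in isolation.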

The PPC's 2024 AI guidance adds a specific rule. If you train an AI on anonymized data, you cannot use that model to re-identify people. This is a direct ban on model inversion attacks against APPI training sets.

To meet PPC standards, you need four things. First, Verhoeff validation for My Number detection. Second, Japanese NER using a spaCy pipeline from the ja_core_news family (for example ja_core_news_sm) with proper tokenization. Third, name matching across Kanji, Kana, and Romaji. Fourth, prefecture code checks for driver's licenses.

India uses Aadhaar, which also requires Verhoeff validation. The India DPDPA technical compliance guide covers that in detail. For multi-country identifier detection, see EU national tax ID detection under GDPR.

When This Approach Has Limits

Verhoeff validation for My Number plus name matching across Kanji, Kana, and Romaji is the correct baseline, and that part of the approach is sound. The limits below are still worth stating plainly.

The My Number format and Japanese text need configuration and held-out testing. Verhoeff relies on three lookup tables, so a tool built for Luhn or simple modulo checks will not validate My Number at all. The 12-digit form also mimics invoice references and document IDs. One person's name appears as Kanji, Hiragana, Katakana, and Romaji, and missing any one form means missing the records written in that form. These behaviors have to be configured deliberately and confirmed on held-out Japanese documents, because a PPC study found that 63% of generic tools fail here by default.

Detection accuracy bounds the result. A My Number is only protected once it has been recognized as one, and the residual false-negative rate sets the ceiling on coverage. Driver's licenses, residence cards, and insurer-specific health card formats each follow their own structure, and a tokenizer that mishandles Japanese script segmentation drops names the checksum stage never sees. Measure recall on your own files, since multilingual accuracy varies and performance on clean text does not predict performance on mixed-script real documents.
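Measuring recall on your own files needs nothing more than a hand-labeled sample and the detector's output in the same shape. The sketch below assumes each detection is a (document id, start offset, end offset, label) tuple and counts only exact matches. The tuple layout and function name are hypothetical. Partial-overlap scoring is left out for brevity.

```python
def recall_precision(gold, predicted):
    """Exact-match recall and precision over sets of labeled spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    # With nothing to find, or nothing predicted, report 1.0, not an error.
    recall = true_positives / len(gold) if gold else 1.0
    precision = true_positives / len(predicted) if predicted else 1.0
    return recall, precision


# Hypothetical hand-labeled spans and detector output for two documents.
gold_spans = [
    ("doc1", 10, 22, "MY_NUMBER"),
    ("doc1", 40, 44, "PERSON"),
    ("doc2", 5, 17, "MY_NUMBER"),
    ("doc2", 30, 36, "PERSON"),
]
detected_spans = [
    ("doc1", 10, 22, "MY_NUMBER"),
    ("doc2", 5, 17, "MY_NUMBER"),
    ("doc2", 50, 62, "MY_NUMBER"),  # a 12-digit invoice reference
]
```

On this sample the detector finds both identifiers but neither name, so recall is 0.5, and one of its three hits is a false positive. Tracking the two figures per entity type and per script shows where coverage actually falls short.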

The tool supports compliance but does not constitute it. APPI's anonymized-information standard demands third-party-verifiable, technically irreversible processing: handling every quasi-identifier combination, applying k-anonymity, publishing the method, and never re-identifying. The PPC's 2024 AI guidance adds a ban on using a trained model to invert anonymized data. Removing direct identifiers leaves quasi-identifiers that can re-identify, so the output may be pseudonymized rather than anonymized. A detector cannot prove irreversibility or replace human review. The controller owns the full posture the PPC audits.
