Last updated 2026-05-29


Japan My Number: Verhoeff & APPI

63% of generic tools fail to detect My Number in Japanese documents. My Number uses the Verhoeff algorithm, the most complex national ID checksum in Asia.


Japan My Number: APPI and the Verhoeff Check

Japan's Personal Information Protection Commission (PPC) issued 45 enforcement decisions in 2024. It also published Japan's first AI privacy guidance. A PPC study found that 63% of generic NLP tools fail to detect My Number (マイナンバー) in Japanese files. If your team handles data of Japanese residents, that gap means direct APPI risk.

What My Number Is

Japan gives every resident a unique 12-digit identifier. This is My Number, part of the Individual Number System (マイナンバー制度). It covers tax, pension, health insurance, and disaster response. This identifier is sensitive data under APPI. You need a legal basis to collect or share it.

The Verhoeff Check Problem

My Number uses the Verhoeff algorithm for its check digit. Verhoeff is a checksum scheme that catches all single-digit errors and all swaps of two adjacent digits. It relies on three lookup tables: a multiplication table, a position-dependent permutation table, and an inverse table. Working it out by hand is slow and error-prone, so in practice it is done in code.
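The three-table check described above can be sketched as follows. This is a minimal, generic Verhoeff validator, not a production detector. The tables are the standard ones from the algorithm's definition; the function names are hypothetical. Confirm the check-digit rule against the official specification of the identifier you target before relying on it, and test only with made-up digit strings.

```python
# Standard Verhoeff tables: multiplication in the dihedral group D5,
# position-dependent permutation, and inverse.
_D = (
    (0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
    (1, 2, 3, 4, 0, 6, 7, 8, 9, 5),
    (2, 3, 4, 0, 1, 7, 8, 9, 5, 6),
    (3, 4, 0, 1, 2, 8, 9, 5, 6, 7),
    (4, 0, 1, 2, 3, 9, 5, 6, 7, 8),
    (5, 9, 8, 7, 6, 0, 4, 3, 2, 1),
    (6, 5, 9, 8, 7, 1, 0, 4, 3, 2),
    (7, 6, 5, 9, 8, 2, 1, 0, 4, 3),
    (8, 7, 6, 5, 9, 3, 2, 1, 0, 4),
    (9, 8, 7, 6, 5, 4, 3, 2, 1, 0),
)
_P = (
    (0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
    (1, 5, 7, 6, 2, 8, 3, 0, 9, 4),
    (5, 8, 0, 3, 7, 9, 6, 1, 4, 2),
    (8, 9, 1, 6, 0, 4, 3, 5, 2, 7),
    (9, 4, 5, 3, 1, 2, 6, 8, 7, 0),
    (4, 2, 8, 6, 5, 7, 3, 9, 0, 1),
    (2, 7, 9, 3, 8, 0, 6, 4, 1, 5),
    (7, 0, 4, 6, 9, 1, 3, 2, 5, 8),
)
_INV = (0, 4, 3, 2, 1, 5, 6, 7, 8, 9)


def verhoeff_check_digit(payload: str) -> int:
    """Return the Verhoeff check digit for a string of decimal digits."""
    c = 0
    # Walk right to left; the check digit will occupy position 0,
    # so payload digits start at position 1.
    for i, ch in enumerate(reversed(payload)):
        c = _D[c][_P[(i + 1) % 8][int(ch)]]
    return _INV[c]


def verhoeff_is_valid(number: str) -> bool:
    """Return True if the last digit is a correct Verhoeff check digit."""
    if not (number.isascii() and number.isdigit()):
        return False
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = _D[c][_P[i % 8][int(ch)]]
    return c == 0
```

For example, `verhoeff_check_digit("236")` returns 3, so `verhoeff_is_valid("2363")` is true, while the transposed string `"3263"` is rejected.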

This matters for two reasons. First, Japan's 12-digit format looks like many other codes. Invoice references, document IDs, and date strings all share the same format. Without a Verhoeff check, a tool will flag the wrong values. Second, most tools do not use Verhoeff. They use simpler modulo-10 or modulo-11 checks. Those do not work here.
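The first point can be sketched as a two-step filter: collect every 12-digit run, then keep only those that pass a checksum. This is a rough sketch under assumptions. The names `find_candidates` and `filter_candidates` are hypothetical. The 4-4-4 grouping with a space or hyphen and the full-width digit handling are assumptions about how numbers appear in real files. The checksum function is passed in by the caller.

```python
import re
import unicodedata
from typing import Callable, List

# Exactly 12 ASCII digits, optionally grouped 4-4-4 with a space or hyphen,
# and not embedded in a longer run of digits.
_CANDIDATE = re.compile(
    r"(?<![0-9])[0-9]{4}[ -]?[0-9]{4}[ -]?[0-9]{4}(?![0-9])"
)


def find_candidates(text: str) -> List[str]:
    """Return every 12-digit run in the text, with separators removed."""
    # NFKC folds full-width digits (１２３) into ASCII digits (123).
    normalized = unicodedata.normalize("NFKC", text)
    return [
        re.sub(r"[ -]", "", match.group())
        for match in _CANDIDATE.finditer(normalized)
    ]


def filter_candidates(
    candidates: List[str], is_valid: Callable[[str], bool]
) -> List[str]:
    """Keep only the candidates that pass the supplied checksum function."""
    return [value for value in candidates if is_valid(value)]
```

Without the second step, an invoice reference such as 2024-0529-0001 is reported next to real identifiers. The checksum is what separates them.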

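The PPC study found that 63% of tools either skip the check or use a simpler method. Skipping the check flags unrelated 12-digit values (false positives). Applying the wrong check rejects real My Numbers (false negatives). Many tools have both problems at once.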

The Luhn algorithm, used for credit cards, is simpler. My Number does not use Luhn. Tools built for Luhn will not work.

Three Scripts, One Name

Japanese text uses three writing systems at once. A tool must handle all three.

Hiragana (ひらがな): Used for grammar and native words. 46 base characters.

Katakana (カタカナ): Used for foreign words and names. 46 base characters. Foreign names in Japan appear in this script.

Kanji (漢字): Symbols for nouns and names. About 2,000 are in common use.

One person's name can appear in four forms: Kanji (田中太郎), Hiragana (たなかたろう), Katakana (タナカ タロウ), and Romaji (Tanaka Taro). A tool must match all four. If it misses one form, it misses every record that uses that form.
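The two kana forms are the easy part of this matching. Hiragana and Katakana sit at a fixed offset from each other in Unicode, so both can be folded to one key. A minimal sketch follows; `kana_key` is a hypothetical helper name. Kanji and Romaji are not covered here, because mapping them to a reading needs a dictionary or a morphological analyzer.

```python
import unicodedata

# Hiragana U+3041..U+3096 map onto Katakana U+30A1..U+30F6
# by adding a fixed offset of 0x60.
_HIRA_TO_KATA = {cp: cp + 0x60 for cp in range(0x3041, 0x3097)}


def kana_key(name: str) -> str:
    """Fold a kana name to one comparison key (full-width Katakana, no spaces)."""
    # NFKC turns half-width Katakana into full-width Katakana.
    folded = unicodedata.normalize("NFKC", name)
    # Drop all whitespace, including the ideographic space.
    folded = "".join(folded.split())
    return folded.translate(_HIRA_TO_KATA)
```

With this key, たなかたろう and タナカ タロウ both become タナカタロウ, so records written in either kana form can be grouped.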

Other Japanese IDs to Detect

Driver's license (運転免許証番号): 12 digits. The first two digits show the prefecture that first issued the license. Tokyo is 30. Osaka is 62. This lets a tool check whether the value is valid for that region.

Passport (旅券番号): Two letters plus seven digits. ICAO format. Japan uses specific letter pairs.

Health insurance card (健康保険証記号番号): A symbol plus a number. The format depends on the insurer. National Health Insurance (国民健康保険) and the Japan Health Insurance Association (協会けんぽ) use different formats.

Residence card (在留カード番号): For foreign residents. Two letters, eight digits, two letters. The Ministry of Justice issues this card.
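The fixed formats above can be sketched as simple patterns. This is a sketch only, with hypothetical names (`ID_PATTERNS`, `classify_id`). It covers the three identifiers with a fixed shape and skips health insurance cards, whose format varies by insurer. The set of valid license prefixes is passed in by the caller; the example uses only 62 (Osaka), and a real list would come from an official source.

```python
import re
from typing import Dict, Optional, Set

# Shapes taken from the descriptions above; uppercase letters are assumed.
ID_PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "drivers_license": re.compile(r"[0-9]{12}"),
    "passport": re.compile(r"[A-Z]{2}[0-9]{7}"),
    "residence_card": re.compile(r"[A-Z]{2}[0-9]{8}[A-Z]{2}"),
}


def classify_id(
    value: str, license_prefixes: Optional[Set[str]] = None
) -> Optional[str]:
    """Return the identifier type whose shape matches the value, or None."""
    for kind, pattern in ID_PATTERNS.items():
        if not pattern.fullmatch(value):
            continue
        # Optional regional check: the first two digits of a license
        # must be a known issuing-prefecture code.
        if kind == "drivers_license" and license_prefixes is not None:
            if value[:2] not in license_prefixes:
                return None
        return kind
    return None
```

A 12-digit license number has the same shape as My Number, so the prefix check and the My Number check digit are what tell the two apart.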

APPI's Anonymization Rule

APPI sets a strict standard for anonymized data, which it calls anonymously processed information (匿名加工情報). It goes further than GDPR in one key area. Anonymization must be third-party verifiable and technically irreversible.

To comply, an organization must:

  1. Remove all direct identifiers, including My Number.
  2. Generalize or suppress quasi-identifier combinations that could single out a person, such as birth date plus postal code.
  3. Use k-anonymity or a similar method.
  4. Publish a general description of the steps taken.
  5. Never try to re-identify the data.
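Step 3 can be checked with a short script. A dataset is k-anonymous when every combination of quasi-identifier values appears in at least k records. The sketch below assumes records are plain dictionaries; the function names and the example column names are hypothetical. It only measures k. It does not do the generalization itself.

```python
from collections import Counter
from typing import Dict, List, Sequence


def smallest_group(
    records: List[Dict[str, str]], quasi_identifiers: Sequence[str]
) -> int:
    """Return the size of the rarest quasi-identifier combination (0 if empty)."""
    groups = Counter(
        tuple(record[column] for column in quasi_identifiers)
        for record in records
    )
    return min(groups.values()) if groups else 0


def is_k_anonymous(
    records: List[Dict[str, str]], quasi_identifiers: Sequence[str], k: int
) -> bool:
    """Return True if every combination occurs in at least k records."""
    return smallest_group(records, quasi_identifiers) >= k
```

If `smallest_group` returns 1, at least one person is unique on those columns, and the values need further generalization or suppression before release.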

The PPC's 2024 AI guidance adds a specific rule. If you train an AI on anonymized data, you cannot use that model to re-identify people. This is a direct ban on model inversion attacks against APPI training sets.

To meet PPC standards, you need four things. First, Verhoeff validation for My Number detection. Second, Japanese NER using a spaCy ja_core_news model with proper tokenization. Third, name matching across Kanji, Kana, and Romaji. Fourth, prefecture code checks for driver's licenses.

India uses Aadhaar, which also requires Verhoeff validation. The India DPDPA technical compliance guide covers that in detail. For multi-country identifier detection, see EU national tax ID detection under GDPR.

