Real-Time PII Prevention: Stopping AI Data Leaks Before They Happen.

Updated for 2026.

In March 2023, a Samsung engineer pasted source code into ChatGPT. The code left Samsung's control at once. No tool caught it in time. The incident shows why post-hoc security controls cannot stop AI data leaks.

Detection tools tell you what happened after the fact. Log checks, endpoint DLP, and audit logs all work this way. For AI leaks, after the fact is too late. The data has already reached the AI model.

The Scale of the Problem

A 2025 Cyberhaven study looked at how firms use AI. The findings were striking.

11% of all ChatGPT prompts contain private or sensitive data.
The average worker uses AI tools 14 times per day.
High-use staff interact 30 to 50 times daily.
At 11%, that means roughly 3 to 5 sensitive sends per high-use worker per day.

At a firm with 500 high-use workers, this adds up to roughly 1,650 to 2,750 sensitive sends per day (11% of 15,000 to 25,000 prompts). Each one can be a GDPR violation, exposed to fines under Article 83. The risk is not just legal. Trust and reputation are also at stake.
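The arithmetic behind these figures can be checked with a short sketch. It uses only the numbers quoted above (the 11% rate, 30 to 50 prompts per day, 500 workers). The function name is ours, chosen for illustration.

```python
# Figures quoted in the text above; nothing here is measured data.
SENSITIVE_RATE = 0.11          # share of prompts containing sensitive data
HIGH_USE_PROMPTS = (30, 50)    # prompts per high-use worker per day (low, high)
WORKERS = 500                  # the example firm's high-use headcount


def sensitive_sends_per_day(prompts_per_day: int, workers: int = 1) -> float:
    """Expected number of sensitive prompts per day for a group of workers."""
    return SENSITIVE_RATE * prompts_per_day * workers


per_worker = tuple(sensitive_sends_per_day(p) for p in HIGH_USE_PROMPTS)
per_firm = tuple(sensitive_sends_per_day(p, WORKERS) for p in HIGH_USE_PROMPTS)

print(f"Per worker: {per_worker[0]:.1f} to {per_worker[1]:.1f}")    # 3.3 to 5.5
print(f"Per firm:   {per_firm[0]:,.0f} to {per_firm[1]:,.0f}")      # 1,650 to 2,750
```

This is an expected value, not a forecast. It assumes the 11% rate applies evenly to every prompt, which will not hold at every firm.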

Common types of sensitive content in AI prompts include the following.

Customer names and contact details.
Account numbers and payment records.
Medical notes from health workers.
Case details from lawyers.
Staff review notes from HR teams.
Internal revenue or sales projections.

The study does not split intentional from accidental sharing. Both create the same legal risk. A worker who forgets to remove a client name causes the same breach as one who ignores the rule. Intent does not change the outcome.

Why Detection Falls Short

Network checks cannot read HTTPS traffic without TLS interception. TLS interception adds overhead and raises privacy concerns. Modern browsers often reject it.

Endpoint DLP agents watch clipboard and keystroke input. But they have lag. By the time an agent flags a pattern, the prompt may be sent already.

Vendor audit logs record what was shared after it was shared. They help with response. They do not stop leaks.

Staff training is a policy, not a control. The Cyberhaven study shows 11% of prompts still contain sensitive content at firms with clear policies. Training does not stop accidental sharing or mid-task lapses.

Blocking AI tools removes productivity gains. Workers then use personal devices or accounts. That places work outside any oversight.

None of these methods stop sensitive content from reaching AI systems in real time.

Prevention at the Point of Entry

The only safe defense is masking before the prompt is sent. A customer name replaced with [PERSON_1] before it leaves the browser is never seen by the AI model.

Here is how inline masking works.

A worker types a customer email into Claude or ChatGPT.
The browser add-on detects personal data in real time.
Entities are marked with type labels: PERSON, EMAIL_ADDRESS, ACCOUNT_NUMBER.
The worker reviews the marked items.
One click swaps all entities for tokens.
The masked prompt is sent.

The AI gets a prompt like this: "Customer [PERSON_1] at [EMAIL_1] has account [ACCOUNT_1]."

The AI handles the request. It never sees real names or numbers. The worker knows the actual customer from context.
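The steps above can be sketched in a few lines. This is a minimal illustration, not the add-on's actual engine. The two regular expressions, the `known_names` list (a stand-in for real name detection), and the function names are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical patterns for illustration only. A real engine would use many
# more recognizers, plus named-entity recognition for person names.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ACCOUNT": re.compile(r"\b\d{8,12}\b"),
}


def mask_prompt(text: str, known_names: list[str]) -> tuple[str, dict[str, str]]:
    """Replace detected entities with numbered tokens such as [PERSON_1].

    Returns the masked text and a token -> original mapping. The mapping
    is meant to stay on the client and never be sent with the prompt.
    """
    counters: dict[str, int] = defaultdict(int)
    seen: dict[tuple[str, str], str] = {}
    mapping: dict[str, str] = {}

    def token_for(label: str, value: str) -> str:
        # The same value always gets the same token within one prompt.
        key = (label, value)
        if key not in seen:
            counters[label] += 1
            seen[key] = f"[{label}_{counters[label]}]"
            mapping[seen[key]] = value
        return seen[key]

    # Stand-in for name detection: exact substring match on a supplied list.
    for name in known_names:
        if name in text:
            text = text.replace(name, token_for("PERSON", name))

    for label, pattern in PATTERNS.items():
        text = pattern.sub(lambda m, lb=label: token_for(lb, m.group(0)), text)

    return text, mapping


def unmask(text: str, mapping: dict[str, str]) -> str:
    """Restore original values locally, for example in the AI's reply."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text


original = "Customer Jane Doe at jane.doe@example.com has account 12345678."
masked, mapping = mask_prompt(original, known_names=["Jane Doe"])
print(masked)  # Customer [PERSON_1] at [EMAIL_1] has account [ACCOUNT_1].
```

Keeping the token mapping on the client is the design choice that matters here. The AI vendor receives only the tokens, and the worker can restore real values in the reply with `unmask`.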

This approach has clear benefits.

Personal data stays out of external AI systems.
Customer details are not added to AI training sets.
Workers keep access to AI tools. Productivity stays high.

It does not stop deliberate sharing if a worker bypasses the tool. File uploads need a separate workflow. No control is perfect. But inline masking removes the accidental group. That group makes up most incidents. The result is a large drop in risk with no change to the daily workflow.

Law Firm Case Study

A law firm's staff used Claude to draft contract notes. Their method: copy contract sections, paste into Claude, request a summary.

Before Chrome Extension use — first 6 months:

3 client data incidents found during the review.
Each incident: a client name plus a matter reference number appeared in the prompt.
All 3 were accidental.

After Chrome Extension use — next 6 months:

Zero client data incidents.
Staff received real-time alerts when pasting sections with client names.
One click replaced "Johnson Controls Matter 2024-0347" with "[PERSON_1] Matter [REFERENCE_1]."
The method stayed the same.

The managing partner said: "Our staff knew the policy before the add-on. The add-on made compliance the easy path."

See how other firms handled this in our case studies. Review controls in the security overview.

GDPR Records for Compliance Teams

Firms using browser-based AI masking must document it as a technical control.

Records of Processing (ROPA): State that AI prompts pass through client-side masking before reaching vendors. List the entity types, the engine version, and deploy logs as evidence.

Data processing agreements (DPAs): When no personal data reaches the AI vendor, DPA duties are simpler. The personal data you hold never leaves your system.

Audit logs: Add-on logs capture entity count per session, mask rate, and entity types by volume. These metrics feed into compliance reports.
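A sketch of how these three metrics could be computed from per-session records follows. The `SessionLog` structure and its field names are hypothetical, not the add-on's real log format. Note that the records hold counts only, never the entity values themselves.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SessionLog:
    """Hypothetical per-session record; the field names are assumptions."""
    session_id: str
    detected: Counter = field(default_factory=Counter)  # entity type -> count found
    masked: Counter = field(default_factory=Counter)    # entity type -> count replaced


def summarize(sessions: list[SessionLog]) -> dict:
    """Aggregate entity count per session, mask rate, and types by volume."""
    detected_total: Counter = Counter()
    masked_total: Counter = Counter()
    for s in sessions:
        detected_total.update(s.detected)
        masked_total.update(s.masked)

    found = sum(detected_total.values())
    replaced = sum(masked_total.values())
    return {
        "entities_per_session": {s.session_id: sum(s.detected.values()) for s in sessions},
        # None when nothing was detected, so an empty period is not read as 100%.
        "mask_rate": replaced / found if found else None,
        "types_by_volume": detected_total.most_common(),
    }


sessions = [
    SessionLog("s1", Counter(PERSON=2, EMAIL_ADDRESS=1), Counter(PERSON=2, EMAIL_ADDRESS=1)),
    SessionLog("s2", Counter(PERSON=1), Counter()),  # worker sent without masking
]
report = summarize(sessions)
print(report["mask_rate"])        # 0.75
print(report["types_by_volume"])  # [('PERSON', 3), ('EMAIL_ADDRESS', 1)]
```

A mask rate below 1.0 means some detected entities were sent unmasked. That is the number a compliance report would want to track over time.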

Review GDPR rules for AI tools in our legal compliance guide and glossary. Common questions are in our FAQ.

Conclusion

The Samsung incident showed that AI leaks happen faster than any post-hoc control can act. The Cyberhaven study put a number on it: 11% of prompts, many times per worker, every day.

Real-time masking before sending fixes the root cause. When personal data never reaches the AI, there is nothing to detect, log, or clean up. Workers keep their AI tools. Firms keep their compliance status.

Detection tells you when prevention failed. For AI data leaks, the cost of failure — fines, harm to reputation, loss of trust — justifies prevention first.

Explore pricing for your firm. Read our founder statement on why prevention-first is our core design principle.

When This Approach Has Limits

Masking before the prompt leaves the browser is the right control for accidental AI leaks, because preventing data from reaching the model beats detecting the leak afterward. It still has limits, and they are worth stating plainly.

Detection accuracy bounds what gets masked. Inline masking only replaces the entities the engine recognizes in the prompt the worker is typing. A customer name in an unusual format, a matter reference the recognizers do not cover, or PII inside a pasted block the engine parses poorly will pass through unmasked. The law firm's move from three incidents to zero is real, but it measures the accidental cases the tool caught, not the residual rate of what it missed. Configure and test the entity set against your real prompts, and keep the human review step the workflow already includes.
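One way to test an entity set is to measure how many hand-labeled entities a recognizer finds in sample prompts. The sketch below assumes a single hypothetical recognizer for matter references in the "2024-0347" style and uses synthetic prompts. The names `MATTER_REF`, `detect`, and `recall` are ours.

```python
import re

# Hypothetical recognizer for matter references such as "2024-0347".
MATTER_REF = re.compile(r"\b\d{4}-\d{4}\b")


def detect(text: str) -> set[str]:
    """Return the set of strings the recognizer flags in one prompt."""
    return set(MATTER_REF.findall(text))


def recall(samples: list[tuple[str, set[str]]]) -> tuple[float, list[str]]:
    """Score the recognizer against hand-labeled prompts.

    Each sample is (prompt, entities a human marked as sensitive).
    Returns the share of labeled entities found, and the ones missed.
    """
    expected_total = 0
    found_total = 0
    missed: list[str] = []
    for prompt, expected in samples:
        found = detect(prompt)
        expected_total += len(expected)
        found_total += len(expected & found)
        missed.extend(sorted(expected - found))
    score = found_total / expected_total if expected_total else 1.0
    return score, missed


# Synthetic test prompts; do not build a test set from live client data.
samples = [
    ("Summarize Matter 2024-0347 for the client.", {"2024-0347"}),
    ("See matter ref 24/0347 and 2023-0112.", {"24/0347", "2023-0112"}),
]
score, missed = recall(samples)
print(f"recall={score:.2f}, missed={missed}")  # recall=0.67, missed=['24/0347']
```

The missed item is the useful output. An unusual format such as "24/0347" passes through unmasked until a pattern is added for it.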

It stops accidental sharing, not a determined user. Masking does not stop a worker who bypasses the tool, switches to a personal device, or uploads a file through a separate path. Client-side masking removes the largest group of incidents, the accidental ones, but it is one control, not a complete boundary. File uploads, screenshots, and copy-out to unmanaged apps remain open channels. Treat inline masking as the layer that handles the common case, with policy and other controls covering deliberate exfiltration.

A technical control supports compliance but does not constitute it. Generating ROPA entries, masking metrics, and audit logs gives compliance teams strong evidence of a measure, but whether that measure is "appropriate" under GDPR for your processing is a judgment for a DPO, not an output of the extension. The 11 percent prompt-exposure figure comes from one Cyberhaven study and a particular usage mix; your exposure and your residual unmasked rate will differ. Use the logs as evidence of a process, and have a qualified person confirm the process meets the standard.

Sources

Cyberhaven: AI Data Exposure Study 2025 — cyberhaven.com.
Samsung ChatGPT Data Breach, March 2023 — Bloomberg.
GDPR Articles 4 and 32: Personal data and technical measures — gdpr-info.eu.
