Real-Time PII Prevention: Stopping AI Data Leaks Before They Happen
Updated for 2026.
In March 2023, a Samsung engineer pasted source code into ChatGPT. Once the prompt was sent, the code was outside Samsung's control, and no tool caught it in time. The incident showed why post-hoc security controls cannot stop AI data leaks: by the time they act, the data is already gone.
Detection tools tell you what happened after the fact. Log analysis, endpoint DLP, and audit logs all work this way. For AI leaks, after the fact is too late: the data has already reached the AI model.
The Scale of the Problem
A 2025 Cyberhaven study looked at how firms use AI. The findings were striking.
- 11% of all ChatGPT prompts contain private or sensitive data.
- The average worker uses AI tools 14 times per day.
- High-use staff interact 30 to 50 times daily.
- At 11%, that means roughly 3 to 5 sensitive sends per high-use worker per day (about 1.5 for the average worker).
At a firm with 500 high-use workers, this adds up to roughly 1,650 to 2,750 sensitive sends per day. Each one is a potential GDPR violation, and Article 83 sets the fines that can follow. The risk is not just legal: trust and reputation are also at stake.
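The arithmetic above is easy to rerun with your own headcount. This minimal sketch uses only the figures quoted from the study (an 11% rate, 14 prompts per day on average, 30 to 50 for high-use staff); the function names are illustrative, not part of any tool.

```python
# Exposure estimate built only from the figures quoted in this article.
SENSITIVE_RATE = 0.11  # share of prompts containing sensitive data (Cyberhaven, 2025)


def sensitive_sends_per_worker(prompts_per_day: float, rate: float = SENSITIVE_RATE) -> float:
    """Expected sensitive prompts one worker sends per day."""
    return prompts_per_day * rate


def sensitive_sends_for_firm(workers: int, prompts_per_day: float,
                             rate: float = SENSITIVE_RATE) -> float:
    """Expected sensitive prompts sent per day across a group of workers."""
    return workers * sensitive_sends_per_worker(prompts_per_day, rate)


if __name__ == "__main__":
    scenarios = {"average worker": 14, "high-use (low end)": 30, "high-use (high end)": 50}
    for label, prompts in scenarios.items():
        per_worker = sensitive_sends_per_worker(prompts)
        firm = sensitive_sends_for_firm(500, prompts)
        print(f"{label}: {per_worker:.1f} per worker, {firm:,.0f} for 500 workers")
```

For 500 high-use workers, it gives roughly 1,650 (at 30 prompts a day) to 2,750 (at 50) sensitive sends per day.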
Common types of sensitive content in AI prompts include the following.
- Customer names and contact details.
- Account numbers and payment records.
- Medical notes from health workers.
- Case details from lawyers.
- Staff review notes from HR teams.
- Internal revenue or sales projections.
The study does not split intentional from accidental sharing. Both create the same legal risk. A worker who forgets to remove a client name causes the same breach as one who ignores the rule. Intent does not change the outcome.
Why Detection Falls Short
Network inspection cannot read HTTPS traffic without TLS interception, which decrypts and re-encrypts traffic at a proxy. Interception adds overhead and raises privacy concerns, and it fails whenever a browser or app does not trust the proxy's certificate.
Endpoint DLP agents watch clipboard and keystroke input, but they act with a delay. By the time an agent flags a pattern, the prompt may already have been sent.
Vendor audit logs record what was shared after it was shared. They help with incident response, but they do not stop leaks.
Staff training is a policy, not a control. The Cyberhaven study shows 11% of prompts still contain sensitive content at firms with clear policies. Training does not stop accidental sharing or mid-task lapses.
Blocking AI tools outright gives up their productivity gains. Workers then turn to personal devices or accounts, which places work outside any oversight.
None of these methods stop sensitive content from reaching AI systems in real time.
Prevention at the Point of Entry
The most reliable defense is masking before the prompt is sent. A customer name replaced with [PERSON_1] before it leaves the browser is never seen by the AI model.
Here is how inline masking works.
- A worker types a customer email into Claude or ChatGPT.
- The browser add-on detects personal data in real time.
- Entities are marked with type labels: PERSON, EMAIL_ADDRESS, ACCOUNT_NUMBER.
- The worker reviews the marked items.
- One click swaps all entities for tokens.
- The masked prompt is sent.
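The detect, review, and swap steps can be sketched in a few dozen lines. This is a minimal illustration, not the add-on's actual engine: the regex patterns, the 8-to-12-digit account format, and the caller-supplied name list are all assumptions, and real tools typically detect names with a trained model rather than a list. The worker's review is modelled as approving every detected span.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical detectors; names come from a caller-supplied list here.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
ACCOUNT_RE = re.compile(r"\b\d{8,12}\b")  # assumption: accounts are 8-12 digits

# Entity type label (shown to the worker) -> prefix used in the token.
TOKEN_PREFIX = {"PERSON": "PERSON", "EMAIL_ADDRESS": "EMAIL", "ACCOUNT_NUMBER": "ACCOUNT"}

Span = Tuple[int, int, str]  # (start, end, label)


def detect(text: str, known_names: List[str]) -> List[Span]:
    """Detect and label: find entities and tag each with a type label."""
    spans: List[Span] = []
    for name in known_names:
        if name:  # an empty pattern would match everywhere
            spans += [(m.start(), m.end(), "PERSON")
                      for m in re.finditer(re.escape(name), text)]
    spans += [(m.start(), m.end(), "EMAIL_ADDRESS") for m in EMAIL_RE.finditer(text)]
    spans += [(m.start(), m.end(), "ACCOUNT_NUMBER") for m in ACCOUNT_RE.finditer(text)]
    # Earliest span first; for equal starts, the longer span first.
    return sorted(spans, key=lambda s: (s[0], -s[1]))


def mask(text: str, approved: List[Span]) -> Tuple[str, Dict[str, str]]:
    """One-click swap: replace each approved span with a numbered token.

    Expects spans sorted as detect() returns them. Returns the masked text
    and a value->token map that stays on the worker's device.
    """
    counters: Dict[str, int] = {}
    mapping: Dict[str, str] = {}
    parts: List[str] = []
    cursor = 0
    for start, end, label in approved:
        if start < cursor:
            continue  # overlaps a span that was already replaced
        value = text[start:end]
        if value not in mapping:  # the same value always gets the same token
            prefix = TOKEN_PREFIX[label]
            counters[prefix] = counters.get(prefix, 0) + 1
            mapping[value] = f"[{prefix}_{counters[prefix]}]"
        parts.append(text[cursor:start])
        parts.append(mapping[value])
        cursor = end
    parts.append(text[cursor:])
    return "".join(parts), mapping


if __name__ == "__main__":
    prompt = "Customer Jane Doe at jane.doe@example.com has account 12345678."
    spans = detect(prompt, known_names=["Jane Doe"])
    masked, _ = mask(prompt, spans)  # worker review: approve every span
    print(masked)
```

Run on this sample prompt, it produces the same masked prompt as the example that follows. Only the masked text would be sent; the token map never leaves the device.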
The AI gets a prompt like this: "Customer [PERSON_1] at [EMAIL_1] has account [ACCOUNT_1]."
The AI handles the request. It never sees real names or numbers. The worker knows the actual customer from context.
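If the value-to-token map stays on the worker's machine, a tool could also swap real values back into the AI's reply locally, so the model still never sees them. The article does not say the add-on does this; the sketch below is a hypothetical extension, and `unmask` and the `[TYPE_N]` token format are assumptions based on the example prompt shown above.

```python
import re
from typing import Dict

# Matches placeholder tokens such as [PERSON_1] or [ACCOUNT_12].
TOKEN_RE = re.compile(r"\[[A-Z]+_\d+\]")


def unmask(reply: str, token_to_value: Dict[str, str]) -> str:
    """Swap known tokens in an AI reply back to their original values.

    Runs locally; tokens missing from the map are left as they are.
    """
    return TOKEN_RE.sub(lambda m: token_to_value.get(m.group(0), m.group(0)), reply)


if __name__ == "__main__":
    # Hypothetical local map, built at masking time and never sent to the AI.
    local_map = {"[PERSON_1]": "Jane Doe", "[ACCOUNT_1]": "12345678"}
    reply = "Dear [PERSON_1], we have reviewed account [ACCOUNT_1]. We will write to [EMAIL_1]."
    print(unmask(reply, local_map))
```

Leaving unknown tokens untouched means a missing map entry shows up as a visible placeholder rather than a wrong value.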
This approach has clear benefits.
- Personal data stays out of external AI systems.
- Customer details are not added to AI training sets.
- Workers keep access to AI tools, so productivity stays high.
Inline masking has limits. It does not stop deliberate sharing by a worker who bypasses the tool, and file uploads need a separate workflow. No control is perfect. But inline masking catches accidental sharing, which makes up most incidents. The result is a large drop in risk with little change to the daily workflow.
Law Firm Case Study
A law firm's staff used Claude to draft contract notes. Their method: copy contract sections, paste into Claude, request a summary.
Before the Chrome extension (first 6 months):
- 3 client data incidents found during the review.
- Each incident: a client name plus a matter reference number appeared in the prompt.
- All 3 were accidental.
After the Chrome extension (next 6 months):
- Zero client data incidents.
- Staff received real-time alerts when pasting sections with client names.
- One click replaced "Johnson Controls Matter 2024-0347" with "[PERSON_1] Matter [REFERENCE_1]."
- The workflow otherwise stayed the same.
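Matter references like the one in this case study follow a firm-specific format that generic detectors may miss. Here is a minimal sketch of firm-specific detection, assuming the firm supplies its own client list and that matter references look like the `2024-0347` example (four digits, a hyphen, four digits). Both the pattern and the helper names are hypothetical.

```python
from __future__ import annotations

import re
from typing import Callable, Dict, List, Tuple

# Hypothetical pattern inferred from the single example "2024-0347".
MATTER_REF_RE = re.compile(r"\b\d{4}-\d{4}\b")


def build_detectors(client_names: List[str]) -> List[Tuple[str, re.Pattern[str]]]:
    """Build (label, pattern) pairs from a firm's own client list."""
    detectors: List[Tuple[str, re.Pattern[str]]] = []
    if client_names:
        # Longest names first so "Johnson Controls" wins over a shorter "Johnson".
        alternatives = "|".join(
            re.escape(n) for n in sorted(client_names, key=len, reverse=True)
        )
        # Label mirrors the article's example output ([PERSON_1]).
        detectors.append(("PERSON", re.compile(rf"\b(?:{alternatives})\b")))
    detectors.append(("REFERENCE", MATTER_REF_RE))
    return detectors


def mask(text: str, detectors: List[Tuple[str, re.Pattern[str]]]) -> Tuple[str, Dict[str, str]]:
    """Replace every match with a numbered token; repeated values reuse their token."""
    counters: Dict[str, int] = {}
    mapping: Dict[str, str] = {}

    def tokenizer(label: str) -> Callable[[re.Match[str]], str]:
        def to_token(m: re.Match[str]) -> str:
            value = m.group(0)
            if value not in mapping:
                counters[label] = counters.get(label, 0) + 1
                mapping[value] = f"[{label}_{counters[label]}]"
            return mapping[value]
        return to_token

    # Detectors run in order; each pass works on the output of the previous one.
    for label, pattern in detectors:
        text = pattern.sub(tokenizer(label), text)
    return text, mapping


if __name__ == "__main__":
    clause = "Summarise the indemnity clause for Johnson Controls Matter 2024-0347."
    print(mask(clause, build_detectors(["Johnson Controls"]))[0])
```

Applied to a pasted request, it reproduces the replacement described in the case study.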
The managing partner said: "Our staff knew the policy before the add-on. The add-on made compliance the easy path."
See how other firms handled this in our case studies. Review controls in the security overview.
GDPR Records for Compliance Teams
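Firms using browser-based AI masking should document it as a technical measure under GDPR Article 32.
Records of Processing Activities (ROPA): State that AI prompts pass through client-side masking before reaching vendors. List the entity types covered, the detection engine version, and deployment logs as evidence.
Data processing agreements (DPAs): When masking keeps personal data out of prompts, the AI vendor receives only tokens and the real values stay on your systems, which simplifies DPA duties. Masking must be thorough for this to hold: GDPR Recital 26 treats pseudonymised data that can still be linked to a person as personal data.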
Audit logs: Add-on logs capture entity count per session, mask rate, and entity types by volume. These metrics feed into compliance reports.
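The three metrics named above can be rolled up from per-session records. This sketch assumes a hypothetical log schema (a `SessionLog` with a list of detected entity labels and a count of masked entities); the add-on's real log format may differ. Mask rate is computed as masked entities divided by detected entities across all sessions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class SessionLog:
    """Hypothetical per-session log record; a real add-on's fields may differ."""
    session_id: str
    detected: List[str]  # one entity type label per detected entity
    masked: int          # how many of those entities were masked before sending


@dataclass
class ComplianceSummary:
    entities_per_session: Dict[str, int]
    mask_rate: Optional[float]             # None when nothing was detected
    types_by_volume: List[Tuple[str, int]]  # most frequent entity type first


def summarize(logs: List[SessionLog]) -> ComplianceSummary:
    total_detected = sum(len(s.detected) for s in logs)
    total_masked = sum(s.masked for s in logs)
    return ComplianceSummary(
        entities_per_session={s.session_id: len(s.detected) for s in logs},
        mask_rate=total_masked / total_detected if total_detected else None,
        types_by_volume=Counter(t for s in logs for t in s.detected).most_common(),
    )


if __name__ == "__main__":
    logs = [
        SessionLog("s1", ["PERSON", "EMAIL_ADDRESS", "PERSON"], masked=3),
        SessionLog("s2", ["ACCOUNT_NUMBER"], masked=0),
    ]
    print(summarize(logs))
```

Returning `None` instead of 0 or 1 when nothing was detected keeps empty periods from distorting the mask rate in a report.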
Review GDPR rules for AI tools in our legal compliance guide and glossary. Common questions are in our FAQ.
Conclusion
The Samsung incident showed that AI leaks happen faster than any post-hoc control can act. The Cyberhaven study put a number on the scale: 11% of prompts contain sensitive data, and workers send prompts many times a day.
Real-time masking before sending fixes the root cause. When personal data never reaches the AI, there is nothing to detect, log, or clean up. Workers keep their AI tools. Firms keep their compliance status.
Detection tells you when prevention failed. For AI data leaks, the cost of failure (fines, reputational harm, loss of trust) justifies putting prevention first.
Explore pricing for your firm. Read our founder statement on why prevention-first is our core design principle.
Sources
- Cyberhaven: AI Data Exposure Study 2025 — cyberhaven.com.
- Samsung ChatGPT Data Breach, March 2023 — Bloomberg.
- GDPR Articles 4 and 32: Personal data and technical measures — gdpr-info.eu.