Last updated 2026-05-29

Free PII Detection Costs €13K/Year

Self-hosting Presidio takes 40–80 hours of initial setup and 5–10 hours per month of ongoing maintenance. At €100/hour engineering rates, that is 100–200 hours, or €10,000–20,000, in the first year.

May 29, 2026 · 7 minute read
Tags: Presidio TCO, open-source cost, managed SaaS, PII infrastructure, DevOps cost

The Real Cost of "Free" PII Detection

"It's free" is not a cost analysis. It is a license price — one factor among many.

Microsoft Presidio costs €0 to download. The software is open-source. But running it at an insurance company costs over €13,000 in the first year. That gap is engineering time.

What a Production Deployment Needs

Getting the tool ready for production takes 40–80 hours. Here is where that time goes.

Docker setup: 4–8 hours. The tool runs as several containers: an analyzer service, an anonymizer service, and an optional image redactor. Getting them to talk to each other is hard, and GitHub issues show it is a common failure point.
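A quick way to see whether both containers are reachable is to probe each one before wiring them together. The sketch below is a minimal example using only the Python standard library. The `/health` path and the `localhost:5002` / `localhost:5001` port mappings are assumptions based on a typical local Docker setup, so adjust them to your own deployment; `http_ok` and `check_services` are hypothetical helper names.

```python
import urllib.error
import urllib.request

# Assumed local port mappings; change these to match your own containers.
SERVICES = {
    "analyzer": "http://localhost:5002/health",
    "anonymizer": "http://localhost:5001/health",
}


def http_ok(url, timeout=3.0):
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or HTTP error status.
        return False


def check_services(services=SERVICES, probe=http_ok):
    """Probe every service and report which ones responded."""
    return {name: probe(url) for name, url in services.items()}


if __name__ == "__main__":
    for name, healthy in check_services().items():
        print(f"{name}: {'up' if healthy else 'DOWN'}")
```

The `probe` parameter is injectable so the check can be exercised in tests without running containers.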

Python setup: 2–4 hours. The libraries have strict version rules. Conflicts are common — especially between spaCy model versions and Python 3.8/3.9/3.10. GitHub shows hundreds of open issues on this topic.

Language model downloads: 2–4 hours. spaCy models range from 300 MB to 1.4 GB each. A five-language setup needs 1.5–7 GB of storage. Model loading failures are among the most common support issues.
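The storage range follows directly from the per-model sizes. A minimal sketch of that arithmetic, assuming one model per language and the 300 MB to 1.4 GB bounds quoted above (the function name is illustrative):

```python
def model_storage_gb(languages, min_mb=300, max_mb=1400):
    """Return (low, high) storage in GB for one model per language."""
    return languages * min_mb / 1000, languages * max_mb / 1000


low, high = model_storage_gb(5)
print(f"Five languages: {low}-{high} GB")
```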

Custom recognizers: 8–16 hours. The default set covers about 40 entity types. Most are US identifiers. EU deployments need European national IDs. Healthcare teams need medical record formats. Each type needs Python code, YAML setup, and testing.
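To give a sense of what one custom entity type involves, here is a minimal, library-free sketch of a recognizer for the Spanish DNI, chosen only as an illustrative European ID. It is not Presidio's recognizer API: the function name and return shape are assumptions. It relies on the standard DNI rule that the check letter is selected by the eight-digit number modulo 23.

```python
import re

# Eight digits followed by one uppercase check letter.
DNI_RE = re.compile(r"\b(\d{8})([A-Z])\b")
# Check letter lookup table, indexed by number % 23.
DNI_LETTERS = "TRWAGMYFPDXBNJZSQVHLCKE"


def find_spanish_dni(text):
    """Return (value, start, end) for candidates with a valid check letter."""
    hits = []
    for match in DNI_RE.finditer(text):
        number, letter = match.groups()
        if DNI_LETTERS[int(number) % 23] == letter:
            hits.append((match.group(), match.start(), match.end()))
    return hits


print(find_spanish_dni("DNI 12345678Z on file"))
```

The checksum step is what separates a usable recognizer from a bare regex: it discards digit strings that merely look like an ID. Each additional entity type needs its own pattern, validation rule, and test cases, which is where the hours go.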

API setup: 4–8 hours. Production config includes timeouts, auth, rate limits, and logging. The official docs are thin. Most teams find answers in GitHub issue threads.

Audit logging: 4–8 hours. GDPR requires records of data processing. The tool has no audit log by default. Teams must write it as custom code.
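A custom audit log can start small. The sketch below appends one JSON line per processed document and stores a hash of the text instead of the text itself, so the log does not become a second copy of the PII. The field names and the `audit_record` / `append_audit` helpers are hypothetical; which fields your records actually need is a question for your compliance team.

```python
import hashlib
import json
from collections import Counter
from datetime import datetime, timezone


def audit_record(document_id, text, entity_types, actor):
    """Build an audit entry that describes a run without storing the text."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "document_id": document_id,
        "text_sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "entity_counts": dict(Counter(entity_types)),
    }


def append_audit(path, record):
    """Append the record as one JSON line (an append-only log file)."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")


if __name__ == "__main__":
    rec = audit_record("claim-001", "Jane Doe, AB123456C", ["PERSON", "UK_NINO"], "svc-claims")
    append_audit("audit.jsonl", rec)
```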

Team docs: 4–8 hours.

Total for the tasks listed above: 28–56 hours at €100/hour = €2,800–5,600.
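Summing the itemized ranges is easy to check in a few lines. In this sketch the hour ranges are copied from the task list above; the dictionary keys and function names are illustrative.

```python
# (low, high) hour estimates copied from the itemized list above.
SETUP_TASKS = {
    "docker": (4, 8),
    "python": (2, 4),
    "language_models": (2, 4),
    "custom_recognizers": (8, 16),
    "api": (4, 8),
    "audit_logging": (4, 8),
    "team_docs": (4, 8),
}
HOURLY_RATE_EUR = 100


def total_hours(tasks):
    """Sum the low and high estimates across all tasks."""
    low = sum(lo for lo, _ in tasks.values())
    high = sum(hi for _, hi in tasks.values())
    return low, high


def cost_range(hours, rate=HOURLY_RATE_EUR):
    """Convert an (low, high) hour range into an (low, high) cost range."""
    return hours[0] * rate, hours[1] * rate


hours = total_hours(SETUP_TASKS)
print(hours, cost_range(hours))
```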

Annual Maintenance Costs

The tool ships updates 2–4 times per year. Major releases have introduced breaking API changes. Keeping up means tracking changes, testing in staging, and deploying.

spaCy model updates add work too. New model versions need re-downloading and accuracy checks before going live.

Python dependency conflicts keep coming. A clean setup today may break when a security patch ships next month.

Monitoring is ongoing as well. spaCy models are memory-heavy, so container health, memory leaks, and restart procedures all need regular attention.

Total annual maintenance: 5–10 hours/month, or 60–120 hours/year, at €100/hour = €6,000–12,000.

A Real-World Case Study

A compliance team at an insurance firm set out to process claims documents. They had two junior data engineers and no DevOps support.

Week 1. The two main containers could not talk to each other. Three days to fix with help from GitHub.

Week 2. Models failed to load in production. Memory config was different from the dev setup. Two days to diagnose, one more to fix.

Week 3. A custom UK National Insurance Number rule worked in tests but hit false positives on real documents. Two more days of tuning.
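This kind of false positive is common with pattern-only rules: any string shaped like the ID matches, including reference codes and part numbers. One usual mitigation is to score a match higher only when supporting context words appear nearby. The sketch below is a library-free illustration, not the team's actual rule. It uses the commonly published NINO shape (two prefix letters with some letters and prefixes excluded, six digits, suffix A–D); the context words, window size, and score values are assumptions.

```python
import re

# Commonly published NINO shape; excluded prefixes are rejected up front.
NINO_RE = re.compile(
    r"\b(?!BG|GB|KN|NK|NT|TN|ZZ)"
    r"[A-CEGHJ-PR-TW-Z][A-CEGHJ-NPR-TW-Z]"
    r"\d{6}[A-D]\b"
)
# Hypothetical context words that make a match more credible.
CONTEXT_WORDS = ("national insurance", "nino", "ni number")


def find_ninos(text, window=40, base=0.4, boost=0.45):
    """Return (value, score); score rises when context words are nearby."""
    lowered = text.lower()
    hits = []
    for match in NINO_RE.finditer(text):
        start = max(0, match.start() - window)
        nearby = lowered[start:match.end() + window]
        has_context = any(word in nearby for word in CONTEXT_WORDS)
        score = base + (boost if has_context else 0.0)
        hits.append((match.group(), round(score, 2)))
    return hits


print(find_ninos("NI number: AB123456C"))
print(find_ninos("Part AB123456C shipped"))
```

A downstream threshold (for example, keep only scores above 0.5) then drops the context-free matches. Choosing the words, window, and threshold against real documents is the tuning work described above.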

Week 4. The project was escalated. Three engineering weeks spent. Still not in production.

The team then tried anonym.legal. First document processed: 12 minutes after signup. UK National Insurance Number detection was already built in. No setup needed.

They moved to anonym.legal Pro at €180/year.

Year-one TCO:

  • Self-hosted path — 40–80 more hours to finish, then €6,000–12,000/year to maintain. Total: €10,000–20,000.
  • anonym.legal Pro — €180/year. Deploy time: ~12 minutes.
  • Engineering hours saved: ~132/year at €100/hour = €13,200.

Against €180/year, €13,200 is roughly a 73x cost gap in year one.
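The comparison above can be reproduced with a short calculation. This sketch uses only figures from this article (40–80 setup hours, 5–10 maintenance hours per month, €100/hour, €180/year for the managed plan); the function names are illustrative.

```python
def year_one_cost(setup_hours, monthly_hours, rate_eur=100):
    """Self-hosted year-one cost: setup plus twelve months of upkeep."""
    return (setup_hours + 12 * monthly_hours) * rate_eur


def cost_ratio(self_hosted_eur, managed_eur=180):
    """How many times more the self-hosted path costs."""
    return self_hosted_eur / managed_eur


low = year_one_cost(40, 5)
high = year_one_cost(80, 10)
print(f"Self-hosted year one: EUR {low}-{high}")
print(f"Ratio vs managed: {cost_ratio(low):.1f}x to {cost_ratio(high):.1f}x")
print(f"Ratio for the 132-hour figure: {cost_ratio(132 * 100):.1f}x")
```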

For teams also facing false positive issues, see our post on Presidio's precision problem.

When Self-Hosting Makes Sense

Managed SaaS wins for most teams. But self-hosting fits some cases.

Data sovereignty. Some regulations or contracts ban sending data to external services. Our Desktop App (anonym.plus) runs fully offline. No data leaves the machine. Same accuracy, no server needed.

Very high volume. Millions of API calls per day can push per-call pricing above server costs. At that scale, owning the stack makes sense.
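Where that crossover sits depends on your own numbers. A minimal break-even sketch follows; every input in the example call is a hypothetical placeholder, not a quoted price, and the function name is illustrative.

```python
def break_even_calls_per_day(self_host_annual_eur, price_per_call_eur, days=365):
    """Daily call volume at which per-call fees equal the self-hosting cost."""
    return self_host_annual_eur / (price_per_call_eur * days)


# Hypothetical inputs: EUR 12,000/year to self-host, EUR 0.0001 per API call.
volume = break_even_calls_per_day(12_000, 0.0001)
print(f"Break-even at about {volume:,.0f} calls per day")
```

Below the break-even volume the per-call service is cheaper; above it, owning the stack starts to pay for itself.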

Product integration. If you are building PII detection into your own product and need full control, custom open-source work is a valid choice.

Existing DevOps. Teams with a platform team already running many services face lower added cost. Infrastructure is a sunk cost for them.

For everyone else — compliance teams, startups, teams with no DevOps — managed SaaS is the clear choice. See our security compliance overview for how hosted processing meets enterprise needs.

Conclusion

Open-source tools have costs that do not show up in the license. For this type of tool, the big cost is engineering time. Setup: 40–80 hours. Annual upkeep: 60–120 hours. At €100/hour, that is €10,000–20,000 in year one, roughly 55–110 times the €180/year managed plan in the case study above.

The right question is not "what does the software cost?" It is "what does running it cost?" For most teams, that answer points to managed SaaS.

Sources

Microsoft Presidio GitHub: Issues and Setup Documentation.

Ploomber: Presidio Production Deployment Guide.

GDPR Article 32: Technical measures for appropriate security.
