The Real Cost of "Free" PII Detection
"It's free" is not a cost analysis. It is a license price — one factor among many.
Microsoft Presidio costs €0 to download. The software is open-source. But running it at an insurance company costs over €13,000 in the first year. That gap is engineering time.
What a Production Deployment Needs
Getting the tool ready for production takes 40–80 hours. The tasks itemized below account for 28–56 of those hours. Here is where that time goes.
Docker setup: 4–8 hours. The tool uses several containers: an analyzer service, an anonymizer service, and an optional image redactor. Getting them to talk to each other is hard, and GitHub issues show it is a common failure point.
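One way to catch container connectivity problems early is a small health probe run before any documents are sent. The sketch below assumes the analyzer and anonymizer are published on local ports 5002 and 5001 and expose a `/health` route. The ports, the URLs, and the `check_services` helper are illustrative assumptions, so adjust them to your own compose file.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

# Assumed endpoints: host ports depend on your own container port mapping.
SERVICES: Dict[str, str] = {
    "analyzer": "http://localhost:5002/health",
    "anonymizer": "http://localhost:5001/health",
}


def http_get_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status code for a GET request, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The service answered, but with an error status (for example 500).
        return error.code
    except OSError:
        # Connection refused, DNS failure, or timeout.
        return 0


def check_services(
    services: Dict[str, str],
    fetch: Callable[[str], int] = http_get_status,
) -> Dict[str, bool]:
    """Map each service name to True when its health URL returns 200."""
    return {name: fetch(url) == 200 for name, url in services.items()}
```

Calling `check_services(SERVICES)` from a startup script gives a quick per-container pass or fail before the first request is sent. The `fetch` parameter exists only so the probe can be tested without a running container.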
Python setup: 2–4 hours. The libraries have strict version rules. Conflicts are common — especially between spaCy model versions and Python 3.8/3.9/3.10. GitHub shows hundreds of open issues on this topic.
Language model downloads: 2–4 hours. spaCy models range from 300 MB to 1.4 GB each. A five-language setup needs 1.5–7 GB of storage. Model loading failures are among the most common support issues.
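Because spaCy models are installed as ordinary Python packages, a preflight check can report missing models before the service starts rather than at the first request. The sketch below is a minimal example under that assumption. The five model names are an assumed setup, not a recommendation, and `missing_packages` is a hypothetical helper.

```python
import importlib.util
from typing import Iterable, List

# Assumed five-language setup; replace with the model packages you actually use.
REQUIRED_MODELS = [
    "en_core_web_lg",
    "de_core_news_lg",
    "fr_core_news_lg",
    "es_core_news_lg",
    "it_core_news_lg",
]


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the names that cannot be found as importable packages."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Example use in a startup script:
#   missing = missing_packages(REQUIRED_MODELS)
#   if missing:
#       raise SystemExit("Missing model packages: " + ", ".join(missing))
```

This only confirms that the packages are present. It does not prove that a model fits in the memory available to the container, which still needs a load test in an environment that matches production.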
Custom recognizers: 8–16 hours. The default set covers about 40 entity types. Most are US identifiers. EU deployments need European national IDs. Healthcare teams need medical record formats. Each type needs Python code, YAML setup, and testing.
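To give a sense of what one custom recognizer involves, here is a standalone sketch of the detection logic for a UK National Insurance Number. It uses only the standard library rather than Presidio's own classes, so treat it as the core pattern you would then wrap in a Presidio recognizer. The regex follows the commonly published format rules, and the context words and the two scores are illustrative assumptions that need tuning on real documents.

```python
import re
from typing import List, NamedTuple

# Two prefix letters, six digits, one suffix letter A-D, optional spaces.
# Excluded letters and prefixes follow commonly published format guidance;
# verify against current official rules before relying on this.
NINO_PATTERN = re.compile(
    r"\b(?!BG|GB|KN|NK|NT|TN|ZZ)"
    r"[A-CEGHJ-PR-TW-Z][A-CEGHJ-NPR-TW-Z]"
    r"\s?\d{2}\s?\d{2}\s?\d{2}\s?[A-D]\b"
)

# Assumed context phrases; nearby matches get a higher score.
CONTEXT_WORDS = ("national insurance", "nino", "ni number")


class Match(NamedTuple):
    start: int
    end: int
    text: str
    score: float


def find_ninos(text: str, window: int = 40) -> List[Match]:
    """Find NINO-shaped strings and score them by surrounding context."""
    results = []
    for m in NINO_PATTERN.finditer(text):
        nearby = text[max(0, m.start() - window): m.end() + window].lower()
        has_context = any(word in nearby for word in CONTEXT_WORDS)
        # Illustrative scores: pattern-only matches stay below a typical threshold.
        score = 0.85 if has_context else 0.4
        results.append(Match(m.start(), m.end(), m.group(), score))
    return results
```

Scoring pattern-only matches low is one way to limit false positives from reference codes that happen to share the same shape. Each additional national ID needs its own pattern, context list, and test set.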
API setup: 4–8 hours. Production config includes timeouts, auth, rate limits, and logging. The official docs are thin. Most teams find answers in GitHub issue threads.
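Rate limiting is one of the pieces teams usually have to add themselves. The sketch below shows a token bucket, a common technique for this, written with the standard library only. The `TokenBucket` class and its parameters are hypothetical names for illustration and are not part of any Presidio API.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `rate_per_second`."""

    def __init__(
        self,
        rate_per_second: float,
        capacity: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate_per_second
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.updated = clock()

    def allow(self) -> bool:
        """Return True and consume a token if a request may proceed."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A request handler would call `allow()` before forwarding text to the analyzer and return an HTTP 429 response when it is False. This version is not thread-safe, so a multi-threaded server would need a lock around `allow()`.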
Audit logging: 4–8 hours. GDPR requires records of data processing. The tool has no audit log by default. Teams must write it as custom code.
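A minimal sketch of what such custom audit code might look like follows. It writes one JSON line per processed document and records a hash of the document plus counts of detected entity types, never the detected values themselves. The field names and the `build_audit_record` helper are assumptions for illustration, and what a compliant record must contain is a question for your data protection officer.

```python
import hashlib
import json
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable


def build_audit_record(
    document: bytes,
    entity_types: Iterable[str],
    operator: str,
    purpose: str,
) -> str:
    """Return one JSON line describing a processing event without raw PII."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # The hash identifies the document without storing its content.
        "document_sha256": hashlib.sha256(document).hexdigest(),
        # Counts per entity type only, never the matched text.
        "entity_counts": dict(Counter(entity_types)),
        "operator": operator,
        "purpose": purpose,
    }
    return json.dumps(record, sort_keys=True)
```

Appending each returned line to a write-once log file or log pipeline gives a basic processing trail. Retention periods and access control for that log still have to be decided separately.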
Team docs: 4–8 hours.
Total initial setup: 28–56 hours at €100/hour = €2,800–5,600.
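The itemized total can be re-derived with a few lines of Python, which also makes it easy to swap in your own estimates or hourly rate. The hour ranges below are copied from the task list above. The function names are illustrative.

```python
from typing import Dict, Tuple

# (low, high) hour estimates for each setup task listed above.
SETUP_TASKS: Dict[str, Tuple[int, int]] = {
    "Docker setup": (4, 8),
    "Python setup": (2, 4),
    "Language model downloads": (2, 4),
    "Custom recognizers": (8, 16),
    "API setup": (4, 8),
    "Audit logging": (4, 8),
    "Team docs": (4, 8),
}
HOURLY_RATE_EUR = 100


def total_hours(tasks: Dict[str, Tuple[int, int]]) -> Tuple[int, int]:
    """Sum the low and high estimates across all tasks."""
    low = sum(lo for lo, _ in tasks.values())
    high = sum(hi for _, hi in tasks.values())
    return low, high


def cost_range(hours: Tuple[int, int], rate_eur: int) -> Tuple[int, int]:
    """Convert an hour range into a cost range at a flat hourly rate."""
    return hours[0] * rate_eur, hours[1] * rate_eur


hours = total_hours(SETUP_TASKS)
print(f"Setup: {hours[0]}-{hours[1]} hours, "
      f"EUR {cost_range(hours, HOURLY_RATE_EUR)[0]}-{cost_range(hours, HOURLY_RATE_EUR)[1]}")
```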
Annual Maintenance Costs
The tool ships updates 2–4 times per year. Major releases have introduced breaking API changes. Keeping up means tracking changes, testing in staging, and deploying.
spaCy model updates add work too. New model versions need re-downloading and accuracy checks before going live.
Python dependency conflicts keep coming. A clean setup today may break when a security patch ships next month.
Monitoring is ongoing as well. spaCy models are memory-heavy, so container health, memory leaks, and restart procedures all need regular attention.
Total annual maintenance: 60–120 hours at €100/hour = €6,000–12,000.
A Real-World Case Study
A compliance team at an insurance firm set out to process claims documents. They had two junior data engineers and no DevOps support.
Week 1. The two main containers could not talk to each other. Three days to fix with help from GitHub.
Week 2. Models failed to load in production. Memory config was different from the dev setup. Two days to diagnose, one more to fix.
Week 3. A custom UK National Insurance Number rule worked in tests but hit false positives on real documents. Two more days of tuning.
Week 4. The project was escalated. Three engineering weeks spent. Still not in production.
The team then tried anonym.legal. First document processed: 12 minutes after signup. UK National Insurance Number detection was already built in. No setup needed.
They moved to anonym.legal Pro at €180/year.
Year-one TCO:
- Self-hosted path: 40–80 more hours to finish (€4,000–8,000 at €100/hour), then €6,000–12,000/year to maintain. Total: €10,000–20,000.
- anonym.legal Pro: €180/year. Deploy time: ~12 minutes.
- Engineering hours saved: ~132/year at €100/hour = €13,200.
That is roughly a 70x cost gap in year one: €13,200 in engineering time against a €180 subscription.
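For readers who want to rerun this comparison with their own figures, the arithmetic can be sketched as follows. The inputs are the numbers quoted in this case study. The function names are illustrative, and a flat hourly rate is a simplifying assumption.

```python
from typing import Tuple


def year_one_cost(
    setup_hours: Tuple[int, int],
    maintenance_hours: Tuple[int, int],
    rate_eur: int,
) -> Tuple[int, int]:
    """Low and high year-one cost: setup plus one year of maintenance."""
    low = (setup_hours[0] + maintenance_hours[0]) * rate_eur
    high = (setup_hours[1] + maintenance_hours[1]) * rate_eur
    return low, high


def cost_ratio(cost_eur: float, subscription_eur: float) -> float:
    """How many times larger one cost is than the subscription price."""
    return cost_eur / subscription_eur


# Figures from the case study: 40-80 setup hours, 60-120 maintenance hours.
self_hosted = year_one_cost((40, 80), (60, 120), 100)

# Ratio of the quoted engineering savings to the yearly subscription.
savings_ratio = cost_ratio(132 * 100, 180)
```

Changing the hourly rate or the subscription price shows how sensitive the gap is to each input.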
For teams also facing false positive issues, see our post on Presidio's precision problem.
When Self-Hosting Makes Sense
Managed SaaS wins for most teams. But self-hosting fits some cases.
Data sovereignty. Some regulations or contracts prohibit sending data to external services. Our Desktop App (anonym.plus) runs fully offline. No data leaves the machine. Same accuracy, no server needed.
Very high volume. Millions of API calls per day can push per-call pricing above server costs. At that scale, owning the stack makes sense.
Product integration. If you are building PII detection into your own product and need full control, custom open-source work is a valid choice.
Existing DevOps. Teams with a platform team already running many services face lower added cost. Infrastructure is a sunk cost for them.
For everyone else — compliance teams, startups, teams with no DevOps — managed SaaS is the clear choice. See our security compliance overview for how hosted processing meets enterprise needs.
Conclusion
Open-source tools have costs that do not show up in the license. For this type of tool, the big cost is engineering time. Setup: 40–80 hours. Annual upkeep: 60–120 hours. At normal rates, the self-hosted path costs 20–75x more than a managed service.
The right question is not "what does the software cost?" It is "what does running it cost?" For most teams, that answer points to managed SaaS.
Sources
Microsoft Presidio GitHub: Issues and Setup Documentation.
Ploomber: Presidio Production Deployment Guide.
GDPR Article 32: Technical measures for appropriate security.