Last updated 2026-05-29

6 Weeks to 3 Days: Managed PII Setup

One healthcare SaaS team spent six weeks on a self-hosted Presidio production deployment before switching to a managed API. The managed API replaced the deployment.

May 29, 2026 · 7 minute read
Tags: managed PII API, Presidio production, PHI anonymization, healthcare SaaS, build vs buy

From Six Weeks of DevOps Pain to a 3-Day Integration

Updated for 2026.

Six weeks. Two engineers. Four failed deployment attempts. One healthcare SaaS team spent all of this on a self-hosted Presidio setup. Then they switched to a managed API. The switch took 3 days.

The "free" label on open-source software is tempting. So is the promise of full control. But the real cost shows up in engineering hours. Not license fees.

What Presidio Docs Don't Cover

Presidio's docs handle local setup well. Run two Docker containers: the analyzer and the anonymizer. Send text to the analyzer, then pass its results to the anonymizer. It works on your laptop.

Production is a different story.

Scaling: Local Presidio runs as a single instance. Production needs multiple instances behind a load balancer, with health checks and graceful failure handling. Presidio's docs give little operational guidance on this. Each team solves it alone.

Memory use: spaCy models load into RAM per instance. The en_core_web_lg model alone is 741 MB. Under memory pressure, performance drops. Then the process crashes with an out-of-memory error. Presidio's docs give no sizing guidance for this.

Timeouts: Large documents take longer. Production code needs configurable timeouts, safe timeout responses, and retry logic. None of this is documented in Presidio.
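Here is a minimal sketch of that timeout handling. It uses only the Python standard library. The function name, the default values, and the response shape are assumptions for illustration. They are not part of Presidio. The `anonymize` argument stands in for whatever call your service makes.

```python
import concurrent.futures
import time

# Shared pool. A timed-out call keeps running in its worker thread,
# so size the pool with that in mind.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def anonymize_with_timeout(anonymize, text, timeout_s=5.0, retries=2, backoff_s=0.1):
    """Call anonymize(text) with a deadline per attempt and bounded retries.

    Returns a dict. On failure the input text is never echoed back
    (fail closed), so raw PHI cannot leak through the error path.
    """
    last_error = "not attempted"
    for attempt in range(retries + 1):
        future = _POOL.submit(anonymize, text)
        try:
            return {"status": "ok", "text": future.result(timeout=timeout_s)}
        except concurrent.futures.TimeoutError:
            future.cancel()  # only helps if the task has not started yet
            last_error = "timeout"
        except Exception as exc:
            last_error = type(exc).__name__
        if attempt < retries:
            # Exponential backoff between attempts.
            time.sleep(backoff_s * (2 ** attempt))
    return {"status": "failed", "text": None, "reason": last_error}
```

The key design choice is the failure response. It carries a reason but no text, so a caller cannot pass un-anonymized content downstream by mistake.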

Model load failures: Under high concurrency, multiple workers try to load the same spaCy model at once. This is a race condition. The result is random 500 errors that are hard to reproduce. Presidio GitHub issues document this. The main docs do not.
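One common guard against this kind of race is a lock around the first load. The sketch below shows the idea with the standard library only. `LazyModel` and its `loader` argument are hypothetical names, not Presidio or spaCy APIs. In practice the loader might wrap a call such as `spacy.load`.

```python
import threading


class LazyModel:
    """Load an expensive model at most once, even if many threads ask at once."""

    def __init__(self, loader):
        self._loader = loader  # hypothetical callable that returns the loaded model
        self._lock = threading.Lock()
        self._model = None

    def get(self):
        # Fast path: the model is already loaded, no locking needed.
        if self._model is not None:
            return self._model
        with self._lock:
            # Re-check inside the lock: another thread may have finished
            # loading while this one was waiting.
            if self._model is None:
                self._model = self._loader()
            return self._model
```

This only covers threads inside one process. If your server forks several worker processes, each one still loads its own copy, which feeds back into the memory problem above.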

Audit logs: GDPR and HIPAA both expect an auditable record of PII processing. Presidio has no built-in audit logging. Each team must write its own middleware.
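A rough sketch of what such middleware might record is below. The field names and the shape of `findings` are assumptions made for this example. The point is that the log stores counts and a digest, never the text itself.

```python
import hashlib
import json
from collections import Counter
from datetime import datetime, timezone


def audit_record(request_id, actor, text, findings):
    """Build one audit entry for a PII-processing call.

    `findings` is assumed to be a list of (entity_type, start, end) tuples.
    The entry stores a SHA-256 digest and entity counts, never the raw text.
    """
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "request_id": request_id,
        "actor": actor,
        "input_sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "input_chars": len(text),
        "entities": dict(Counter(entity for entity, _, _ in findings)),
    }


def write_audit(record, sink):
    """Append the record as one JSON line to any file-like sink."""
    sink.write(json.dumps(record, sort_keys=True) + "\n")
```

A plain hash of a very short input can be guessed by brute force. If that matters for your data, a keyed digest (for example with `hmac`) is a safer choice.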

API versioning: Presidio's API has changed between versions. Code built for Presidio 2.0 may need updates for 2.2 and above. Version pinning helps. But it adds its own maintenance burden.

A Healthcare SaaS Team's Six Weeks

This team built PHI anonymization into a research data export pipeline.

Week 1: They followed the Presidio docs. Local dev worked. The Kubernetes deployment failed. Pod initialization threw model loading errors. The team chased Kubernetes config issues.

Week 2: Kubernetes config was fixed. Model loading worked sometimes. Under load testing, about 15% of requests failed with model loading timeouts. They added retry logic.

Week 3: The retry logic masked the root cause, but load tests passed. A compliance review asked for audit logs. The team wrote custom logging middleware.

Week 4: Healthcare entity types — medical record numbers, health plan IDs — were not covered by Presidio defaults. The team wrote two custom recognizers.
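To give a feel for that work, here is a simplified, standard-library sketch of a medical record number detector. It is not the team's code. The digit pattern, the context words, and the scores are all assumptions, and real MRN formats differ between institutions. In Presidio, logic like this would typically be wrapped in a custom recognizer.

```python
import re

# Illustrative pattern only: MRN formats differ between institutions.
MRN_PATTERN = re.compile(r"\b\d{7,10}\b")
CONTEXT_WORDS = ("mrn", "medical record", "record number")


def find_mrns(text, window=30):
    """Return (start, end, score) for digit runs that look like MRNs.

    A match scores 0.85 when a context word appears within `window`
    characters before it, and 0.3 otherwise.
    """
    hits = []
    lowered = text.lower()
    for match in MRN_PATTERN.finditer(text):
        before = lowered[max(0, match.start() - window):match.start()]
        score = 0.85 if any(word in before for word in CONTEXT_WORDS) else 0.3
        hits.append((match.start(), match.end(), score))
    return hits
```

The context check is what keeps a phone number from scoring as high as a labelled MRN. Tuning that window and word list against real records is where much of the effort tends to go.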

Week 5: They pushed to production. A memory leak appeared. spaCy model objects built up across requests. The team added a daily pod restart as a workaround.

Week 6: Production failed under real traffic. The daily restart caused service gaps. The root cause was clear: the memory leak needed either a major app redesign or a different tool.

The review: The engineering manager ran the numbers. Six weeks times two engineers equals 12 engineering weeks. The deployment was live but unstable. Ongoing maintenance was estimated at 5 to 10 hours per week.

The switch: The team tested the anonym.legal API. PHI entity coverage worked out of the box. No custom recognizers needed. SLA-backed uptime. Audit logging included. Integration took 3 days using their existing API client code.

The cost comparison:

  • 12 engineering weeks at US market rates: $48,000 to $72,000
  • Estimated annual maintenance for self-hosted: $25,000 to $40,000
  • anonym.legal Business plan: €348 per year (roughly $385)

The managed API costs less in its first week than the self-hosted build cost in its first hour.
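The arithmetic behind that claim can be checked in a few lines. The dollar figures come from the list above. The 40-hour week is an assumption.

```python
HOURS_PER_WEEK = 40          # assumption: full-time engineering week
ENGINEER_WEEKS = 6 * 2       # six weeks, two engineers
BUILD_COST_USD = (48_000, 72_000)
MANAGED_PER_YEAR_USD = 385   # approximate conversion of EUR 348 from the list above


def hourly_rate(total_cost, engineer_weeks=ENGINEER_WEEKS):
    """Implied cost of one engineer-hour for a given total build cost."""
    return total_cost / (engineer_weeks * HOURS_PER_WEEK)


low, high = (hourly_rate(cost) for cost in BUILD_COST_USD)
managed_per_week = MANAGED_PER_YEAR_USD / 52

print(f"implied rate per engineer-hour: ${low:.0f} to ${high:.0f}")  # $100 to $150
print(f"managed API per week: ${managed_per_week:.2f}")              # $7.40
```

Under these assumptions, one engineer-hour of the build costs $100 to $150. One week of the managed plan costs about $7.40.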

When Data Cannot Leave Your Network

Some healthcare teams cannot send data to any external service. Air-gap rules or data sovereignty policies block it.

For these cases, the Desktop Application (anonym.plus) offers the same engine in a local install:

  • Same detection engine: Presidio plus XLM-RoBERTa
  • No calls to external services
  • Batch processing for clinical notes and research datasets
  • No setup beyond installation
  • Automatic model management

This removes the main objection to managed SaaS: "our data can't leave." It still keeps the simplicity that makes managed tools worthwhile.

Build vs. Buy: A Simple Framework

Choose a managed API when:

  • Your team has no dedicated infrastructure engineers
  • You need to ship in days, not weeks
  • SLA-backed uptime is a requirement
  • The managed service covers your entity types
  • You need audit logs and compliance records included

Choose self-hosted when:

  • Regulations block data from leaving your network (check the Desktop App first)
  • Your processing volume makes self-hosted cheaper at scale
  • You need deep customization the API cannot support
  • You have a platform team that treats this as one of many managed services

Choose the Desktop Application when:

  • Offline processing is required
  • Medical research data cannot leave a clinical environment
  • Financial data has geographic processing limits

Conclusion

Six weeks of engineering time is not a Presidio flaw. It is the expected cost of running any production-grade NLP service on your own. Scaling, memory issues, model load failures, audit logs, and custom entity work all add up fast.

Managed APIs absorb that cost. For PII anonymization — a compliance need, not a product feature — the managed route almost always wins on total cost of ownership.

Read how the anonym.legal API handles PHI detection. See full compliance details in our security overview. Compare plans on our pricing page.

Sources

  • Ploomber: Presidio Production Deployment Deep Dive — ploomber.io.
  • Microsoft Fabric Community: Presidio with PySpark — blog.fabric.microsoft.com.
  • Presidio GitHub: Production Deployment Issues — github.com/microsoft/presidio/issues.
