Updated for 2026

Most teams check their databases for personal info. Fewer do the same for their logs.

GDPR Article 5(1)(e) limits how long you can store personal info. For databases, teams set policies and run deletion jobs. For log files, the usual rule is much simpler: keep everything for 90 days for debugging.

The problem? Those records hold personal info. Request entries hold user emails. Error captures hold raw input values. Access entries hold IP addresses. Each of these counts as personal info under GDPR. Your team needs a lawful basis and a retention plan for each one.

What Ends Up in Your Log Files

Standard web app logging pulls in a wide range of PII.

Access records (nginx/Apache):

IP addresses — personal info per EDPB guidance
User-agent strings — may enable device fingerprinting
Session tokens — if written to output

App records (structured JSON):

User IDs and email addresses
Input errors — often include the raw invalid value, which may be real user info
Business events — order IDs linked to customer accounts
Search queries — may contain names or addresses

API gateway records:

Auth headers — partly captured in some setups
Query params — may carry user IDs, names, or emails
Request and response bodies — present in debug-level setups

Database audit entries:

SQL queries with WHERE clauses like email = 'user@example.com'
Literal personal values in query params
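Database audit entries are the easiest of these to clean up, because the personal info usually sits in SQL literals. Below is a minimal sketch. It assumes your audit log stores the query text as a plain string. It swaps quoted strings and bare numbers for `?`, so the query shape survives but the values do not. The `redact_sql_literals` name and the regex approach are illustrative. This is not a SQL parser, and dollar-quoted strings and other dialect quirks are not handled.

```python
import re

# Single-quoted SQL string literals, including doubled '' escapes inside them.
STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
# Bare numeric literals that are not part of an identifier or a $1-style placeholder.
NUMERIC_LITERAL = re.compile(r"(?<![\w.$])\d+(?:\.\d+)?(?![\w.])")


def redact_sql_literals(query: str) -> str:
    """Replace literal values with '?' so the query shape survives but the values do not."""
    without_strings = STRING_LITERAL.sub("?", query)
    return NUMERIC_LITERAL.sub("?", without_strings)


print(redact_sql_literals("SELECT id FROM users WHERE email = 'user@example.com' AND age > 30"))
# SELECT id FROM users WHERE email = ? AND age > ?
```

If your database can log parameterized statements separately from their bound values, dropping the values at the source is usually more reliable than redacting them afterward.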

This is not done on purpose. It is a side effect of logging built for debugging, not for GDPR.

EDPB Guidance on IP Addresses

The European Data Protection Board says IP addresses are personal info. ISPs can link them to subscribers. Within an org, they can identify specific users.

The impact is direct. Access records with IP addresses are personal records. Keeping nginx output for 12 months means keeping personal info for 12 months. That needs a lawful basis under Article 6. It also needs the retention period to match your stated purpose.

Most teams skip this step. "We keep entries for 90 days because security says so" is a rule of thumb. It is not a GDPR Article 5(1)(e) review. See our Legal Compliance overview for how this fits a broader program.

How to Reach Compliance

The practical route for most teams is not to cut retention windows. Operational and security reasons for longer windows are real. The better path is to mask records before long-term storage.

A tiered model works well.

0–7 days: Full raw records for active debugging. Seven days covers most active debugging while keeping the window for raw personal info short.

7–90 days: Masked records for trend analysis and security review. IP addresses are swapped out. User emails become stable tokens. Account numbers are masked. Key fields — timestamps, error codes, latency, endpoints — are kept as-is.

90+ days (if needed): Aggregated output only. Event counts, error rates, latency ranges. No user-level records remain.
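The tier rules reduce to one age check per record. Below is a minimal sketch. It assumes each record carries a timezone-aware timestamp. The `Tier` names and the `tier_for` helper are hypothetical. The 7-day and 90-day windows come from the model above and are not a legal threshold.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum

RAW_WINDOW = timedelta(days=7)      # 0-7 days: full raw records
MASKED_WINDOW = timedelta(days=90)  # 7-90 days: masked records


class Tier(Enum):
    RAW = "raw"
    MASKED = "masked"
    AGGREGATED = "aggregated"


def tier_for(record_time: datetime, now: datetime) -> Tier:
    """Pick the retention tier for a record from its age."""
    age = now - record_time
    if age < RAW_WINDOW:
        return Tier.RAW
    if age < MASKED_WINDOW:
        return Tier.MASKED
    return Tier.AGGREGATED


now = datetime(2026, 1, 31, tzinfo=timezone.utc)
print(tier_for(now - timedelta(days=3), now))    # Tier.RAW
print(tier_for(now - timedelta(days=30), now))   # Tier.MASKED
print(tier_for(now - timedelta(days=120), now))  # Tier.AGGREGATED
```

A record exactly seven days old could reasonably go in either tier. This sketch puts it in the masked tier, which keeps raw exposure from ever reaching a full eighth day.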

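Raw personal info stops at seven days. Masked records with stable tokens still count as pseudonymized personal info until they age out at 90 days. Aggregated output can carry forward without exposing anyone. See Security & Compliance for more detail.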
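The 90+ day tier keeps only aggregates. Below is a minimal sketch. It assumes each record is a dict with `endpoint`, `status`, and `latency_ms` fields (a hypothetical schema) and that 5xx responses count as errors. The latency bucket edges are also illustrative. User-level fields, such as an email, are never copied into the output.

```python
from collections import Counter, defaultdict

# Illustrative latency bucket edges in milliseconds.
LATENCY_EDGES_MS = (100, 500, 1000)


def latency_bucket(ms: float) -> str:
    """Map a latency value to a coarse range label."""
    for edge in LATENCY_EDGES_MS:
        if ms < edge:
            return f"<{edge}ms"
    return f">={LATENCY_EDGES_MS[-1]}ms"


def aggregate(records):
    """Collapse per-request records into per-endpoint counts, error rates, and latency ranges."""
    by_endpoint = defaultdict(list)
    for record in records:
        by_endpoint[record["endpoint"]].append(record)

    summary = {}
    for endpoint, rows in by_endpoint.items():
        errors = sum(1 for row in rows if row["status"] >= 500)
        summary[endpoint] = {
            "count": len(rows),
            "error_rate": errors / len(rows),
            "latency_buckets": dict(Counter(latency_bucket(row["latency_ms"]) for row in rows)),
        }
    return summary


records = [
    {"endpoint": "/checkout", "status": 200, "latency_ms": 120, "email": "jane@corp.eu"},
    {"endpoint": "/checkout", "status": 503, "latency_ms": 1400, "email": "bob@corp.eu"},
    {"endpoint": "/login", "status": 200, "latency_ms": 40},
]
print(aggregate(records)["/checkout"])
# {'count': 2, 'error_rate': 0.5, 'latency_buckets': {'<500ms': 1, '>=1000ms': 1}}
```

Very small groups can still point to one person, for example an endpoint that only one customer uses. Coarser groupings or a minimum count per group reduce that risk.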

Keep Structure Intact for Monitoring

Good masking keeps the JSON structure intact. It only swaps out content. This keeps output useful for debugging and alerts.

Kept as-is:

JSON keys and nesting
Timestamps and time order
Error types and HTTP status codes
HTTP methods, paths, and latency values
Business event types

Swapped out:

Email addresses → stable token per original (e.g. user1@example.com)
IP addresses → RFC 5737 ranges (192.0.2.x)
Account numbers → ACCT_XXXXX
Phone numbers → +XX XXX XXX XXXX
Names in error text → [PERSON]
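The kept and swapped lists above map to a recursive walk that touches only string values. Below is a minimal sketch that uses regex detection. The patterns for account numbers (any bare run of 8 to 12 digits) and phone numbers (a `+` followed by a run of digits, spaces, or dashes) are assumptions, and `LogMasker` is a hypothetical name. Email tokens are numbered in first-seen order, so they are stable only within one masker instance. IP tokens wrap after 254 distinct addresses. Names in error text are left alone here, because swapping them for `[PERSON]` needs named-entity recognition, which a regex cannot do reliably.

```python
import re
from typing import Any

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
PHONE = re.compile(r"\+\d[\d \-]{8,16}\d")  # assumption: international format with a leading +
ACCOUNT = re.compile(r"\b\d{8,12}\b")        # assumption: bare runs of 8-12 digits


class LogMasker:
    """Masks PII inside string values; keys, nesting, and non-string values pass through."""

    def __init__(self) -> None:
        self._emails: dict[str, str] = {}
        self._ips: dict[str, str] = {}

    def _email_token(self, match: re.Match) -> str:
        original = match.group(0)
        if original not in self._emails:
            self._emails[original] = f"user{len(self._emails) + 1}@example.com"
        return self._emails[original]

    def _ip_token(self, match: re.Match) -> str:
        original = match.group(0)
        if original not in self._ips:
            # RFC 5737 documentation range; wraps after 254 distinct addresses.
            self._ips[original] = f"192.0.2.{len(self._ips) % 254 + 1}"
        return self._ips[original]

    def mask_text(self, text: str) -> str:
        # Order matters: emails first, so digits inside them are not hit by later patterns.
        text = EMAIL.sub(self._email_token, text)
        text = IPV4.sub(self._ip_token, text)
        text = PHONE.sub("+XX XXX XXX XXXX", text)
        return ACCOUNT.sub("ACCT_XXXXX", text)

    def mask(self, value: Any) -> Any:
        if isinstance(value, dict):
            return {key: self.mask(item) for key, item in value.items()}
        if isinstance(value, list):
            return [self.mask(item) for item in value]
        if isinstance(value, str):
            return self.mask_text(value)
        return value  # numbers, booleans, and None are kept as-is


masker = LogMasker()
entry = {
    "ts": "2026-01-15T10:00:00Z",
    "status": 422,
    "latency_ms": 42,
    "client": {"ip": "10.20.30.40", "email": "jane.doe@corp.eu"},
    "error": "Invalid phone +44 20 7946 0958 for account 12345678 (jane.doe@corp.eu)",
}
masked = masker.mask(entry)
print(masked["client"])  # {'ip': '192.0.2.1', 'email': 'user1@example.com'}
print(masked["error"])   # Invalid phone +XX XXX XXX XXXX for account ACCT_XXXXX (user1@example.com)
```

The timestamp, status code, and latency pass through unchanged. The same email maps to the same token in both the `client` field and the error text, so a trace across fields still lines up.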

Stable tokens keep traces useful. A trace for user1@example.com across 40 entries works the same as the original. Aggregated metrics — error rates, latency, throughput — need no personal info at all. See the Glossary for the terms pseudonymization and anonymization.
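A per-run counter like user1, user2 is stable only within one job. One common way to keep tokens stable across runs and machines, without storing a lookup table, is a keyed hash (HMAC). Below is a minimal sketch. The `email_token` name, the lowercase normalization, and the 10-hex-character truncation are assumptions, and the token format differs from the `user1@example.com` example. Whoever holds the key can recompute the token for any candidate email, so the key needs the same protection as the raw logs.

```python
import hashlib
import hmac


def email_token(email: str, key: bytes, hex_chars: int = 10) -> str:
    """Derive a stable pseudonymous token for an email address from a secret key."""
    # Normalize so " Jane@Corp.eu" and "jane@corp.eu" map to the same token.
    normalized = email.strip().lower().encode("utf-8")
    digest = hmac.new(key, normalized, hashlib.sha256).hexdigest()
    # Truncation keeps tokens short; 10 hex chars is 40 bits, so collisions become possible at scale.
    return f"user_{digest[:hex_chars]}@example.com"


key = b"load-me-from-a-secret-store"  # placeholder; never hard-code a real key
print(email_token("Jane@Corp.eu", key) == email_token(" jane@corp.eu", key))  # True
```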

Three Ways to Integrate This

Three patterns cover most engineering teams.

Option 1 — Pipeline masking: Fluentd or Logstash intercepts each line before sending it on. A masking step runs inline. Elastic or Datadog gets only cleaned records. No app code changes are needed.
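The inline masking step can be as small as a filter that reads one JSON log line at a time and writes the masked line back out. Below is a minimal sketch. It assumes newline-delimited JSON and covers only emails and IPv4 addresses, using fixed replacements. The names `mask_value`, `filter_lines`, and `main` are hypothetical. How a shipper calls an external filter depends on its plugins and version, so the Fluentd or Logstash wiring is not shown.

```python
import json
import re
import sys
from typing import Any, Iterable, Iterator, TextIO

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def mask_value(value: Any) -> Any:
    """Minimal masker: fixed replacements for emails and IPv4 addresses in string values."""
    if isinstance(value, dict):
        return {key: mask_value(item) for key, item in value.items()}
    if isinstance(value, list):
        return [mask_value(item) for item in value]
    if isinstance(value, str):
        return IPV4.sub("192.0.2.1", EMAIL.sub("[EMAIL]", value))
    return value


def filter_lines(lines: Iterable[str]) -> Iterator[str]:
    """Mask newline-delimited JSON; lines that are not valid JSON are masked as plain text."""
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            yield mask_value(line)
            continue
        yield json.dumps(mask_value(record), separators=(",", ":"))


def main(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Read raw lines from stdin and write masked lines to stdout, flushing after each line."""
    for masked in filter_lines(stdin):
        stdout.write(masked + "\n")
        stdout.flush()
```

To run it as a standalone filter, call `main()` under an `if __name__ == "__main__":` guard. Masking non-JSON lines as plain text, instead of dropping them, means a malformed line never reaches Elastic or Datadog unmasked.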

Option 2 — Nightly batch: Raw records land in local storage. A nightly job masks the prior day's output and writes the masked version to long-term storage. Raw files are deleted once they are older than seven days, so raw output is kept for seven days only.
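The nightly job has two parts. It masks yesterday's raw file into long-term storage, then deletes raw files older than seven days. Below is a minimal sketch. It assumes one newline-delimited JSON file per day, named like `app-2026-01-14.jsonl` (a hypothetical layout). It uses a simple email and IPv4 masker as a stand-in for a real detection engine, and all function names are illustrative. Scheduling is left out.

```python
import json
import re
from datetime import date, timedelta
from pathlib import Path
from typing import Any, Iterable, Iterator

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
RAW_RETENTION_DAYS = 7


def mask_value(value: Any) -> Any:
    """Stand-in masker: fixed replacements for emails and IPv4 addresses in string values."""
    if isinstance(value, dict):
        return {key: mask_value(item) for key, item in value.items()}
    if isinstance(value, list):
        return [mask_value(item) for item in value]
    if isinstance(value, str):
        return IPV4.sub("192.0.2.1", EMAIL.sub("[EMAIL]", value))
    return value


def mask_lines(lines: Iterable[str]) -> Iterator[str]:
    """Mask each JSON line; non-JSON lines are masked as plain text."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.dumps(mask_value(json.loads(line)))
        except json.JSONDecodeError:
            yield mask_value(line)


def expired_days(days: Iterable[date], today: date) -> list[date]:
    """Days whose raw files have aged past the raw-retention window."""
    cutoff = today - timedelta(days=RAW_RETENTION_DAYS)
    return sorted(day for day in days if day < cutoff)


def run_nightly(raw_dir: Path, masked_dir: Path, today: date) -> None:
    # Step 1: mask yesterday's raw file into long-term storage.
    source = raw_dir / f"app-{(today - timedelta(days=1)).isoformat()}.jsonl"
    if source.exists():
        masked_dir.mkdir(parents=True, exist_ok=True)
        target = masked_dir / source.name
        with source.open(encoding="utf-8") as src, target.open("w", encoding="utf-8") as dst:
            for masked in mask_lines(src):
                dst.write(masked + "\n")

    # Step 2: delete raw files older than the raw-retention window.
    raw_files = {date.fromisoformat(p.stem.removeprefix("app-")): p for p in raw_dir.glob("app-*.jsonl")}
    for day in expired_days(raw_files, today):
        raw_files[day].unlink()
```

Before this runs unattended, consider adding two checks: confirm the masked file exists before any raw file is deleted, and alert when yesterday's raw file is missing.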

Option 3 — Pre-share masking: Raw records stay internal with strict access controls. Before sharing with pen testers or outside contractors, run a masking pass. External parties always get clean versions.
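Before a masked export goes to an outside party, a cheap final check is to scan it for anything that still looks like a real email or IP address. Below is a minimal sketch. It assumes masked output only ever contains `@example.com` emails and RFC 5737 addresses, and `residual_findings` and `safe_to_share` are hypothetical names. A clean scan does not prove the file is clean, because it covers only these two patterns. Version strings like `1.2.3.4` will also show up as false positives.

```python
import ipaddress
import re
from typing import Iterable, List, Tuple

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# RFC 5737 documentation ranges; masked output should contain no other IPv4 addresses.
DOC_NETWORKS = [ipaddress.ip_network(n) for n in ("192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24")]


def residual_findings(lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """List (line number, kind, value) for anything that still looks like a real email or IP."""
    findings = []
    for lineno, line in enumerate(lines, start=1):
        for match in EMAIL.finditer(line):
            if not match.group(0).lower().endswith("@example.com"):
                findings.append((lineno, "email", match.group(0)))
        for match in IPV4.finditer(line):
            try:
                addr = ipaddress.ip_address(match.group(0))
            except ValueError:
                continue  # not a valid address, e.g. 999.1.1.1
            if not any(addr in net for net in DOC_NETWORKS):
                findings.append((lineno, "ipv4", match.group(0)))
    return findings


def safe_to_share(lines: Iterable[str]) -> bool:
    return not residual_findings(lines)


export = [
    '{"ip": "192.0.2.4", "user": "user1@example.com"}',
    '{"msg": "retry from 10.0.0.8 for bob@corp.eu"}',
]
print(residual_findings(export))
# [(2, 'email', 'bob@corp.eu'), (2, 'ipv4', '10.0.0.8')]
```

Wiring this so that any finding blocks the export, instead of just logging a warning, keeps the "external parties always get clean versions" rule enforceable.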

For GDPR docs, masking is a "technical measure" under Article 32. Record the tool, its setup, and your retention policy in your Records of Processing Activities (RoPA) under Article 30. See our FAQ for common RoPA questions.

Want a real-world example? Check the case studies for concrete implementation details. You can also review our pricing to see which plan includes built-in masking pipelines.

When This Approach Has Limits

Tiered retention with structure-preserving masking is a sound way to keep logs useful while shrinking GDPR exposure, but limits remain worth stating plainly.

Masking only removes what detection recognizes. A masking pass swaps emails, IPs, and account numbers when the engine identifies them, but free-text error messages, stack traces, and arbitrary request bodies carry personal data in shapes no pattern reliably catches. A name pasted into a search query or a passport number inside a serialized payload can survive the pass untouched. The residual false-negative rate sets the floor on your exposure. Test the masking against real production samples, including your messiest debug-level entries, rather than against tidy example records.
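One way to put a number on that floor is to hand-label a sample of production lines with the personal values they contain. You then run your masker over them and count how many labeled values survive verbatim. Below is a minimal sketch. `residual_rate` is a hypothetical helper, and the masker is passed in as a plain function so any engine can be tested. A verbatim check misses partial leaks, such as a surname that survives when the full name was labeled. It also misses anything the reviewer did not label. Treat the result as a lower bound on leakage.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# (raw log line, PII values a reviewer labeled in that line)
Sample = Tuple[str, Sequence[str]]


def residual_rate(mask: Callable[[str], str], samples: Iterable[Sample]) -> Tuple[float, List[str]]:
    """Return the fraction of labeled PII values still present after masking, plus the survivors."""
    total = 0
    survivors: List[str] = []
    for raw, labeled in samples:
        masked = mask(raw)
        for value in labeled:
            total += 1
            if value in masked:
                survivors.append(value)
    rate = len(survivors) / total if total else 0.0
    return rate, survivors


def email_only(line: str) -> str:
    """Toy masker for the example: it only knows one email address."""
    return line.replace("jane@corp.eu", "[EMAIL]")


samples = [
    ("login failed for jane@corp.eu", ["jane@corp.eu"]),
    ("search q='Jane Smith, 12 Rue Cler' by jane@corp.eu", ["Jane Smith", "12 Rue Cler", "jane@corp.eu"]),
]
print(residual_rate(email_only, samples))
# (0.5, ['Jane Smith', '12 Rue Cler'])
```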

Stable tokens keep traces useful and keep the data in legal scope. Mapping user1@example.com to one consistent token across forty entries is what makes masked logs debuggable, but a stable token is reversible by anyone who holds the mapping, and a re-used pseudonym still links a person's behavior across requests. That is pseudonymization, not anonymization: the masked logs remain personal data under GDPR, and you have relocated the risk into custody of the token map. Protect that mapping with the same controls you would apply to the raw logs.

Masking is a technical measure, not a compliance verdict. Recording the tool and tiers in your RoPA documents an Article 32 safeguard, but it does not by itself establish a lawful basis, a justified retention period, or that ninety days is actually necessary. Those determinations require a human assessment of purpose against Article 5(1)(e). The tool shrinks the exposure window; it does not decide what the window should be or whether your processing was lawful in the first place.

Sources

GDPR Article 5: Principles for Data Processing
EDPB Opinion 5/2019 on ePrivacy Directive and GDPR
Sonra.io: PII Masking in JSON and XML Data


Ready to protect your data?

Start anonymizing PII with 267+ entity types across 48 languages.


GDPR in App Logs: JSON PII Compliance


Related Articles

Presidio: 3-Week Setup vs Managed PII

6 Weeks to 3 Days: Managed PII Setup

Free PII Detection Costs €13K/Year
