The Silent GDPR Risk in Your Log Stack
Updated for 2026
Most teams check their database for personal info. Fewer do the same for their log system.
GDPR Article 5(1)(e) limits how long you can store personal info. For databases, teams set policies and run deletion jobs. For log files, the rule is simpler: keep everything for 90 days for debugging.
The problem? Those records hold personal info. Request entries hold user emails. Error captures hold raw input values. Access entries hold IP addresses. Each of these counts as personal info under GDPR. Your team needs a lawful basis and a retention plan for each one.
What Ends Up in Your Log Files
Standard web app logging pulls in a wide range of PII.
Access records (nginx/Apache):
- IP addresses — personal info per EDPB guidance
- User-agent strings — may enable device fingerprinting
- Session tokens — if written to output
App records (structured JSON):
- User IDs and email addresses
- Input errors — often include the raw invalid value, which may be real user info
- Business events — order IDs linked to customer accounts
- Search queries — may contain names or addresses
API gateway records:
- Auth headers — partly captured in some setups
- Query params — may carry user IDs, names, or emails
- Request and response bodies — present in debug-level setups
Database audit entries:
- SQL queries with WHERE clauses like email = 'user@example.com'
- Literal personal values in query params
This is not done on purpose. It is a side effect of logging built for debugging, not GDPR.
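One way to see how much of this reaches your own logs is to scan a sample and count which fields hold email-shaped or IPv4-shaped values. The sketch below is a rough starting point, not a complete detector: the two regular expressions, the field names in the sample lines, and the `_raw` key used for non-JSON lines are all assumptions made for illustration.

```python
import json
import re
from collections import Counter

# Illustrative patterns only; real detection needs broader, tested rules.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def iter_strings(value, path=""):
    """Yield (json_path, string) pairs for every string leaf in a parsed record."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from iter_strings(child, f"{path}.{key}" if path else key)
    elif isinstance(value, list):
        for i, child in enumerate(value):
            yield from iter_strings(child, f"{path}[{i}]")
    elif isinstance(value, str):
        yield path, value


def scan_lines(lines):
    """Count which fields contain email-shaped or IPv4-shaped values."""
    hits = Counter()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            record = {"_raw": line}  # treat a non-JSON line as one string field
        for path, text in iter_strings(record):
            if EMAIL_RE.search(text):
                hits[(path, "email")] += 1
            if IPV4_RE.search(text):
                hits[(path, "ipv4")] += 1
    return hits


# Hypothetical sample: two structured app records and one access-log line.
sample = [
    '{"event": "login_failed", "user": {"email": "jane@example.org"}, "client_ip": "203.0.113.7"}',
    '{"event": "validation_error", "detail": "invalid value: jane@example.org"}',
    '198.51.100.4 - - "GET /search?q=jane HTTP/1.1" 200',
]
report = scan_lines(sample)
for (field, kind), count in sorted(report.items()):
    print(f"{field}: {kind} x{count}")
```

A report like this only tells you where to look. Names, account numbers, and free-text search queries need their own rules.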
EDPB Guidance on IP Addresses
The European Data Protection Board says IP addresses are personal info. ISPs can link them to subscribers. Within an org, they can identify specific users.
The impact is direct. Access records with IP addresses are personal records. Keeping nginx output for 12 months means keeping personal info for 12 months. That needs a lawful basis under Article 6. It also needs the retention period to match your stated purpose.
Most teams skip this step. "We keep entries for 90 days because security says so" is a rule of thumb. It is not a GDPR Article 5(1)(e) review. See our Legal Compliance overview for how this fits a broader program.
How to Reach Compliance
The practical route for most teams is not to cut retention windows. Operational and security reasons for longer windows are real. The better path is to mask records before long-term storage.
A tiered model works well.
0–7 days: Full raw records for active debugging. Seven days is long enough for most teams to investigate a live issue.
7–90 days: Masked records for trend analysis and security review. IP addresses are swapped out. User emails become stable tokens. Account numbers are masked. Key fields — timestamps, error codes, latency, endpoints — are kept as-is.
90+ days (if needed): Aggregated output only. Event counts, error rates, latency ranges. No user-level records remain.
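Directly identifying personal info stops at seven days. The masked tier is pseudonymized rather than anonymous, so it may still count as personal info and still needs a documented purpose and retention limit. Aggregated output can carry forward without exposing anyone. See Security & Compliance for more detail.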
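The tier boundaries can be written down as a small routing function, with the last tier reduced to counts. This is a minimal sketch under stated assumptions: the record shape (`ts`, `path`, `status`, `user`) and the function names are hypothetical, and a record exactly seven or ninety days old is assigned to the later tier.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

RAW_WINDOW = timedelta(days=7)      # full raw records
MASKED_WINDOW = timedelta(days=90)  # masked records


def tier_for(record_time, now):
    """Return the storage tier a record belongs in, based on its age."""
    age = now - record_time
    if age < RAW_WINDOW:
        return "raw"
    if age < MASKED_WINDOW:
        return "masked"
    return "aggregate"


def aggregate(records):
    """Reduce records to per-day, per-endpoint, per-status counts.

    No user-level field is copied into the result.
    """
    counts = Counter()
    for rec in records:
        day = rec["ts"].date().isoformat()
        counts[(day, rec["path"], rec["status"])] += 1
    return counts


# Hypothetical records; only the two 100-day-old ones fall into the last tier.
now = datetime(2026, 4, 1, tzinfo=timezone.utc)
records = [
    {"ts": now - timedelta(days=100), "path": "/checkout", "status": 500, "user": "user1@example.com"},
    {"ts": now - timedelta(days=100), "path": "/checkout", "status": 500, "user": "user2@example.com"},
    {"ts": now - timedelta(days=3), "path": "/checkout", "status": 200, "user": "user1@example.com"},
]
summary = aggregate(r for r in records if tier_for(r["ts"], now) == "aggregate")
```

Very small counts can still single someone out, so a real aggregation job may also want a minimum group size before a row is kept.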
Keep Structure Intact for Monitoring
Good masking keeps the JSON structure intact. It only swaps out content. This keeps output useful for debugging and alerts.
Kept as-is:
- JSON keys and nesting
- Timestamps and time order
- Error types and HTTP status codes
- HTTP methods, paths, and latency values
- Business event types
Swapped out:
- Email addresses → stable token per original (e.g. user1@example.com)
- IP addresses → RFC 5737 ranges (192.0.2.x)
- Account numbers → ACCT_XXXXX
- Phone numbers → +XX XXX XXX XXXX
- Names in error text → [PERSON]
Stable tokens keep traces useful. A trace for user1@example.com across 40 entries works the same as the original. Aggregated metrics — error rates, latency, throughput — need no personal info at all. See the Glossary for the terms pseudonymization and anonymization.
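The kept-versus-swapped split can be illustrated with a small recursive masker. Treat it as a sketch, not a finished tool: the `Masker` class, its regular expressions, and the sample record are assumptions, it covers only emails and IPv4 addresses, and the token maps live in memory, so tokens are stable only within one run unless you persist the mapping.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


class Masker:
    """Swap emails and IPv4 addresses for stable placeholders; keep structure as-is."""

    def __init__(self):
        self._emails = {}
        self._ips = {}

    def _email_token(self, match):
        original = match.group(0).lower()
        if original not in self._emails:
            self._emails[original] = f"user{len(self._emails) + 1}@example.com"
        return self._emails[original]

    def _ip_token(self, match):
        original = match.group(0)
        if original not in self._ips:
            # 192.0.2.0/24 is reserved for documentation (RFC 5737).
            # Host numbers wrap after 254 distinct addresses, so tokens can collide.
            self._ips[original] = f"192.0.2.{len(self._ips) % 254 + 1}"
        return self._ips[original]

    def mask(self, value):
        """Return a masked copy; keys, nesting, and non-string values are untouched."""
        if isinstance(value, dict):
            return {k: self.mask(v) for k, v in value.items()}
        if isinstance(value, list):
            return [self.mask(v) for v in value]
        if isinstance(value, str):
            value = EMAIL_RE.sub(self._email_token, value)
            return IPV4_RE.sub(self._ip_token, value)
        return value


# Hypothetical record: the same address appears in a field and in error text.
record = {
    "ts": "2026-03-01T10:00:00Z",
    "status": 422,
    "latency_ms": 37,
    "client_ip": "203.0.113.7",
    "user": {"email": "Jane@Example.org"},
    "error": "invalid value: jane@example.org",
}
masker = Masker()
masked = masker.mask(record)
```

Both occurrences of the address map to the same token, which is what keeps a trace readable. The IPv4 pattern will also match four-part version strings, so a production rule set would need to be more selective.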
Three Ways to Integrate This
Three patterns cover most engineering teams.
Option 1 — Pipeline masking: Fluentd or Logstash intercepts each line before sending it on. A masking step runs inline. Elastic or Datadog gets only cleaned records. No app code changes are needed.
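The inline step can be pictured as a line filter sitting between the shipper and the destination. The sketch below is not Fluentd or Logstash configuration; it is a standalone illustration with hypothetical names (`filter_stream`, `redact_email`), and it assumes one JSON object per line. Its main design point is failing closed: a line that cannot be parsed is replaced with a marker instead of being forwarded raw.

```python
import io
import json


def filter_stream(src, dst, mask_record):
    """Read JSON lines from src and write masked lines to dst.

    Input that is not a JSON object is never forwarded as-is.
    Returns (forwarded, replaced) counts.
    """
    forwarded = replaced = 0
    for line in src:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            record = None
        if not isinstance(record, dict):
            # Fail closed: an unparseable line may still hold personal info.
            dst.write(json.dumps({"masking_error": "unparseable_line"}) + "\n")
            replaced += 1
            continue
        dst.write(json.dumps(mask_record(record)) + "\n")
        forwarded += 1
    return forwarded, replaced


def redact_email(record):
    """Placeholder rule (hypothetical): blank the top-level 'email' field."""
    return {k: ("[REDACTED]" if k == "email" else v) for k, v in record.items()}


# In-memory streams stand in for the shipper's input and output.
src = io.StringIO('{"email": "a@example.org", "status": 500}\nnot json\n')
dst = io.StringIO()
counts = filter_stream(src, dst, redact_email)
```

Counting replaced lines gives you a metric to alert on, since a spike usually means a log format changed upstream.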
Option 2 — Nightly batch: Raw records land in local storage. A nightly job masks the prior day's output, sends the masked records to long-term storage, and deletes raw files once they are more than seven days old.
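A nightly job of this kind could look roughly like the following. The directory layout, the `.jsonl` file naming, the `run_nightly` function, and the use of file modification time as the age signal are all assumptions. The sketch also assumes the raw directory holds only rotated, closed files, because a file is masked once and not revisited.

```python
import json
import time
from pathlib import Path

RAW_RETENTION_SECONDS = 7 * 24 * 3600  # seven-day raw window


def run_nightly(raw_dir, masked_dir, mask_record, now=None):
    """Mask raw files that have no masked copy yet, then delete expired raw files.

    Returns (masked_names, deleted_names).
    """
    raw_dir, masked_dir = Path(raw_dir), Path(masked_dir)
    masked_dir.mkdir(parents=True, exist_ok=True)
    now = time.time() if now is None else now
    masked, deleted = [], []
    for raw_file in sorted(raw_dir.glob("*.jsonl")):
        target = masked_dir / raw_file.name
        if not target.exists():
            tmp = target.with_name(target.name + ".tmp")
            with raw_file.open(encoding="utf-8") as src, tmp.open("w", encoding="utf-8") as dst:
                for line in src:
                    if line.strip():
                        dst.write(json.dumps(mask_record(json.loads(line))) + "\n")
            tmp.replace(target)  # publish only complete masked files
            masked.append(raw_file.name)
        # Delete raw input only after a masked copy exists and the window has passed.
        if now - raw_file.stat().st_mtime > RAW_RETENTION_SECONDS:
            raw_file.unlink()
            deleted.append(raw_file.name)
    return masked, deleted
```

Writing to a temporary name and renaming at the end means a crashed run leaves no half-written masked file that a later run would mistake for a finished one.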
Option 3 — Pre-share masking: Raw records stay internal with strict access controls. Before sharing with pen testers or outside contractors, run a masking pass. External parties always get clean versions.
For GDPR docs, masking is a "technical measure" under Article 32. Record the tool, its setup, and your retention policy in your Records of Processing Activities (RoPA) under Article 30. See our FAQ for common RoPA questions.
Want a real-world example? Check the case studies for concrete implementation details. You can also review our pricing to see which plan includes built-in masking pipelines.
Sources
- GDPR Article 5: Principles for Data Processing
- EDPB Opinion 5/2019 on ePrivacy Directive and GDPR
- Sonra.io: PII Masking in JSON and XML Data