What is OWASP A09?
Security Logging and Alerting Failures sits at number nine in the OWASP Top 10:2025, but the position understates how much damage it enables. This category isn't about an attacker exploiting your app directly — it's about everything that happens after they've already gotten in. When your logging is broken, you lose the ability to detect, respond to, and recover from attacks. You also lose the forensic trail you'd need to understand what was taken.
The problem is deceptively simple: most applications log too little of the right things, or too much of the wrong things, and almost none of them have alerting that would wake anyone up at 2am when something bad is happening. Combine that with log data that nobody actually reads, and you have the conditions that allowed breaches like the Equifax hack (143 million records, undetected for 78 days) and the Capital One breach (100 million records, undetected for months) to unfold at the scale they did.
Those numbers aren't abstract. An average dwell time of 194 days (IBM's industry-wide figure, covered below) means an attacker spent roughly six months inside the network, exfiltrating data, moving laterally, and setting up persistence — all while your monitoring showed nothing unusual. The gap between detection and reality is the definition of a logging failure.
What attackers do with missing logs
Understanding how adversaries exploit poor logging makes the problem concrete. When your application isn't logging security-relevant events, attackers gain three huge advantages: they can probe freely without triggering alerts, they can cover their tracks without worrying about audit trails, and they have time — lots of it.
Credential stuffing at scale
Credential stuffing attacks work by trying username/password pairs from breached datasets against your login endpoint. A properly instrumented application would see the spike — maybe 50,000 login attempts from 3,000 IP addresses over 2 hours — and fire an alert. Without it, the attacker systematically validates credentials, finds the ones that work, and accesses those accounts. You find out when a user emails support saying their account was compromised.
Slow enumeration attacks
Not all attacks are noisy. Experienced attackers know to stay below rate limits. They'll send one request per minute to an IDOR-vulnerable endpoint, slowly enumerating user records over days. No individual request looks suspicious. Without a system that correlates behavior over time and across sessions, the attack is invisible. Only an audit log with enough context — which user, which resource, which IP, which timestamp — lets you reconstruct the full picture later.
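To make the "correlate over time" point concrete, here is a minimal sketch of one way to catch a low-and-slow enumerator: track how many distinct resource IDs each user touches inside a rolling window, and flag users who exceed a threshold. The window, threshold, and function names are illustrative assumptions, not a production detector.

```python
from collections import defaultdict, deque

# Illustrative sketch: one request per minute stays under any rate limit,
# but touching hundreds of DISTINCT resources in a day is still abnormal.
WINDOW_SECONDS = 24 * 3600
DISTINCT_THRESHOLD = 200  # tune to your application's normal behavior

events = defaultdict(deque)  # user_id -> deque of (timestamp, resource_id)

def record_access(user_id: str, resource_id: str, ts: float) -> bool:
    """Record one access; return True if the user now looks like an enumerator."""
    q = events[user_id]
    q.append((ts, resource_id))
    # Drop events that have aged out of the rolling window
    while q and ts - q[0][0] > WINDOW_SECONDS:
        q.popleft()
    distinct = {rid for _, rid in q}
    return len(distinct) > DISTINCT_THRESHOLD
```

This only works if every access event carries the context the paragraph lists: user, resource, IP, and timestamp.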
Post-compromise lateral movement
Once an attacker has a foothold, they start moving. They'll access admin functionality they shouldn't have access to, export user lists, query database schemas, and exfiltrate files. In a well-logged application, each of these actions generates a security event. In a poorly logged one, the attacker moves through your application leaving no trace. By the time you find out, there's nothing to investigate.
Business impact
Without adequate logging and monitoring, you cannot detect an active intrusion, cannot scope a breach after the fact, and cannot satisfy incident response requirements for compliance frameworks like PCI DSS, SOC 2, ISO 27001, and HIPAA. The secondary costs — regulatory fines, legal discovery, customer notification, reputation damage — often exceed the direct breach costs by an order of magnitude.
PCI DSS 4.0 Requirement 10 mandates audit logging of all system access and explicitly requires automated alerting for anomalous behavior. HIPAA requires audit controls on systems that touch electronic protected health information. A logging failure isn't just a security gap — it's a compliance gap with direct liability attached.
What you should be logging
The starting principle is simple: log events that have security relevance. Not every HTTP request (that's access logging and it's different), but the specific moments where something could go wrong. Here's a practical breakdown:
Authentication events
- Successful logins — user ID, IP address, timestamp, user agent, authentication method
- Failed login attempts — same fields, plus the failure reason (invalid password, account locked, etc.)
- Logouts — especially forced logouts and session invalidation
- Password changes and resets — who changed what, when, from where
- MFA events — enrollment, successful verification, bypass attempts, backup code usage
- Account lockouts — triggered by N failed attempts
- OAuth / SSO flows — token issuance, delegation, third-party access grants
Authorization events
- Access denied — user, resource, action, reason. These are gold for detecting reconnaissance and privilege escalation attempts.
- Role changes — who granted what privilege to whom, when
- Admin actions — any operation performed through an administrative interface
- Cross-user data access — especially for multi-tenant applications where user A accessing user B's data is suspicious
Data events
- Mass data exports — queries or API calls returning more than N records
- Sensitive resource access — reads of PII, financial records, health data, API keys
- Data modification and deletion — especially bulk operations
- File uploads — filename, MIME type, size, storage location
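The mass-export rule above is simple to wire in at the response layer. Here is a hedged sketch; the threshold, logger name, and field names are assumptions for illustration.

```python
import logging

logger = logging.getLogger("security")
EXPORT_THRESHOLD = 1000  # records per response; tune per endpoint

def log_if_mass_export(user_id: str, endpoint: str, record_count: int) -> bool:
    """Emit a security event when a single response exceeds the export threshold."""
    if record_count > EXPORT_THRESHOLD:
        logger.warning(
            "mass_export",
            extra={
                "event_type": "data.mass_export",
                "user_id": user_id,
                "endpoint": endpoint,
                "record_count": record_count,
            },
        )
        return True
    return False
```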
Application health events
- Input validation failures — rejected inputs that look like injection attempts
- Rate limit breaches — which endpoint, which client, how many requests
- Token validation failures — expired, invalid signature, suspicious JWT claims
- Dependency failures — database unavailable, downstream service errors
- Configuration changes — runtime config updates, feature flag changes, environment modifications
Every security event log entry should contain at minimum: timestamp (UTC, with milliseconds), event type, outcome (success/failure), user identity (or "anonymous"), IP address, and a resource identifier. A log entry without a timestamp is useless for forensics. A log entry without an IP is nearly as bad.
{
"timestamp": "2025-03-15T14:23:41.882Z",
"event_type": "auth.login_failed",
"outcome": "failure",
"reason": "invalid_password",
"user_email": "stan@example.com",
"ip_address": "91.108.4.77",
"user_agent": "Mozilla/5.0 ...",
"session_id": "sess_01hx...",
"request_id": "req_01hx...",
"service": "auth-api",
"environment": "production"
}
What you should NOT log
This is where a lot of teams go wrong in the other direction. Overly verbose logging is its own security failure — if your logs contain sensitive data, you've essentially created a second database of sensitive data with weaker access controls and a much longer retention period than the original.
Never log these: plaintext passwords (even "wrong" ones — they're often reused), full credit card numbers, Social Security Numbers, authentication tokens or API keys, session cookies, secret questions/answers, medical record data, unmasked bank account numbers, or any full PII that isn't strictly necessary for the security event context.
The GDPR compliance angle makes this especially important. Logging a user's email address with every event is probably justified. Logging their full address, phone number, date of birth, and medical history in a debug log is a data breach waiting to happen — except the breach is your own logging infrastructure.
Bad (leaks a password, card number, and SSN):

Login failed for user: email: stan@example.com password: MyP@ssw0rd123 card: 4111111111111111 ssn: 123-45-6789

Good (masked identifiers, useful context):

Login failed for user: user_id: usr_01hx... email: s***@example.com ip: 91.108.4.77 attempt_count: 3
When you need to log something that contains sensitive data — say, an access denied event where the resource name contains a user ID — use references, not raw values. Log the resource ID, not the full URL. Log the user_id, not the email. Log "card ending in 4242", not the full PAN.
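Masking helpers like these are small enough to write once and reuse everywhere; the exact formats below are illustrative choices, not a standard.

```python
def mask_card(pan: str) -> str:
    """Keep only the last four digits: never log the full PAN."""
    digits = "".join(ch for ch in pan if ch.isdigit())
    return f"card ending in {digits[-4:]}"

def mask_email(email: str) -> str:
    """Keep the first character and the domain, hide the rest of the local part."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"
```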
Log injection attacks
Here's a threat that most logging guides skip: attackers can tamper with your log files by injecting malicious content into values that get logged. If your application logs user-supplied input without sanitization, an attacker can craft input that creates fake log entries, hides their activity, or exploits vulnerabilities in log analysis tools.
Classic log injection uses newlines to split a single log entry into multiple lines, then adds fake "normal" entries to hide malicious ones. More dangerous variants target log analysis tools directly — if your SIEM or log parser processes logged data as code, you have a second injection vulnerability hiding inside your logging infrastructure.
User input: "admin\n2025-03-15 auth.login_success user=admin ip=91.108.4.77" → Creates fake success entry in log file
// Encode newlines, tabs,
// and control characters
const safe = input
.replace(/\n/g, '\\n')
.replace(/\r/g, '\\r')
.replace(/\t/g, '\\t');
logger.warn({ input: safe });
The fix is straightforward: never concatenate user input directly into log strings. Use structured logging (JSON) so each field is clearly delimited. Encode or escape special characters in any value derived from user input before it reaches the log. If you use a logging library, make sure it does this for you — most modern ones do when used correctly.
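You can see why structured logging defeats the newline trick directly: a JSON serializer escapes control characters, so an injected line break survives only as the two-character sequence \n and can never split a log entry.

```python
import json

# The same malicious input from the example above
malicious = "admin\n2025-03-15 auth.login_success user=admin ip=91.108.4.77"

# Serialized as a JSON field, the entry stays on a single line
entry = json.dumps({"event_type": "auth.login_failed", "username": malicious})
assert "\n" not in entry  # no literal newline reaches the log file
```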
Insufficient alerting — the other half of the problem
Logging without alerting is like installing security cameras with no one watching the feed. You'll have evidence after the fact, but you won't stop anything while it's happening. The gap between "we have logs" and "we have effective detection" is where most organizations fall short.
What you need alerts for
- Brute force threshold — 5+ failed login attempts from the same IP within 5 minutes
- Account takeover patterns — login success after N failures, new IP for established account, login from unusual geography
- Privilege escalation — any user accessing admin functionality they haven't previously used
- Mass data extraction — single request returning more than 1,000 records, or total daily API calls for a user 10x their average
- Authentication anomalies — expired token usage, algorithm confusion attempts on JWTs, invalid session IDs that don't match any known pattern
- WAF / input filter triggers — spike in blocked requests suggesting active scanning
- Error rate spikes — sudden increase in 403/404/500 responses from the same source
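The first rule in the list above can be sketched as a sliding-window counter. In practice this logic lives in your SIEM or alerting layer rather than in-process; the names and thresholds here are illustrative.

```python
from collections import defaultdict, deque

WINDOW = 300    # seconds: the "within 5 minutes" part of the rule
THRESHOLD = 5   # failures: the "5+ failed attempts" part

failures = defaultdict(deque)  # ip -> deque of failure timestamps

def record_failed_login(ip: str, ts: float) -> bool:
    """Return True when this IP crosses the brute-force threshold."""
    q = failures[ip]
    q.append(ts)
    # Keep only failures inside the sliding window
    while q and ts - q[0] > WINDOW:
        q.popleft()
    return len(q) >= THRESHOLD
```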
Alert fatigue is a real threat
There's a failure mode in the opposite direction: too many alerts, too many false positives, and the security team stops paying attention. Alert fatigue is how breaches get missed even when the detection systems technically fired. The Uber breach in 2022 involved attackers who were initially blocked by MFA — they just kept trying until the employee accepted the push notification out of frustration. The alerts fired; they were ignored.
Good alerting is tuned to your specific application's normal behavior. What's a "suspicious" number of API calls depends on whether you're a B2C app with millions of users or a B2B tool with 50 power users. Set baselines first, then set thresholds relative to those baselines. Prioritize quality over quantity: one actionable alert is worth more than fifty noisy ones.
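"Baselines first, thresholds second" can be expressed as a comparison against the user's own trailing average rather than a global constant. This is a hedged sketch; the 10x multiplier echoes the mass-extraction rule above, and the floor value is an illustrative assumption.

```python
def is_anomalous(today_calls: int, history: list[int],
                 multiplier: float = 10.0, min_floor: int = 100) -> bool:
    """Flag when today's API call count is far above this user's own baseline.

    min_floor prevents alerting on tiny absolute numbers (3 calls against a
    baseline of 0.2 would otherwise trip the multiplier).
    """
    if not history:
        return False  # no baseline yet; collect data before alerting
    baseline = sum(history) / len(history)
    return today_calls >= max(baseline * multiplier, min_floor)
```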
SIEM and centralized monitoring
A Security Information and Event Management (SIEM) system collects, normalizes, and correlates log data across your entire infrastructure. For applications beyond a certain scale or compliance requirement, a SIEM isn't optional — it's the only practical way to maintain visibility across services, correlate events that span multiple systems, and meet audit requirements.
Popular options include Splunk, Elastic SIEM, Microsoft Sentinel, Datadog Security Monitoring, and open-source stacks built on the Elastic or Grafana Loki ecosystem. Each has different cost models and operational tradeoffs, but the key capabilities you need are the same:
- Real-time ingestion from all application services
- Normalized event schema so you can write cross-service queries
- Rule-based alerting with configurable thresholds
- Long-term retention (90 days minimum for most compliance requirements, 12 months for PCI DSS)
- Immutability or tamper-evidence — logs that attackers can delete are useless
- Access controls on the log data itself
One architectural point that matters: logs should be written to a separate system, not the same database your application uses. An attacker who compromises your application database shouldn't be able to delete the evidence. Write logs to an append-only log store, ship them offsite, and give the application service account write-only access — not read or delete.
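As one concrete (AWS-specific, illustrative) way to express "write-only access": an IAM policy for the application's log-shipping role that grants s3:PutObject and nothing else — no read, no delete. The bucket name is a placeholder.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AppendOnlyLogWrites",
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::example-audit-logs/*"
    }
  ]
}
```

Pairing a policy like this with WORM storage such as S3 Object Lock gives you the tamper-evidence described above.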
# pip install python-json-logger
import logging
from datetime import datetime, timezone

from pythonjsonlogger import jsonlogger

# Configure structured JSON logging
handler = logging.StreamHandler()
formatter = jsonlogger.JsonFormatter(
    fmt='%(asctime)s %(name)s %(levelname)s %(message)s',
    datefmt='%Y-%m-%dT%H:%M:%S'
)
handler.setFormatter(formatter)

security_logger = logging.getLogger('security')
security_logger.addHandler(handler)
security_logger.setLevel(logging.INFO)

# Log a login failure with full context
def log_login_failure(email: str, ip: str, reason: str, attempt_count: int):
    security_logger.warning(
        "login_failed",
        extra={
            "event_type": "auth.login_failed",
            "outcome": "failure",
            "reason": reason,
            # Mask email — keep domain for triage, hide local part
            "user_email_masked": mask_email(email),
            "ip_address": ip,
            "attempt_count": attempt_count,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }
    )

def mask_email(email: str) -> str:
    local, _, domain = email.partition('@')
    return f"{local[0]}***@{domain}"
Detection gap and dwell time — the real cost
The statistic that should be on every engineering team's wall: the average attacker dwell time before detection is around 194 days, according to IBM's 2024 Cost of a Data Breach Report. That's not the time to fully investigate and remediate — that's just the time until someone first notices something is wrong. Containment adds another 64 days on average.
What's happening during those 194 days? The attacker is establishing persistence, moving laterally, identifying valuable data stores, exfiltrating slowly to avoid detection, and potentially preparing destructive payloads for when they're eventually caught. Every day of undetected access is more damage done.
The organizations with the shortest dwell times share common characteristics: they have mature logging infrastructure with good coverage, they've invested in detection rules tuned to their environment, they run regular log reviews and have automated anomaly detection, and they have incident response procedures practiced enough that when an alert fires, people know exactly what to do next.
Reducing dwell time from 194 days to 30 days translates directly to cost savings. IBM's data shows that breaches identified within 200 days cost on average $1.1M less than those identified later. Logging and monitoring investments have a clear ROI calculation attached to them.
Practical checklist to audit your logging
Here's a condensed checklist you can walk through right now:
- Log authentication events — every login attempt (success and failure) with IP, user, and timestamp
- Log authorization failures — every access denied, including the resource and the user who tried
- Log admin actions — every operation through admin interfaces with who did it and when
- Use structured logging — JSON over plaintext, so logs are queryable by field
- Include correlation IDs — request IDs that let you trace a single request across microservices
- Sanitize user input before logging — encode newlines and control characters
- Never log credentials, tokens, or full PII — use IDs and masked values instead
- Ship logs to immutable storage — separate from your application database, write-only access from app
- Set retention policy — at minimum 90 days online, 12 months archived
- Write alert rules for brute force, mass extraction, and privilege escalation
- Test your alerts — run a controlled credential stuffing simulation against staging and verify the alert fires
- Review logs regularly — automated anomaly detection plus weekly manual review of high-severity events
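The "test your alerts" item is worth automating. Below is a hedged sketch of a controlled credential stuffing simulation; the staging URL, endpoint path, and payload shape are all assumptions for illustration, and the sender is injectable so the burst logic can be exercised without a network.

```python
import json
import urllib.error
import urllib.request

# Assumed staging endpoint; replace with your own
STAGING_URL = "https://staging.example.com/api/login"

def attempt_login(email: str, password: str) -> None:
    """Fire one deliberately failing login at staging."""
    body = json.dumps({"email": email, "password": password}).encode()
    req = urllib.request.Request(
        STAGING_URL, data=body, headers={"Content-Type": "application/json"}
    )
    try:
        urllib.request.urlopen(req, timeout=5)
    except urllib.error.URLError:
        pass  # 401s and rate-limit rejections are expected here

def simulate_credential_stuffing(send=attempt_login, attempts: int = 20) -> int:
    """Send a burst of wrong credentials; return the number of requests sent."""
    for i in range(attempts):
        send(f"probe{i}@example.com", "definitely-wrong")
    return attempts
```

After running it, verify that the brute-force alert actually fired and reached a human. If it didn't, the detection rule, not the simulation, is the bug.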
Audit your security logging
Our scanner checks your application's logging posture — exposed debug endpoints, verbose error messages leaking stack traces, missing security headers, and indicators of insufficient monitoring configuration.
Common mistakes teams make
After reviewing a lot of applications, we see the same patterns come up over and over:
Logging only errors, not security events. Application error logs and security audit logs are different things. A 500 error is worth logging, but it tells you nothing about authentication attempts, access control decisions, or data access patterns. Security events need their own log stream with their own retention and alerting rules.
Timestamps without timezones. "2025-03-15 14:23:41" is ambiguous. Is that UTC? Server local time? The server in which data center? When you're correlating logs across services during an incident response at 3am, ambiguous timestamps slow everything down. Always use UTC, always include the timezone marker, and always include millisecond precision.
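In Python, an unambiguous timestamp of the kind described above is a one-liner:

```python
from datetime import datetime, timezone

# UTC, explicit offset marker, millisecond precision
ts = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
# e.g. "2025-03-15T14:23:41.882+00:00"
```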
Logging to the application database. This is especially common in smaller applications. The temptation is understandable — you already have a database, tables are easy to query, it's familiar. But it's wrong. An attacker who can execute SQL can delete your audit trail. Log to a separate service designed for append-only writes.
No one actually reads the logs. Even teams with technically correct logging often have no process for reviewing them. Logs that nobody reads provide forensic evidence after the fact but zero detection capability in real time. If you're not running automated detection rules, schedule regular manual reviews. Something is better than nothing.
Testing only the happy path. Most application testing validates that features work correctly. Security testing also needs to validate that failure modes are logged correctly. Try to log in with a wrong password and verify the event appears in your log store. Attempt to access a forbidden resource and verify the access denied event fires. Run a mock brute force and verify your rate limiting alert triggers. Logging that works in theory but is misconfigured in production is common, and invisible without this kind of testing.