Incident Response Plan

How we detect, respond to, and recover from security incidents and service disruptions.

Last updated: April 2026

Severity Classification

P0 — Critical
Response: 15 min
Resolution: 4 hours

Data breach, complete service outage, active exploitation

P1 — High
Response: 1 hour
Resolution: 24 hours

Partial service degradation, security vulnerability discovered

P2 — Medium
Response: 4 hours
Resolution: 72 hours

Non-critical service issues, performance degradation

P3 — Low
Response: 24 hours
Resolution: Best effort

Cosmetic issues, non-urgent improvements
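
The severity targets above can be turned into concrete deadlines for an incident. The following is a minimal sketch, not a description of the tooling actually in use. It assumes the clock starts at detection time, and the names SeverityTarget, SEVERITY_TARGETS, and deadlines are illustrative only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


@dataclass(frozen=True)
class SeverityTarget:
    response: timedelta
    resolution: Optional[timedelta]  # None means "best effort"


# Targets copied from the severity levels listed above.
SEVERITY_TARGETS = {
    "P0": SeverityTarget(timedelta(minutes=15), timedelta(hours=4)),
    "P1": SeverityTarget(timedelta(hours=1), timedelta(hours=24)),
    "P2": SeverityTarget(timedelta(hours=4), timedelta(hours=72)),
    "P3": SeverityTarget(timedelta(hours=24), None),
}


def deadlines(severity: str, detected_at: datetime) -> Tuple[datetime, Optional[datetime]]:
    """Return (respond_by, resolve_by); resolve_by is None for best effort.

    Assumption: both targets are measured from the detection time.
    """
    target = SEVERITY_TARGETS[severity]
    respond_by = detected_at + target.response
    resolve_by = (detected_at + target.resolution) if target.resolution is not None else None
    return respond_by, resolve_by


if __name__ == "__main__":
    detected = datetime(2026, 4, 1, 12, 0, tzinfo=timezone.utc)
    print(deadlines("P0", detected))
```

Keeping the targets in one table means the numbers can be changed in a single place if the plan is revised.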

Response Process

1. Detection & Triage

  • Automated monitoring alerts (health checks every 30s)
  • User reports via support tickets or [email protected]
  • Sentry error tracking for application crashes
  • Classify severity level (P0-P3)
  • Assign incident commander
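
As an illustration of the automated detection step, here is a hedged sketch of a polling loop that runs a health probe every 30 seconds and raises an alert after several consecutive failures. The failure threshold, the monitor function, and its parameters are assumptions made for this example; the plan only specifies the 30-second interval.

```python
import time
from typing import Callable

CHECK_INTERVAL_S = 30  # from the plan: health checks every 30s


def monitor(probe: Callable[[], bool],
            on_alert: Callable[[int], None],
            failures_before_alert: int = 3,  # assumed threshold, not from the plan
            max_checks: int = 10,
            sleep: Callable[[float], None] = time.sleep) -> None:
    """Run `probe` every CHECK_INTERVAL_S seconds and alert on repeated failures."""
    consecutive = 0
    for i in range(max_checks):
        try:
            healthy = probe()
        except Exception:
            healthy = False  # a probe that raises counts as a failed check
        consecutive = 0 if healthy else consecutive + 1
        if consecutive == failures_before_alert:
            on_alert(consecutive)  # fire once when the threshold is reached
        if i < max_checks - 1:
            sleep(CHECK_INTERVAL_S)
```

Requiring consecutive failures before alerting is one common way to avoid paging on a single transient error. The sleep function is injectable so the loop can be exercised without waiting.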

2. Containment

  • Isolate affected systems to prevent spread
  • Revoke compromised credentials immediately
  • Enable maintenance mode if needed
  • Preserve evidence and logs for investigation
  • Notify affected customers within 72 hours of becoming aware of a personal data breach (in line with the GDPR 72-hour breach notification window)

3. Investigation

  • Root cause analysis with full audit log review
  • Identify scope of impact (affected users, data, services)
  • Document timeline of events
  • Assess whether data was accessed, modified, or exfiltrated
  • Engage external forensics if needed

4. Recovery

  • Deploy fix and verify in staging first
  • Gradual rollout to production with monitoring
  • Restore from backups if data integrity is compromised
  • Verify all systems are operational
  • Remove any maintenance mode restrictions

5. Post-Incident

  • Publish post-mortem within 5 business days
  • Update status page with incident timeline
  • Implement preventive measures
  • Update runbooks and monitoring based on lessons learned
  • Brief affected customers with final resolution report
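
The 5-business-day post-mortem deadline can be computed mechanically. The sketch below assumes that business days are Monday through Friday and that public holidays are ignored. The function names are hypothetical.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Return the date `days` business days after `start` (weekends skipped).

    Assumption: public holidays are not taken into account.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current


def postmortem_due(resolved_on: date) -> date:
    """Latest publication date for a post-mortem, per the 5-business-day target."""
    return add_business_days(resolved_on, 5)


if __name__ == "__main__":
    print(postmortem_due(date(2026, 4, 3)))
```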

Customer Communication

During an incident, we communicate through:

  • Status page — Real-time updates at evalguard.ai/status
  • Email — Direct notification to affected account owners
  • In-app banner — Dashboard notification for active incidents
  • Post-mortem — Published within 5 business days of resolution

Preventive Measures

Automated Monitoring

Health checks every 30s, Docker container monitoring every 5 min

Daily Backups

Automated pg_dump at 2 AM with weekly full backup and verification

Security Scanning

CodeQL, TruffleHog, Trivy on every deployment via CI/CD

Rate Limiting

Per-route rate limiting with Redis-backed distributed tracking
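
To illustrate what per-route limiting can look like, here is a minimal fixed-window sketch. It is not the production implementation: an in-memory dictionary stands in for the Redis counters, and the routes, limits, and class name are hypothetical. With Redis, the same idea is commonly expressed as an INCR on a per-window key followed by an EXPIRE.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical per-route limits: (max requests, window length in seconds).
ROUTE_LIMITS: Dict[str, Tuple[int, int]] = {
    "/api/login": (5, 60),
    "/api/evaluate": (100, 60),
}
DEFAULT_LIMIT: Tuple[int, int] = (60, 60)


class FixedWindowLimiter:
    """In-memory stand-in for a shared counter store such as Redis."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        # Old windows are never evicted here; a Redis key expiry would handle that.
        self._counts: Dict[Tuple[str, str, int], int] = {}

    def allow(self, client_id: str, route: str) -> bool:
        """Count this request and report whether it is within the route's limit."""
        limit, window = ROUTE_LIMITS.get(route, DEFAULT_LIMIT)
        bucket = int(self._clock() // window)  # index of the current window
        key = (client_id, route, bucket)
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key] <= limit
```

A fixed window is the simplest scheme but allows short bursts at window boundaries. A sliding window or token bucket would smooth that out at the cost of more state.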

Audit Logging

Tamper-evident logs, with every mutation signed using HMAC-SHA256
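
As a rough sketch of how HMAC-SHA256 signing makes log tampering detectable, the example below signs a canonical JSON form of each entry and verifies it later. The entry fields, the key handling, and the function names are assumptions for illustration. A real deployment would load the key from a secrets manager rather than hard-coding it.

```python
import hashlib
import hmac
import json


def sign_entry(entry: dict, key: bytes) -> str:
    """Return the hex HMAC-SHA256 of the entry's canonical JSON encoding."""
    # Sorted keys and fixed separators give the same bytes for the same content.
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_entry(entry: dict, signature: str, key: bytes) -> bool:
    """Recompute the signature and compare it in constant time."""
    return hmac.compare_digest(sign_entry(entry, key), signature)


if __name__ == "__main__":
    demo_key = b"demo-key-for-illustration-only"
    entry = {"actor": "user-1", "action": "update", "target": "project/42"}
    sig = sign_entry(entry, demo_key)
    print(verify_entry(entry, sig, demo_key))
```

Any change to a signed entry, such as altering the action field, causes verification to fail for anyone who does not hold the key.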

Encryption

AES-256-GCM at rest, TLS 1.2+ in transit, BYOK support
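
For the in-transit part, a client can refuse anything older than TLS 1.2 using Python's standard ssl module. This is only a sketch of the client side under that assumption; the function name is hypothetical, and server-side settings depend on the web server or load balancer in use.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a client TLS context that rejects protocol versions below TLS 1.2."""
    ctx = ssl.create_default_context()  # certificate and hostname checks enabled
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```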

Report an Incident

Security incidents: [email protected]

Service disruptions: [email protected]

Status page: evalguard.ai/status

Disclosure policy: evalguard.ai/security/responsible-disclosure
