Firewall p95: 2.57msreproducible, on real hardware.
Published numbers, reproducible methodology, no marketing claims without measurements behind them.
2.57 ms p95 across 20,000 runs(~19× faster than the published Lakera <50ms claim)
Where the time goes
Per-layer breakdown
Each request runs through up to four layers. The slowest single layer dominates the budget.
Apples to apples
Published competitor claims
Where vendors publish a number, we list it. Where they don't, we say so.
How we measured
Methodology
- 10 sample promptscovering benign queries, prompt injection ("ignore all previous instructions"), DAN jailbreaks, PII (SSN), system-prompt-leak attempts, base64-encoded payloads, and translations.
- 200-iteration JIT warm-upbefore measurement so the V8 hot path doesn't poison the p50.
- Single thread, no concurrency. Throughput under concurrency would be higher; we publish the conservative number.
- All four enabled layers: pattern + token + semantic + output. The semantic layer alone takes ~95% of the budget — pattern, token, output are sub-millisecond.
- Source not compiled JS. The benchmark imports
packages/core/src/firewall/detection-engine.tsviatsx. Production runs against the bundled JS path, which is typically faster.
Run it yourself
Reproduce this on your hardware
Clone the public repo, run one command, get the same JSON shape.
# Public repo git clone https://github.com/EvalGuardAi/evalguard cd evalguard pnpm install # Default run — 5,000 iterations, plain stdout node scripts/benchmark-firewall-latency.mjs # JSON output (same shape this page reads) node scripts/benchmark-firewall-latency.mjs --runs=20000 --json > latency.json # Or via the npm script pnpm bench:firewall-latency
CI fails on p95 regression past 50ms — see .github/workflows/ci.yml.
Chain of custody
Provenance
Numbers are refreshed on every release that touches the firewall path. CI gate prevents regressions.
The latency budget here is a code-level measurement. Per-tier SLA + uptime history lives on the SLA page.