Skip to content
Firewall latency benchmark

Firewall p95: 2.57msreproducible, on real hardware.

Published numbers, reproducible methodology, no marketing claims without measurements behind them.

Headline result

2.57 ms p95 across 20,000 runs(~19× faster than the published Lakera <50ms claim)

p50
2.11 ms
p95
2.57 ms
SLA target 50ms
p99
3.19 ms
max
10.58 ms
worst single request
mean
2.01 ms
throughput
498 req/s
single thread
total runs
20,000

Where the time goes

Per-layer breakdown

Each request runs through up to four layers. The slowest single layer dominates the budget.

Layer
What it does
p50
p95
p99
pattern
Regex-based prompt injection / DAN / jailbreak / system-override patterns. Compiled at boot, no per-call overhead.
0.11
0.16
0.25
token
DLP token scanning — 200 patterns covering SSN, credit card, AWS keys, API keys, etc. Linear scan, short-circuited on first hit.
0.02
0.03
0.04
semantic
Semantic similarity scoring against an embedding-based attack corpus. Dominates the latency budget — single ML inference.
1.94
2.36
2.94
output
Output-side scan only runs after the model responds, so it doesn't add to the request-path latency reported here.
0.00
0.00
0.00

Apples to apples

Published competitor claims

Where vendors publish a number, we list it. Where they don't, we say so.

Vendor
Latency claim
Source
Lakera
Sub-50ms runtime latency
Troj.ai
Real-time runtime protection (no specific number)
Helicone
Not published
Patronus
Not published
EvalGuard
2.57 ms p95 measured
This page · reproducible below

How we measured

Methodology

  • 10 sample promptscovering benign queries, prompt injection ("ignore all previous instructions"), DAN jailbreaks, PII (SSN), system-prompt-leak attempts, base64-encoded payloads, and translations.
  • 200-iteration JIT warm-upbefore measurement so the V8 hot path doesn't poison the p50.
  • Single thread, no concurrency. Throughput under concurrency would be higher; we publish the conservative number.
  • All four enabled layers: pattern + token + semantic + output. The semantic layer alone takes ~95% of the budget — pattern, token, output are sub-millisecond.
  • Source not compiled JS. The benchmark importspackages/core/src/firewall/detection-engine.tsvia tsx. Production runs against the bundled JS path, which is typically faster.

Run it yourself

Reproduce this on your hardware

Clone the public repo, run one command, get the same JSON shape.

# Public repo
git clone https://github.com/EvalGuardAi/evalguard
cd evalguard
pnpm install

# Default run — 5,000 iterations, plain stdout
node scripts/benchmark-firewall-latency.mjs

# JSON output (same shape this page reads)
node scripts/benchmark-firewall-latency.mjs --runs=20000 --json > latency.json

# Or via the npm script
pnpm bench:firewall-latency

CI fails on p95 regression past 50ms — see .github/workflows/ci.yml.

Chain of custody

Provenance

Measured at
April 30, 2026 · 04:53:59 GMT UTC
Source
scripts/benchmark-firewall-latency.mjs

Numbers are refreshed on every release that touches the firewall path. CI gate prevents regressions.

Want full SLA + uptime guarantees?

The latency budget here is a code-level measurement. Per-tier SLA + uptime history lives on the SLA page.

View SLA