
Firewall Latency Benchmark

Published numbers, reproducible methodology, no marketing claims without measurements behind them.

Headline result

2.57 ms p95 across 20,000 runs (~19× faster than Lakera's published <50 ms claim)

  • p50: 2.11 ms
  • p95: 2.57 ms (SLA target: 50 ms)
  • p99: 3.19 ms
  • max: 10.58 ms (worst single request)
  • mean: 2.01 ms
  • throughput: 498 req/s (single thread)
  • total runs: 20,000

Per-layer breakdown

Each request runs through up to four layers; the slowest single layer dominates the budget. All timings below are in milliseconds (a code sketch of the layering follows the list).

  • pattern: p50 0.11 · p95 0.16 · p99 0.25. Regex-based prompt injection / DAN / jailbreak / system-override patterns. Compiled at boot, no per-call overhead.
  • token: p50 0.02 · p95 0.03 · p99 0.04. DLP token scanning: 200 patterns covering SSN, credit card, AWS keys, API keys, etc. Linear scan, short-circuited on first hit.
  • semantic: p50 1.94 · p95 2.36 · p99 2.94. Semantic similarity scoring against an embedding-based attack corpus. A single ML inference that dominates the latency budget.
  • output: p50 0.00 · p95 0.00 · p99 0.00. The output-side scan only runs after the model responds, so it adds nothing to the request-path latency reported here.
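
As an illustration of that layering, here is a minimal TypeScript sketch. Every name in it (scanRequest, embed, the pattern lists) is hypothetical, not the detection-engine.ts API, and the trigram-hash embed is a toy stand-in for the single ML inference the real semantic layer runs.

// Illustrative sketch of the layered scan (hypothetical names, not the
// real detection-engine.ts API).

// pattern layer: regexes compiled once at module load ("at boot"), so each
// call pays only for matching, never for compilation.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore (all )?previous instructions/i,
  /you are now DAN/i,
  /reveal (the |your )?system prompt/i,
];

// token layer: DLP patterns scanned linearly, short-circuiting on first hit.
const DLP_PATTERNS: RegExp[] = [
  /\b\d{3}-\d{2}-\d{4}\b/, // SSN
  /\bAKIA[0-9A-Z]{16}\b/,  // AWS access key ID
];

function matchesAny(patterns: RegExp[], text: string): boolean {
  for (const p of patterns) if (p.test(text)) return true; // first hit wins
  return false;
}

// semantic layer: cosine similarity against an embedded attack corpus.
// embed() is a toy trigram-hash stand-in for the single ML inference that
// dominates the real latency budget.
function embed(text: string): Float32Array {
  const v = new Float32Array(64);
  for (let i = 0; i + 3 <= text.length; i++) {
    let h = 0;
    for (let j = i; j < i + 3; j++) h = (h * 31 + text.charCodeAt(j)) | 0;
    v[(h >>> 0) % v.length] += 1;
  }
  return v;
}

function cosine(a: Float32Array, b: Float32Array): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

function scanRequest(
  text: string,
  attackCorpus: Float32Array[],
  threshold = 0.85,
): { blocked: boolean; layer?: string } {
  if (matchesAny(INJECTION_PATTERNS, text)) return { blocked: true, layer: "pattern" };
  if (matchesAny(DLP_PATTERNS, text)) return { blocked: true, layer: "token" };
  const v = embed(text); // the ~2 ms step in the real engine
  for (const attack of attackCorpus) {
    if (cosine(v, attack) >= threshold) return { blocked: true, layer: "semantic" };
  }
  return { blocked: false };
}

The output layer is omitted from the sketch because, as noted above, it runs after the model responds and never sits on the request path.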

Published competitor claims

Where vendors publish a number, we list it. Where they don't, we say so.

  • Lakera: sub-50 ms runtime latency
  • Troj.ai: real-time runtime protection (no specific number)
  • Helicone: not published
  • Patronus: not published
  • EvalGuard: 2.57 ms p95, measured (source: this page · reproducible below)

Methodology

  • 10 sample prompts covering benign queries, prompt injection ("ignore all previous instructions"), DAN jailbreaks, PII (SSN), system-prompt-leak attempts, base64-encoded payloads, and translations.
  • 200-iteration JIT warm-up before measurement, so V8's cold-path compilation doesn't poison the p50 (see the harness sketch after this list).
  • Single thread, no concurrency. Throughput under concurrency would be higher; we publish the conservative number.
  • All four enabled layers: pattern + token + semantic + output. The semantic layer alone takes ~95% of the budget — pattern, token, output are sub-millisecond.
  • Source, not compiled JS. The benchmark imports packages/core/src/firewall/detection-engine.ts via tsx. Production runs against the bundled JS path, which is typically faster.
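
The warm-up and percentile mechanics look roughly like the sketch below (TypeScript; bench and percentile are illustrative names, not the actual contents of scripts/benchmark-firewall-latency.mjs).

import { performance } from "node:perf_hooks";

// Nearest-rank percentile over an already-sorted sample array.
function percentile(sorted: number[], p: number): number {
  const idx = Math.max(0, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[Math.min(idx, sorted.length - 1)];
}

function bench(fn: () => void, warmup = 200, runs = 20_000) {
  // Warm-up: let V8 compile and optimize the hot path before timing starts,
  // so first-run compilation stalls never enter the sample set.
  for (let i = 0; i < warmup; i++) fn();

  const samples: number[] = [];
  for (let i = 0; i < runs; i++) { // single thread, no concurrency
    const t0 = performance.now();
    fn();
    samples.push(performance.now() - t0);
  }
  samples.sort((a, b) => a - b);
  const mean = samples.reduce((s, x) => s + x, 0) / samples.length;
  return {
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    p99: percentile(samples, 99),
    max: samples[samples.length - 1],
    mean,
    throughput: 1000 / mean, // req/s with one request in flight
  };
}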

Reproduce this on your hardware

Clone the public repo, run one command, get the same JSON shape.

# Public repo
git clone https://github.com/EvalGuardAi/evalguard
cd evalguard
pnpm install

# Default run — 5,000 iterations, plain stdout
node scripts/benchmark-firewall-latency.mjs

# JSON output (same shape this page reads)
node scripts/benchmark-firewall-latency.mjs --runs=20000 --json > latency.json

# Or via the npm script
pnpm bench:firewall-latency

CI fails on p95 regression past 50 ms; see .github/workflows/ci.yml.
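
A gate of that shape can be a few lines that read the benchmark JSON and exit nonzero when over budget. This is a sketch, assuming the JSON exposes a top-level p95 field in milliseconds; the authoritative check is the workflow file itself.

import { readFileSync } from "node:fs";

const SLA_MS = 50;

// Assumed shape: { "p95": number, ... } as written by --json above.
const { p95 } = JSON.parse(readFileSync("latency.json", "utf8")) as { p95: number };

if (p95 > SLA_MS) {
  console.error(`p95 ${p95} ms exceeds the ${SLA_MS} ms SLA target`);
  process.exit(1); // fail the CI job
}
console.log(`p95 ${p95} ms is within the ${SLA_MS} ms budget`);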

Provenance

  • Measured at: April 30, 2026 · 04:53:59 UTC
  • Source: scripts/benchmark-firewall-latency.mjs

Numbers are refreshed on every release that touches the firewall path. CI gate prevents regressions.

Want full SLA + uptime guarantees?

The latency budget here is a code-level measurement. Per-tier SLA + uptime history lives on the SLA page.

View SLA