Firewall Latency Benchmark

Published numbers, reproducible methodology, no marketing claims without measurements behind them.

Headline result

2.57 ms p95 across 20,000 runs (~19× under the sub-50 ms runtime latency Lakera publishes)

p50: 2.11 ms
p95: 2.57 ms (SLA target: 50 ms)
p99: 3.19 ms
max: 10.58 ms (worst single request)
mean: 2.01 ms
throughput: 498 req/s (single thread)
total runs: 20,000
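The headline figures are order statistics over per-request samples. Below is a minimal sketch of how such a summary could be computed. It assumes nearest-rank percentiles (the benchmark script may define percentiles differently) and treats single-thread throughput as the inverse of mean latency. `summarize` and `LatencySummary` are hypothetical names:

```typescript
// Hypothetical summary shape; not the benchmark script's actual output.
interface LatencySummary {
  p50: number;
  p95: number;
  p99: number;
  max: number;
  mean: number;
  throughputPerSec: number; // sequential, single-thread estimate
}

// Nearest-rank percentile: the smallest sample with at least p% of all
// samples at or below it. Expects `sorted` in ascending order.
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) throw new Error("no samples");
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(Math.max(rank, 1), sorted.length) - 1];
}

// Summarize raw per-request latencies in milliseconds.
function summarize(samplesMs: number[]): LatencySummary {
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const total = sorted.reduce((sum, x) => sum + x, 0);
  const mean = total / sorted.length;
  return {
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
    max: sorted[sorted.length - 1],
    mean,
    // One request in flight at a time: requests per second is roughly
    // 1000 ms divided by the mean latency in ms (assumption).
    throughputPerSec: 1000 / mean,
  };
}
```

Because requests run one at a time, 1000 / 2.01 ms ≈ 498 req/s, so the mean and throughput figures above are consistent with each other. The headline ratio uses the same arithmetic: 50 / 2.57 ≈ 19.5.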

Per-layer breakdown

Each request runs through up to four layers. The slowest single layer dominates the budget.
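Here is a sketch of per-layer timing on the request path. It assumes the layers run in sequence behind a common `check` interface and that the pipeline stops at the first layer that blocks. That early exit is an assumption: "up to four" may instead mean that individual layers can be disabled. `Layer`, `Verdict`, and `runRequestPath` are hypothetical names, not the detection engine's API:

```typescript
// Hypothetical verdict and layer types; the real engine's types may differ.
type Verdict = { blocked: boolean; reason?: string };

interface Layer {
  name: string;
  check(input: string): Verdict;
}

interface LayerTiming {
  name: string;
  ms: number;
}

// Run request-path layers in order, timing each one individually and
// stopping at the first layer that blocks (assumed early exit). The output
// layer is not passed in here because it runs after the model responds.
function runRequestPath(
  layers: Layer[],
  input: string,
): { verdict: Verdict; timings: LayerTiming[] } {
  const timings: LayerTiming[] = [];
  for (const layer of layers) {
    const start = performance.now();
    const verdict = layer.check(input);
    timings.push({ name: layer.name, ms: performance.now() - start });
    if (verdict.blocked) return { verdict, timings };
  }
  return { verdict: { blocked: false }, timings };
}
```

Timing each layer separately is what makes a per-layer breakdown possible. Per-layer percentiles are computed independently, so they need not add up exactly to the end-to-end percentiles.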

Latencies are in milliseconds, listed as p50 / p95 / p99.

pattern: 0.11 / 0.16 / 0.25
Regex-based prompt-injection, DAN, jailbreak, and system-override patterns. Compiled at boot, so there is no per-call compilation overhead.

token: 0.02 / 0.03 / 0.04
DLP token scanning: 200 patterns covering SSNs, credit-card numbers, AWS keys, API keys, and similar secrets. Linear scan, short-circuited on the first hit.

semantic: 1.94 / 2.36 / 2.94
Semantic-similarity scoring against an embedding-based attack corpus. A single ML inference, and the layer that dominates the latency budget.

output: 0.00 (request path)
Output-side scan. It runs only after the model responds, so it adds nothing to the request-path latency reported here.
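The pattern and token layers share two properties described above: the expressions are built once, and the scan stops at the first hit. Below is a minimal sketch assuming plain `RegExp` objects. The two patterns and the `firstDlpHit` name are illustrative stand-ins, not the shipped 200-pattern set:

```typescript
// Illustrative patterns only (assumptions, not the production rules).
// Regex literals at module scope are created once when the module loads.
const DLP_PATTERNS: ReadonlyArray<{ name: string; re: RegExp }> = [
  // US Social Security number in the common NNN-NN-NNNN form.
  { name: "ssn", re: /\b\d{3}-\d{2}-\d{4}\b/ },
  // AWS access key ID: "AKIA" followed by 16 uppercase letters or digits.
  { name: "aws_access_key_id", re: /\bAKIA[0-9A-Z]{16}\b/ },
];

// Linear scan in declaration order. Returns the name of the first matching
// pattern, or null if nothing matches (short-circuit on first hit).
function firstDlpHit(text: string): string | null {
  for (const { name, re } of DLP_PATTERNS) {
    if (re.test(text)) return name;
  }
  return null;
}
```

Omitting the `g` flag matters here. With `g`, `RegExp.prototype.test` advances `lastIndex` on the shared regex, so later calls against different inputs could start mid-string and miss matches.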

Published competitor claims

Where vendors publish a number, we list it. Where they don't, we say so.

Vendor · Latency claim · Source

Lakera · Sub-50 ms runtime latency · www.lakera.ai
Troj.ai · Real-time runtime protection (no specific number) · troj.ai
Helicone · Not published · n/a
Patronus · Not published · n/a
EvalGuard · 2.57 ms p95, measured · This page, reproducible below

Methodology

10 sample prompts covering benign queries, prompt injection ("ignore all previous instructions"), DAN jailbreaks, PII (SSN), system-prompt-leak attempts, base64-encoded payloads, and translations.
200-iteration JIT warm-up before measurement, so cold, not-yet-optimized iterations don't skew the p50. Timing starts once V8 has had a chance to optimize the hot path.
Single thread, no concurrency. Throughput under concurrency would be higher; we publish the conservative number.
All four layers enabled: pattern + token + semantic + output. The semantic layer alone accounts for about 92% of end-to-end latency (1.94 of 2.11 ms at p50, 2.36 of 2.57 ms at p95). The pattern, token, and output layers are each sub-millisecond.
Source, not compiled JS. The benchmark imports packages/core/src/firewall/detection-engine.ts via tsx. Production runs against the bundled JS, which is typically faster.
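The warm-up and single-thread measurement described above can be sketched as a plain timing loop. This sketch assumes a synchronous `detect(prompt)` entry point. `benchmark` and `detect` are hypothetical names, and an async engine would need the timed call awaited:

```typescript
// Hypothetical harness; `detect` stands in for the real detection entry point.
function benchmark(
  detect: (prompt: string) => unknown,
  prompts: string[],
  runs: number,
  warmup = 200,
): number[] {
  if (prompts.length === 0) throw new Error("need at least one prompt");

  // Warm-up: exercise the code path so V8 can optimize it; timings discarded.
  for (let i = 0; i < warmup; i++) detect(prompts[i % prompts.length]);

  // Measured runs: one call at a time (single thread, no concurrency),
  // cycling through the sample prompts.
  const samplesMs: number[] = [];
  for (let i = 0; i < runs; i++) {
    const prompt = prompts[i % prompts.length];
    const start = performance.now();
    detect(prompt);
    samplesMs.push(performance.now() - start);
  }
  return samplesMs;
}
```

The returned samples can then go into a percentile calculation to produce p50/p95/p99. The warm-up iterations never enter the samples, so their slower timings cannot affect the statistics.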

Reproduce this on your hardware

Clone the public repo, run one command, get the same JSON shape.

# Public repo
git clone https://github.com/EvalGuardAi/evalguard
cd evalguard
pnpm install

# Default run — 5,000 iterations, plain stdout
node scripts/benchmark-firewall-latency.mjs

# JSON output (same shape this page reads)
node scripts/benchmark-firewall-latency.mjs --runs=20000 --json > latency.json

# Or via the npm script
pnpm bench:firewall-latency

CI fails if p95 regresses past 50 ms; see .github/workflows/ci.yml.
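Here is a sketch of the kind of check such a gate performs. It assumes the JSON report exposes a numeric p95 field. The `LatencyReport` field names and `checkP95Gate` are hypothetical, and the actual gate in .github/workflows/ci.yml may be structured differently:

```typescript
// Hypothetical report shape; the real latency.json field names may differ.
interface LatencyReport {
  runs: number;
  p50Ms: number;
  p95Ms: number;
  p99Ms: number;
}

const P95_BUDGET_MS = 50;

// Returns a failure message when the report breaches the budget, else null.
// A missing or non-numeric p95 is treated as a failure, not a pass.
function checkP95Gate(report: LatencyReport, budgetMs = P95_BUDGET_MS): string | null {
  if (!Number.isFinite(report.p95Ms)) return "p95 missing or not a number";
  if (report.p95Ms > budgetMs) {
    return `p95 ${report.p95Ms.toFixed(2)} ms exceeds ${budgetMs} ms budget`;
  }
  return null;
}
```

In a CI step, the report would be read from latency.json and passed to the check. A non-null message would then fail the job with a non-zero exit code. Treating an unreadable p95 as a failure keeps a broken report from passing silently.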

Provenance

Measured at: April 30, 2026 · 04:53:59 UTC
Source: scripts/benchmark-firewall-latency.mjs
JSON: /trust/latency.json

Numbers are refreshed on every release that touches the firewall path, and the CI gate fails the build if p95 regresses past 50 ms.

Want full SLA + uptime guarantees?

The latency numbers here are a code-level measurement. Per-tier SLAs and uptime history live on the SLA page.
