Policy engine
An evaluation tells you a score. A policy tells you what to do with that score. The policy engine is the control plane between the two — a declarative rule set that maps "safety dropped below 0.85 on a customer-support response" to a specific action like "block + log + page on-call".
Why a separate engine and not just thresholds
Inline thresholds (one number per endpoint call) work for a single product. Production teams have many products, each with different risk profiles and different regulators. A single flat setting such as threshold: 0.8 cannot express rules like these:
- "Safety must be ≥ 0.9 for healthcare, but ≥ 0.7 for internal".
- "Block if PII leakage detected, regardless of overall score".
- "Auto-redact emails in the output if > 3 detected, else log only".
- "Pass without re-evaluating if the response is a refusal".
- "Notify the on-call channel if 5 fails in 60s on the same user".
The policy engine encodes all of the above as rules attached to projects, applied at evaluation time. The eval endpoint stays simple (returns scores); the policy engine decides what those scores trigger.
Rule shape
Rules are stored in policies, each with a JSON spec that looks roughly like this:
{
  "name": "Healthcare hard gate",
  "scope": { "project_id": "...", "endpoint": "safe-regenerate" },
  "conditions": [
    { "dim": "safety", "operator": "<", "value": 0.9 },
    { "dim": "accuracy", "operator": "<", "value": 0.85 }
  ],
  "match": "any", // any | all
  "actions": [
    { "type": "block", "reason": "Healthcare safety/accuracy minimum" },
    { "type": "alert", "channel": "pagerduty", "severity": "P2" },
    { "type": "audit", "kind": "policy_block" }
  ]
}
Action types
- block — the response is dropped and the API returns 403 with the policy name. Useful for hard-fail traffic.
- regenerate — kick the response into the regeneration loop with custom thresholds (overrides the inline ones).
- redact — apply PII / secret-leak / custom-regex redactions before returning. Composes with the firewall's inline redact path.
- alert — fire-and-forget notification to Slack, PagerDuty, Discord, Teams, or any webhook. Includes the policy name, the response excerpt, and the failing dimension(s).
- audit — write a typed row to audit_logs with the policy id and the matched conditions. Required for SOC 2 / ISO 42001 evidence trails.
- tag — attach a metadata tag to the response envelope so downstream consumers (your app) can branch on it without re-evaluating.
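To make the rule shape and the action list concrete, here is a minimal Python sketch of how a rule's conditions might be checked against per-dimension scores. It is an illustration, not the engine's actual implementation. The function names, the operator table beyond "<", and the default of "all" when match is omitted are all assumptions.

```python
import operator

# Assumed operator table: the spec above only shows "<"; the others are illustrative.
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}

def rule_matches(rule, scores):
    """Return True if the rule's conditions match the per-dimension scores."""
    results = [
        OPERATORS[cond["operator"]](scores[cond["dim"]], cond["value"])
        for cond in rule["conditions"]
    ]
    # "match" is "any" or "all", as in the spec; defaulting to "all" is an assumption.
    if rule.get("match", "all") == "any":
        return any(results)
    return all(results)

def triggered_actions(rule, scores):
    """Return the action types to run, in declared order, or [] if nothing matched."""
    if not rule_matches(rule, scores):
        return []
    return [action["type"] for action in rule["actions"]]

rule = {
    "name": "Healthcare hard gate",
    "conditions": [
        {"dim": "safety", "operator": "<", "value": 0.9},
        {"dim": "accuracy", "operator": "<", "value": 0.85},
    ],
    "match": "any",
    "actions": [
        {"type": "block", "reason": "Healthcare safety/accuracy minimum"},
        {"type": "alert", "channel": "pagerduty", "severity": "P2"},
        {"type": "audit", "kind": "policy_block"},
    ],
}

# Safety is below 0.9, so with match "any" the rule fires.
print(triggered_actions(rule, {"safety": 0.88, "accuracy": 0.95}))
# Both dimensions clear their minimums, so nothing fires.
print(triggered_actions(rule, {"safety": 0.95, "accuracy": 0.95}))
```

In this sketch a score missing from the input raises a KeyError. A real engine would need an explicit decision on whether a missing dimension counts as a match.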
Scope + precedence
A rule's scope selects what traffic it applies to:
- project_id — restrict to a single project (most common).
- endpoint — restrict to a specific API surface (e.g. only safe-regenerate, not evaluate).
- environment — production vs staging.
- tags — match on request-level tags (e.g. user-tier: premium).
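The scope fields above could be matched against a request along these lines. This is a sketch under stated assumptions: the request is modelled as a plain dict, a field missing from the scope is treated as "match anything", and every tag in the scope must appear on the request with the same value. The name scope_matches and the sample project id are hypothetical.

```python
def scope_matches(scope, request):
    """Return True if every field present in the scope agrees with the request."""
    # Assumption: a field absent from the scope places no restriction on traffic.
    for key in ("project_id", "endpoint", "environment"):
        if key in scope and scope[key] != request.get(key):
            return False
    # Assumption: each scoped tag must be present on the request with an equal value.
    request_tags = request.get("tags", {})
    for tag, value in scope.get("tags", {}).items():
        if request_tags.get(tag) != value:
            return False
    return True

scope = {
    "project_id": "proj-healthcare",  # hypothetical id
    "endpoint": "safe-regenerate",
    "tags": {"user-tier": "premium"},
}

premium_request = {
    "project_id": "proj-healthcare",
    "endpoint": "safe-regenerate",
    "environment": "production",
    "tags": {"user-tier": "premium", "region": "eu"},
}
evaluate_request = dict(premium_request, endpoint="evaluate")

print(scope_matches(scope, premium_request))   # same project, endpoint, and tag
print(scope_matches(scope, evaluate_request))  # endpoint differs from the scope
```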
Rules are evaluated in priority order (highest first). The first rule whose conditions match wins; its actions run; subsequent rules are skipped unless explicitly marked continue. This makes rule ordering load-bearing — the policy editor in the dashboard surfaces the eval order explicitly to prevent surprise.
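The first-match-wins behaviour can be sketched as a short loop. The field names priority and continue are assumptions (the text names the concepts but not the JSON keys), and the matches callback stands in for whatever scope and condition checks the engine performs.

```python
def select_rules(rules, matches):
    """Return the names of the rules whose actions would run, in evaluation order."""
    fired = []
    # Highest priority first; sorted() is stable, so ties keep their input order.
    for rule in sorted(rules, key=lambda r: r["priority"], reverse=True):
        if not matches(rule):
            continue
        fired.append(rule["name"])
        # Stop at the first matching rule unless it is explicitly marked continue.
        if not rule.get("continue", False):
            break
    return fired

rules = [
    {"name": "tag-low-safety", "priority": 10, "matched": True},
    {"name": "audit-everything", "priority": 100, "continue": True, "matched": True},
    {"name": "healthcare-hard-gate", "priority": 50, "matched": True},
    {"name": "internal-soft-gate", "priority": 70, "matched": False},
]

# The "matched" flag stands in for real scope and condition evaluation.
print(select_rules(rules, lambda rule: rule["matched"]))
```

Here audit-everything runs first and passes control on, internal-soft-gate is skipped because it does not match, and healthcare-hard-gate ends evaluation, so tag-low-safety never runs even though it would have matched.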
Where policies attach
Policies are wired into three call paths:
- safe-regenerate — fires after each iteration's eval, can override threshold / maxIterations / actions.
- The gateway proxy (/api/v1/gateway) — fires inline on every wrapped LLM response.
- The firewall (/api/v1/firewall) — fires on every prompt and response. See firewall vs scorer for the latency tradeoff.
Related concepts
- Scoring thresholds — the per-dim numbers your policy rules compare against.
- Agent checkpoints — policies attached to the three agent gates.
- Firewall vs scorer — which call path your policy belongs in.