Policy engine

An evaluation tells you a score. A policy tells you what to do with that score. The policy engine is the control plane between the two: a declarative rule set that maps “safety dropped below 0.85 on a customer-support response” to a specific action like “block + log + page on-call”.

Why a separate engine and not just thresholds

Inline thresholds (one number per endpoint call) work for a single product. Production teams run many products, each with its own risk profile and its own regulators. A single flat value such as threshold: 0.8 can’t express:

“Safety must be ≥ 0.9 for healthcare, but ≥ 0.7 for internal”.
“Block if PII leakage detected, regardless of overall score”.
“Auto-redact emails in the output if > 3 detected, else log only”.
“Pass without re-evaluating if the response is a refusal”.
“Notify the on-call channel if 5 fails in 60s on the same user”.

The policy engine encodes all of the above as rules attached to projects, applied at evaluation time. The eval endpoint stays simple (returns scores); the policy engine decides what those scores trigger.

Rule shape

A rule pairs a condition (a serializable JSON operator tree) with an action. The condition language is the shared policy expression engine in packages/core/src/policy: keys are either dot-separated field paths (resolved against the eval result — e.g. safety, metadata.tier) or the logical operators $and / $or / $not. Each field carries Mongo/Portkey-style $-operators ($lt, $gt, $eq, $in, …). No eval() — conditions compile to a typed AST and evaluate as a pure, side-effect-free walk (ADR-0021).

rule

{
  "name": "Healthcare hard gate",
  "priority": 10,
  "condition": {
    "$or": [
      { "safety":   { "$lt": 0.9  } },
      { "accuracy": { "$lt": 0.85 } }
    ]
  },
  "action": {
    "type": "block",
    "message": "Healthcare safety/accuracy minimum"
  }
}
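
The condition walk described above can be sketched in a few lines. This is a minimal sketch, not the packages/core/src/policy implementation: it interprets the JSON tree directly instead of compiling it to a typed AST first, supports only $lt, $gt, $eq and $in, and assumes that a missing field makes any comparison on it false. The names Condition, FieldOps, resolve, matchField and evaluate are hypothetical.

```typescript
// Illustrative sketch only; not the shared policy expression engine.
type Scalar = string | number | boolean | null;

// A subset of the field-level $-operators.
type FieldOps = { $lt?: number; $gt?: number; $eq?: Scalar; $in?: Scalar[] };

// Keys are either logical operators ($and / $or / $not) or dot-separated field paths.
type Condition = { [key: string]: Condition[] | Condition | FieldOps };

// Resolve a path like "metadata.tier" against the eval result.
// Returns undefined if any segment is missing.
function resolve(path: string, result: unknown): unknown {
  let cur: unknown = result;
  for (const part of path.split(".")) {
    if (cur === null || typeof cur !== "object") return undefined;
    cur = (cur as Record<string, unknown>)[part];
  }
  return cur;
}

// Every operator on one field must hold (implicit AND, as in Mongo).
function matchField(value: unknown, ops: FieldOps): boolean {
  if (ops.$lt !== undefined && !(typeof value === "number" && value < ops.$lt)) return false;
  if (ops.$gt !== undefined && !(typeof value === "number" && value > ops.$gt)) return false;
  if (ops.$eq !== undefined && value !== ops.$eq) return false;
  if (ops.$in !== undefined && !ops.$in.includes(value as Scalar)) return false;
  return true;
}

// Pure recursive walk: reads its inputs, mutates nothing, calls no eval().
// Multiple keys in one object are ANDed together.
function evaluate(cond: Condition, result: unknown): boolean {
  return Object.entries(cond).every(([key, arg]) => {
    switch (key) {
      case "$and": return (arg as Condition[]).every((c) => evaluate(c, result));
      case "$or":  return (arg as Condition[]).some((c) => evaluate(c, result));
      case "$not": return !evaluate(arg as Condition, result);
      default:     return matchField(resolve(key, result), arg as FieldOps);
    }
  });
}

// The "Healthcare hard gate" condition from the rule above.
const healthcareGate: Condition = {
  $or: [{ safety: { $lt: 0.9 } }, { accuracy: { $lt: 0.85 } }],
};

evaluate(healthcareGate, { safety: 0.88, accuracy: 0.95 }); // true: safety < 0.9
evaluate(healthcareGate, { safety: 0.93, accuracy: 0.9 });  // false: neither branch matches
```

Because evaluate only reads its inputs, the same condition can be evaluated repeatedly or concurrently without coordination, which is the practical payoff of a side-effect-free walk.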

Action types

The firewall rule-builder action vocabulary is block / allow / transform / alert (RuleAction.type in packages/core/src/firewall/rule-builder.ts):

block — the response is rejected. A gateway handler maps this to an HTTP error; an SDK wrapper maps it to a thrown EvalGuardError.
allow — let the response through unchanged, but record the match. Use for monitor-only rules.
transform — rewrite the response in place via the rule’s transformTemplate (e.g. redact a matched span, replace with a refusal string). Composes with the firewall’s inline redact path.
alert — emit an alert at the rule’s alertSeverity (critical / high / medium / low) for downstream notification routing.
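
To make the four action types concrete, here is a hypothetical dispatcher over a discriminated union. The field names type, message, transformTemplate and alertSeverity follow the text and the rule example above; the Outcome shape, applyAction, and this local EvalGuardError class are illustrative stand-ins, not the definitions in rule-builder.ts or the SDK. The transform case assumes the template replaces the whole response, a simplification of span-level redaction.

```typescript
// Illustrative shapes only; the real RuleAction may differ.
type Severity = "critical" | "high" | "medium" | "low";

type RuleAction =
  | { type: "block"; message?: string }
  | { type: "allow" }
  | { type: "transform"; transformTemplate: string }
  | { type: "alert"; alertSeverity: Severity };

// Hypothetical stand-in for the SDK's EvalGuardError.
class EvalGuardError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "EvalGuardError";
  }
}

interface Outcome {
  response: string;
  matches: string[]; // names of rules that matched
  alerts: { rule: string; severity: Severity }[];
}

// Returns a new Outcome (the input is not mutated), or throws for block.
function applyAction(ruleName: string, action: RuleAction, out: Outcome): Outcome {
  const matches = [...out.matches, ruleName];
  switch (action.type) {
    case "block":
      // How an SDK wrapper might surface a block: a thrown error.
      throw new EvalGuardError(action.message ?? `Blocked by ${ruleName}`);
    case "allow":
      // Monitor-only: response passes through unchanged, match is recorded.
      return { ...out, matches };
    case "transform":
      // Assumption: the template replaces the entire response.
      return { ...out, matches, response: action.transformTemplate };
    case "alert":
      // Queue an alert for downstream notification routing.
      return {
        ...out,
        matches,
        alerts: [...out.alerts, { rule: ruleName, severity: action.alertSeverity }],
      };
  }
}

const start: Outcome = { response: "Reach me at a@b.com", matches: [], alerts: [] };
const redacted = applyAction("redact-email", { type: "transform", transformTemplate: "[REDACTED]" }, start);
```

The switch is exhaustive over the union, so adding a fifth action type without handling it becomes a compile-time error rather than a silent pass-through.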

For richer response remediation, the firewall also exposes the OnFailAction vocabulary (packages/core/src/firewall/on-fail-action.ts): block, allow, redact, refrain, reask, fix, warn, and custom (a caller-provided handler, e.g. route to a manual-review queue).
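
The custom action is the extension point in that list. A sketch of how a caller-provided handler might plug in, assuming a simple synchronous signature: built-in actions are returned as tags for the firewall to execute (their behavior is not modeled here), while custom is delegated to the handler. OnFailAction is written as a plain string union; Failure, CustomHandler, resolveOnFail and the in-memory review queue are all hypothetical.

```typescript
// The eight OnFailAction names listed above. Illustrative only;
// see on-fail-action.ts for the real definition.
type OnFailAction =
  | "block" | "allow" | "redact" | "refrain"
  | "reask" | "fix" | "warn" | "custom";

interface Failure {
  ruleName: string;
  response: string;
}

// Hypothetical handler signature: returns the text to send instead.
type CustomHandler = (failure: Failure) => string;

// Built-ins come back as tags for the firewall to act on;
// "custom" runs the caller's handler and returns its result.
function resolveOnFail(
  action: OnFailAction,
  failure: Failure,
  custom?: CustomHandler,
): { action: OnFailAction; result?: string } {
  if (action !== "custom") return { action };
  if (!custom) {
    throw new Error(`Rule ${failure.ruleName} uses "custom" but no handler was provided`);
  }
  return { action, result: custom(failure) };
}

// Usage: an in-memory stand-in for a manual-review queue.
const reviewQueue: Failure[] = [];
const toReview: CustomHandler = (f) => {
  reviewQueue.push(f);
  return "This response is pending review.";
};

const routed = resolveOnFail("custom", { ruleName: "pii-gate", response: "Reach me at a@b.com" }, toReview);
```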

Scope + precedence

Rules attach to a project, so the project a request runs under selects the rule set. Two per-rule fields then control where and in what order a rule fires:

priority — a number; rules evaluate in ascending order (lower number first).
regions — optional jurisdiction allow-list (groups like EU/US, ISO-3166 codes like DE/IN, or GLOBAL). A rule with regions set is only evaluated when the request’s resolved jurisdiction intersects the list. Omit it to apply in every region.
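
Once groups are expanded, the region check reduces to a membership test. In this sketch, REGION_GROUPS is a deliberately partial, hypothetical table (the platform defines real group membership), the request’s resolved jurisdiction is assumed to be a single ISO-3166 code, and GLOBAL is assumed to match every jurisdiction. ScopedRule, expand and appliesIn are illustrative names.

```typescript
// Hypothetical and intentionally incomplete: for illustration only.
const REGION_GROUPS: Record<string, string[]> = {
  EU: ["DE", "FR", "IE", "NL"],
  US: ["US"],
};

interface ScopedRule {
  name: string;
  priority: number;
  regions?: string[];
}

// A group name expands to its members; anything else is treated as an ISO code.
function expand(region: string): string[] {
  return REGION_GROUPS[region] ?? [region];
}

// An omitted `regions` field applies everywhere. An explicit empty list
// matches nothing. Assumption: "GLOBAL" matches every jurisdiction.
function appliesIn(rule: ScopedRule, jurisdiction: string): boolean {
  if (rule.regions === undefined) return true;
  return rule.regions.some((r) => r === "GLOBAL" || expand(r).includes(jurisdiction));
}

const euOnly: ScopedRule = { name: "gdpr-pii", priority: 5, regions: ["EU"] };
appliesIn(euOnly, "DE"); // true: DE is in the (hypothetical) EU group
appliesIn(euOnly, "IN"); // false: the rule is skipped for this request
```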

Rules are evaluated in priority order. With short-circuit enabled, evaluation stops at the first matching block rule, so ordering is load-bearing: a rule placed after a matching block rule never runs. The policy editor in the dashboard shows the evaluation order explicitly so it never comes as a surprise.
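
A sketch of the ordering and short-circuit behavior, assuming each rule’s JSON condition has already been turned into a predicate (the matches field stands in for it). Rules are sorted by ascending priority; ties keeping their declared order is an assumption. Rule and evaluateRules are hypothetical names.

```typescript
type ActionType = "block" | "allow" | "transform" | "alert";

interface Rule {
  name: string;
  priority: number; // lower number runs first
  matches: (scores: Record<string, number>) => boolean; // stand-in for the JSON condition
  action: ActionType;
}

// Returns the names of the rules that fired, in evaluation order.
function evaluateRules(
  rules: Rule[],
  scores: Record<string, number>,
  shortCircuit = true,
): string[] {
  // Copy before sorting so the caller's array is untouched. Array.prototype.sort
  // is stable (ES2019+), so equal priorities keep their declared order.
  const ordered = [...rules].sort((a, b) => a.priority - b.priority);
  const fired: string[] = [];
  for (const rule of ordered) {
    if (!rule.matches(scores)) continue;
    fired.push(rule.name);
    // Stop at the first matching block rule when short-circuit is on.
    if (shortCircuit && rule.action === "block") break;
  }
  return fired;
}

const rules: Rule[] = [
  { name: "log-low-safety", priority: 50, action: "alert", matches: (s) => s.safety < 0.95 },
  { name: "healthcare-gate", priority: 10, action: "block", matches: (s) => s.safety < 0.9 },
];

evaluateRules(rules, { safety: 0.8 });        // ["healthcare-gate"]
evaluateRules(rules, { safety: 0.8 }, false); // ["healthcare-gate", "log-low-safety"]
evaluateRules(rules, { safety: 0.92 });       // ["log-low-safety"]
```

The first call shows why ordering is load-bearing: with short-circuit on, the alert rule never fires once the block rule has matched, even though its own condition is true.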

Where policies attach

Policies are wired into three call paths:

safe-regenerate — the per-call threshold / maxIterations settings gate the scores after each iteration’s eval.
The gateway proxy (/api/v1/gateway/proxy/...) — the agent-policy engine evaluates each tool call inline on every wrapped LLM request.
The firewall (/api/v1/firewall/check) — fires on every prompt and response. See firewall vs scorer for the latency tradeoff.
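
The safe-regenerate gate is essentially a bounded retry loop. This sketch assumes a score passes when it is at least threshold, and that exhausting maxIterations returns the last candidate with passed: false so a downstream policy rule can decide what to do with it. Generate, Score, RegenResult and safeRegenerate are hypothetical; in practice generation and scoring would be asynchronous calls.

```typescript
// Hypothetical signatures: `generate` produces a candidate, `score` runs the eval.
type Generate = (attempt: number) => string;
type Score = (response: string) => number;

interface RegenResult {
  response: string;
  score: number;
  iterations: number;
  passed: boolean;
}

function safeRegenerate(
  generate: Generate,
  score: Score,
  threshold: number,
  maxIterations: number,
): RegenResult {
  if (maxIterations < 1) throw new RangeError("maxIterations must be >= 1");
  let last: RegenResult = { response: "", score: -Infinity, iterations: 0, passed: false };
  for (let i = 1; i <= maxIterations; i++) {
    const response = generate(i);
    const s = score(response);
    last = { response, score: s, iterations: i, passed: s >= threshold };
    // The gate: stop as soon as a candidate clears the threshold.
    if (last.passed) break;
  }
  // Either the first passing candidate, or the last failing one.
  return last;
}

// Usage with canned scores standing in for a real eval call.
const fakeScores = new Map([["draft 1", 0.6], ["draft 2", 0.8], ["draft 3", 0.95]]);
const fakeScore: Score = (r) => fakeScores.get(r) ?? 0;

safeRegenerate((i) => `draft ${i}`, fakeScore, 0.9, 5); // passes on iteration 3
safeRegenerate((i) => `draft ${i}`, fakeScore, 0.9, 2); // gives up after 2, passed: false
```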

Related concepts

Scoring thresholds — the per-dimension numbers your policy rules compare against.
Agent checkpoints — policies attached to the three agent gates.
Firewall vs scorer — which call path your policy belongs in.