Concept · Regeneration loop

The regeneration loop

A failing evaluation is information, not a verdict. The regeneration loop turns “this response is biased” into “ask the model to fix the bias and try again.” It’s the difference between a guardrail that blocks (frustrating to the user) and a guardrail that repairs (invisible to the user, dramatic for the operator’s eval pass-rate curve).

The loop, step by step

1. Evaluate the candidate response against the chosen scorer set. Each dimension returns a score from 0 to 1; the overall score is the minimum across dimensions.
2. If overall ≥ threshold, stop and return passed.
3. Otherwise, derive a structured improvement instruction from the failing dimensions (e.g. “the response is gender-biased; remove protected-class generalisations and re-write neutrally”).
4. Call the regenerator LLM with the original prompt, the original response, and the improvement instruction to get a new candidate.
5. Re-evaluate and loop back to step 2. The loop is bounded by maxIterations (default 3, max 10).
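
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the route's actual implementation: the evaluate and regenerate callables, the wording of the instruction, and the choice to treat iteration 0 as the evaluation of the original response are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

Scores = Dict[str, float]  # dimension name -> score in [0, 1]


def overall(scores: Scores) -> float:
    """Overall score is the minimum across dimensions."""
    return min(scores.values())


def improvement_instruction(scores: Scores, threshold: float) -> str:
    """Build an instruction from the dimensions below the threshold (hypothetical wording)."""
    failing = sorted(d for d, s in scores.items() if s < threshold)
    return "Fix the following dimensions and rewrite: " + ", ".join(failing)


def regenerate_loop(
    prompt: str,
    response: str,
    evaluate: Callable[[str, str], Scores],         # hypothetical scorer hook
    regenerate: Callable[[str, str, str], str],     # hypothetical regenerator hook
    threshold: float,
    max_iterations: int = 3,
) -> Tuple[str, str, List[float]]:
    """Return (stop_reason, returned_response, overall score per evaluation)."""
    history: List[float] = []
    best_response, best_score = response, -1.0
    candidate = response
    # Assumption: iteration 0 evaluates the original response, so the loop
    # performs at most max_iterations regenerations.
    for iteration in range(max_iterations + 1):
        scores = evaluate(prompt, candidate)
        score = overall(scores)
        history.append(score)
        if score > best_score:
            best_response, best_score = candidate, score
        if score >= threshold:
            return "passed", candidate, history
        if iteration == max_iterations:
            break
        instruction = improvement_instruction(scores, threshold)
        # Assumption: the latest candidate is passed back to the regenerator.
        candidate = regenerate(prompt, candidate, instruction)
    # Budget exhausted: hand back the highest-scoring attempt seen so far.
    return "max_iterations", best_response, history
```

Passing the scorer and regenerator in as callables keeps the control flow testable without any LLM calls; stubs that return fixed scores are enough to exercise both exits.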

Stop conditions

passed: overall score ≥ threshold. The “happy” exit.
max_iterations: iteration budget exhausted without passing. The route returns the highest-scoring iteration so far, so you get the best attempt even if no iteration cleared the bar.
no_improvement: consecutive iterations failed to improve the score. The loop stops early rather than burning credits on a flat curve, which saves real money on inputs the model just can’t fix.
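
One way to picture how the three exits interact is a pure function over the score history. The sketch below is illustrative only: the patience parameter (how many non-improving steps in a row trigger no_improvement), the precedence between no_improvement and max_iterations, and the tie-breaking rule for the best attempt are assumptions, since the text above does not pin them down.

```python
from typing import List, Optional


def stop_reason(
    history: List[float],
    threshold: float,
    max_iterations: int,
    patience: int = 2,  # hypothetical: non-improving steps tolerated in a row
) -> Optional[str]:
    """Return a stop reason for the scores so far, or None to keep looping."""
    if not history:
        return None
    if history[-1] >= threshold:
        return "passed"
    # Count trailing steps where the score did not rise above its predecessor.
    flat = 0
    for prev, curr in zip(reversed(history[:-1]), reversed(history[1:])):
        if curr > prev:
            break
        flat += 1
    if flat >= patience:
        return "no_improvement"
    # history[0] is the original response, so len(history) - 1 regenerations ran.
    if len(history) - 1 >= max_iterations:
        return "max_iterations"
    return None


def best_attempt(history: List[float]) -> int:
    """Index of the highest-scoring iteration; the earliest wins ties."""
    return max(range(len(history)), key=lambda i: history[i])
```

For example, a history of 0.4, 0.5, 0.5, 0.45 against a threshold of 0.8 ends with two non-improving steps, so this sketch stops with no_improvement and best_attempt points at iteration 1.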

Cost-budget gate

Every call accepts maxCostUsd. Before the loop starts, the route computes the worst-case cost as max iterations × (per-iteration eval + regen tokens) × per-token pricing for the chosen model, and refuses with HTTP 402 if that estimate would exceed the cap. This protects against an operator misconfiguring the loop into runaway credit spend on a flood of failing inputs.

There is no default cap. maxCostUsd is opt-in — if you omit it (or set it to 0), no budget gate is applied and the loop runs unbounded by cost (the default for BYOK callers who already cap spend at the provider level). To enable the pre-flight gate, pass a positive maxCostUsd (USD, range 0–100); the route rejects with HTTP 402 BUDGET_EXCEEDED when the worst-case estimate exceeds it.
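
The pre-flight check can be illustrated with a small sketch. The token counts and per-token price are made-up inputs, the function names are hypothetical, and the grouping of the formula (iterations × tokens per iteration × price) is one reading of the description above rather than a confirmed implementation.

```python
from typing import Optional, Tuple


def worst_case_cost_usd(
    max_iterations: int,
    eval_tokens: int,      # hypothetical per-iteration evaluation tokens
    regen_tokens: int,     # hypothetical per-iteration regeneration tokens
    usd_per_token: float,  # hypothetical flat price for the chosen model
) -> float:
    """Worst case: every iteration runs and spends its full token estimate."""
    return max_iterations * (eval_tokens + regen_tokens) * usd_per_token


def budget_gate(
    worst_case_usd: float, max_cost_usd: Optional[float]
) -> Tuple[int, Optional[str]]:
    """Return (http_status, error_code) for the pre-flight check."""
    # Omitted or 0 means the gate is not applied at all.
    if max_cost_usd is None or max_cost_usd == 0:
        return 200, None
    if not 0 < max_cost_usd <= 100:
        raise ValueError("maxCostUsd must be within the 0-100 USD range")
    if worst_case_usd > max_cost_usd:
        return 402, "BUDGET_EXCEEDED"
    return 200, None


estimate = worst_case_cost_usd(3, 2000, 1000, 0.000002)  # about 0.018 USD
print(budget_gate(estimate, 0.01))  # cap below the estimate: rejected
print(budget_gate(estimate, None))  # no cap supplied: gate skipped
```

Because the gate works from the worst case, a request can be refused even when the loop would in practice have passed on the first evaluation; that is the price of guaranteeing the cap before any tokens are spent.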

The audit row

Every call writes one row to safe_regenerate_runs with:

The first 4,000 chars of the prompt and the original content (truncation, not censorship — full text lives in audit_logs for forensic recall).
The threshold, max-iterations, dimension subset, scorer set used.
The full iteration_scores[] array (one number per iteration), the chosen final_content_excerpt, the final_score, the stop_reason.
iteration_history JSONB: opt-in (set recordHistory: true on the request). Captures the per-iteration content + improvement instructions for “show me how the model improved this response” UIs.
The real cost: total_credits_consumed + estimated_cost_usd reconciled against the cost ledger.

Operators can answer “did the loop actually improve outcomes?” without running an offline replay. That’s the audit-grade bar this table was designed for.
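
As a rough picture of what such a row holds and how the “did it improve?” question falls out of it, here is a sketch in Python. Only iteration_scores, final_content_excerpt, final_score, stop_reason and iteration_history are names taken from the list above; the other keys, the helper names, and the choice to truncate the final content with the same 4,000-character limit are assumptions for illustration.

```python
from typing import Any, Dict, List, Optional

EXCERPT_CHARS = 4000  # the row keeps only the first 4,000 characters


def build_audit_row(
    prompt: str,
    original_content: str,
    threshold: float,
    max_iterations: int,
    iteration_scores: List[float],
    final_content: str,
    final_score: float,
    stop_reason: str,
    iteration_history: Optional[List[Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Assemble one audit row as a plain dict (column names partly hypothetical)."""
    return {
        "prompt_excerpt": prompt[:EXCERPT_CHARS],                      # hypothetical name
        "original_content_excerpt": original_content[:EXCERPT_CHARS],  # hypothetical name
        "threshold": threshold,
        "max_iterations": max_iterations,
        "iteration_scores": list(iteration_scores),
        # Assumption: the final excerpt uses the same truncation limit.
        "final_content_excerpt": final_content[:EXCERPT_CHARS],
        "final_score": final_score,
        "stop_reason": stop_reason,
        "iteration_history": iteration_history,  # stays None unless recordHistory was set
    }


def score_gain(row: Dict[str, Any]) -> float:
    """How far the returned score moved from the original response's score."""
    return row["final_score"] - row["iteration_scores"][0]
```

With the per-iteration scores stored on the row, a gain of zero on a passed row means the original response was already good enough, while a positive gain is direct evidence that regeneration helped.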

Real example

A live-fire test on 2026-05-24 sent a textbook gender-biased prompt to POST /api/v1/evals/safe-regenerate. The deep grader returned fairness=0.80, exactly at the threshold and so barely passing, and the loop terminated at iteration 0 with stop_reason=passed. Result: roughly 3 LLM calls, a few thousand tokens, and sub-cent real spend recorded.

Tighter calibration (see scoring thresholds) would have failed that response and triggered a regenerate; the loop is calibration-sensitive, which is the right tradeoff.
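
The calibration sensitivity comes down to an inclusive comparison, which a two-line check makes concrete. The 0.85 value below is a hypothetical tighter threshold chosen for illustration, not a recommended setting.

```python
def passes(score: float, threshold: float) -> bool:
    """Inclusive check: a score exactly at the threshold passes."""
    return score >= threshold


fairness = 0.80
print(passes(fairness, 0.80))  # True: the live-fire run stops at iteration 0
print(passes(fairness, 0.85))  # False: a tighter threshold would trigger a regenerate
```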

Related concepts

Evaluation modes — what runs before the loop decides to retry.
Scoring thresholds — what the threshold means.
Policy engine — declarative alternative to the inline threshold + maxIterations flags.