Anthropic

anthropic

Aliases: claude

YAML config

providers:
  - id: anthropic:claude-3-5-sonnet-latest
    config:
      apiKey: ${ANTHROPIC_API_KEY}

TypeScript usage

import { createProvider } from "@evalguard/core";

const provider = createProvider("anthropic", process.env.ANTHROPIC_API_KEY);
const response = await provider.complete({
  model: "claude-3-5-sonnet-latest",
  messages: [{ role: "user", content: "Hello" }],
});

Authentication

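Set ANTHROPIC_API_KEY in your environment. EvalGuard validates the key on the first call and surfaces typed errors for 401 (invalid key), 403 (permission denied), and 429 (rate limit) responses. For rate-limit responses it also parses the Retry-After header so callers know how long to wait.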
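
To make the error handling above concrete, here is a minimal sketch of status classification and Retry-After parsing. The names `ProviderErrorKind`, `classifyStatus`, and `parseRetryAfterMs` are hypothetical and are not EvalGuard's actual API; the only assumption about the header is the standard HTTP behavior that Retry-After carries either a number of seconds or an HTTP-date.

```typescript
// Hypothetical error kinds; EvalGuard's real typed errors may be named differently.
type ProviderErrorKind = "auth" | "permission" | "rate_limit" | "other";

// Map an HTTP status code to a coarse error kind.
function classifyStatus(status: number): ProviderErrorKind {
  if (status === 401) return "auth";
  if (status === 403) return "permission";
  if (status === 429) return "rate_limit";
  return "other";
}

// Parse a Retry-After header value into a delay in milliseconds.
// The value is either a non-negative integer (seconds) or an HTTP-date.
// Returns null when the header is absent or cannot be parsed.
function parseRetryAfterMs(
  value: string | null,
  nowMs: number = Date.now(),
): number | null {
  if (value === null) return null;
  const trimmed = value.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return null;
  // A date in the past means "retry now", so clamp at zero.
  return Math.max(0, dateMs - nowMs);
}
```

A caller would typically fall back to its own backoff schedule when `parseRetryAfterMs` returns null.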

Setup walkthrough

  1. Generate an API key at https://console.anthropic.com/settings/keys.
  2. Set in env: `export ANTHROPIC_API_KEY=sk-ant-...`.
  3. (Optional) Store via EvalGuard's BYOK vault for prod.
  4. Configure: `providers: [{id: 'anthropic:claude-3-5-sonnet-latest', config: {apiKey: '${ANTHROPIC_API_KEY}'}}]`.
  5. Smoke test: `evalguard run --provider anthropic:claude-3-5-sonnet-latest --prompt 'Hello'`.
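
Before running the smoke test, it can help to fail fast when the key is missing or malformed. The sketch below is one way to do that in a setup script; `checkApiKey` is a hypothetical helper, not part of EvalGuard, and the `sk-ant-` prefix check is only a heuristic based on the key format shown in step 2.

```typescript
// Hypothetical pre-flight check; not part of EvalGuard.
// Returns the trimmed key, or throws with a message that says what is wrong.
function checkApiKey(env: Record<string, string | undefined>): string {
  const key = env.ANTHROPIC_API_KEY?.trim();
  if (!key) {
    throw new Error("ANTHROPIC_API_KEY is not set");
  }
  // Heuristic only: the walkthrough shows keys of the form sk-ant-...
  if (!key.startsWith("sk-ant-")) {
    throw new Error("ANTHROPIC_API_KEY does not look like an Anthropic key");
  }
  return key;
}
```

In a Node script this would be called as `checkApiKey(process.env)` before constructing the provider.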

Gotchas

  • Anthropic uses a top-level `system` field instead of system messages in the array — EvalGuard's provider handles this automatically, but if you're hand-rolling the request, don't put system in messages[].
  • The `max_tokens` field is required on every request (OpenAI treats it as optional). EvalGuard defaults to 1024 if you don't set it; raise it for long-form generation.
  • Prompt caching uses the `anthropic-beta: prompt-caching-2024-07-31` header (EvalGuard sends it automatically). Add `cache_control: {type: 'ephemeral'}` to specific content blocks to opt them into caching.
  • 529 (overloaded) is a distinct error from 429 (rate limit) — EvalGuard treats both as retry-eligible but Anthropic's 529 means the whole service is busy, not just your tier.
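
For anyone hand-rolling the request, the first three gotchas can be sketched as a small body builder: system messages are lifted out of the array into the top-level `system` field, `max_tokens` is always set, and the last system block can be opted into caching. `buildRequestBody` and its option names are hypothetical and do not reflect EvalGuard's internals; the default of 1024 mirrors the value stated above.

```typescript
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

interface TextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

interface AnthropicRequestBody {
  model: string;
  max_tokens: number;
  system?: TextBlock[];
  messages: { role: "user" | "assistant"; content: string }[];
}

const DEFAULT_MAX_TOKENS = 1024; // same default the text attributes to EvalGuard

// Hypothetical helper: converts an OpenAI-style message array into an
// Anthropic Messages request body.
function buildRequestBody(
  model: string,
  messages: ChatMessage[],
  opts: { maxTokens?: number; cacheSystem?: boolean } = {},
): AnthropicRequestBody {
  const systemBlocks: TextBlock[] = [];
  const chat: AnthropicRequestBody["messages"] = [];

  for (const m of messages) {
    if (m.role === "system") {
      // System text goes to the top-level field, never into messages[].
      systemBlocks.push({ type: "text", text: m.content });
    } else {
      chat.push({ role: m.role, content: m.content });
    }
  }

  // Marking the last system block caches the prefix up to and including it.
  if (opts.cacheSystem && systemBlocks.length > 0) {
    systemBlocks[systemBlocks.length - 1].cache_control = { type: "ephemeral" };
  }

  const body: AnthropicRequestBody = {
    model,
    max_tokens: opts.maxTokens ?? DEFAULT_MAX_TOKENS,
    messages: chat,
  };
  if (systemBlocks.length > 0) body.system = systemBlocks;
  return body;
}
```

The resulting object can be serialized with `JSON.stringify` and sent as the POST body; headers (API key, version, any beta flags) are left out of this sketch.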

Cost note

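claude-3-5-sonnet: $3/M input, $15/M output. claude-haiku-4-5: $1/M input, $5/M output. Prompt caching bills cache reads at 10% of the base input price, a 90% discount on the cached prefix (highest of any provider), so put stable content such as the system prompt, tool definitions, and reference documents at the start of the prompt.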
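
As a rough illustration of what the cached-prefix discount means for a batch of eval runs, the sketch below estimates cost from token counts. `estimateCostUsd` and the `Pricing` shape are hypothetical; the Sonnet prices and the 90% discount are taken from the note above, and the sketch deliberately ignores the surcharge Anthropic applies when a prefix is first written to the cache.

```typescript
interface Pricing {
  inputPerMTok: number; // USD per million input tokens
  outputPerMTok: number; // USD per million output tokens
  cachedReadDiscount: number; // 0.9 means cached input costs 10% of base
}

interface Usage {
  inputTokens: number; // uncached input tokens
  cachedInputTokens: number; // input tokens served from the cache
  outputTokens: number;
}

// Prices as quoted in the cost note.
const SONNET: Pricing = { inputPerMTok: 3, outputPerMTok: 15, cachedReadDiscount: 0.9 };

// Estimate cost in USD; cache-write surcharges are not modeled.
function estimateCostUsd(p: Pricing, u: Usage): number {
  const cachedRate = p.inputPerMTok * (1 - p.cachedReadDiscount);
  return (
    (u.inputTokens * p.inputPerMTok +
      u.cachedInputTokens * cachedRate +
      u.outputTokens * p.outputPerMTok) /
    1_000_000
  );
}

// 100 requests, each with a 10,000-token shared prefix,
// 500 fresh input tokens, and 300 output tokens.
const withCache = estimateCostUsd(SONNET, {
  inputTokens: 100 * 500,
  cachedInputTokens: 100 * 10_000,
  outputTokens: 100 * 300,
});
const withoutCache = estimateCostUsd(SONNET, {
  inputTokens: 100 * 10_500,
  cachedInputTokens: 0,
  outputTokens: 100 * 300,
});
console.log(withCache.toFixed(2), withoutCache.toFixed(2)); // prints "0.90 3.60"
```

In this example the shared prefix dominates input volume, which is why moving it into the cache cuts the bill to a quarter; with short prefixes the saving is much smaller.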

Recommended models

Eval / judge: claude-haiku-4-5-latest (fast + cheap for high-volume scoring)
Agent / tool-use: claude-3-5-sonnet-latest (strongest tool-use + reasoning)
Code: claude-3-5-sonnet-latest (best at large-codebase reasoning)
Vision: claude-3-5-sonnet-latest (strong document understanding)

Hand-written · 2026-05-21