Migrating from Promptfoo
EvalGuard is compatible with Promptfoo config files. Most migrations take under 5 minutes.
Quick Start
1. Point eval:local at your existing config
npx @evalguard/cli eval:local promptfooconfig.yaml
EvalGuard reads Promptfoo YAML configs directly. Standard assert types are mapped to EvalGuard scorers automatically — e.g. llm-rubric → llm-grader, model-graded-fact → factuality, is-json → json-valid, similar → semantic-similarity.
2. Or convert to an explicit evalguard.yaml
npx @evalguard/cli import:promptfoo promptfooconfig.yaml
This writes a fully converted evalguard.yaml that you can review and commit, and prints a summary of any assertion types that need manual attention. A hand-written config looks like:
# evalguard.yaml
model: gpt-4o-mini
prompt: "Answer: {{input}}"
scorers:
  - contains
  - relevance
cases:
  - input: "What is 2+2?"
    expectedOutput: "4"
  - input: "Capital of France?"
    expectedOutput: "Paris"
Config Mapping
| Promptfoo | EvalGuard | Note |
|---|---|---|
| promptfooconfig.yaml | evalguard.yaml | Same YAML shape — eval:local reads Promptfoo configs directly |
| providers: | model: | Single model field instead of providers array (first provider is used) |
| tests: | cases: | Either spelling works |
| assert: [{ type: 'contains' }] | scorers: ['contains'] | Per-test assertions become scorers — most types map automatically |
| assert: [{ type: 'llm-rubric' }] | scorers: ['llm-grader'] | Assertion-type names are mapped to EvalGuard scorer names automatically |
| assert: [{ type: 'is-json' }] | scorers: ['json-valid'] | Other renames follow the same pattern: model-graded-fact → factuality, similar → semantic-similarity, etc. |
| npx promptfoo eval | npx @evalguard/cli eval:local | Same workflow, different CLI |
| npx promptfoo generate redteam | npx @evalguard/cli scan:local | Built-in red team with 250+ attack plugins |
Most assert types map 1:1, but the names differ — EvalGuard handles the rename for you when you run eval:local or import:promptfoo.
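To make the table concrete, here is a minimal Python sketch of the reshaping it describes. It is an illustration only, not EvalGuard's actual converter: the function name `convert`, the `RENAMES` dict, and the choice to pass the first provider string through unchanged are all assumptions. Only the four renames listed on this page are included; any other type is assumed to keep its name, as `contains` does.

```python
# Hypothetical sketch of the Promptfoo -> EvalGuard reshaping described in the
# table above. Not EvalGuard's real implementation.

# Renames named on this page. Types absent from this dict are assumed to keep
# their Promptfoo name (e.g. contains -> contains).
RENAMES = {
    "llm-rubric": "llm-grader",
    "model-graded-fact": "factuality",
    "is-json": "json-valid",
    "similar": "semantic-similarity",
}


def convert(promptfoo_config: dict) -> dict:
    """Reshape a parsed Promptfoo config (a plain dict) into EvalGuard's shape."""
    providers = promptfoo_config.get("providers", [])
    tests = promptfoo_config.get("tests", [])

    scorers = []
    cases = []
    for test in tests:
        # Per-test assertions become scorers, de-duplicated in first-seen order.
        for assertion in test.get("assert", []):
            name = RENAMES.get(assertion["type"], assertion["type"])
            if name not in scorers:
                scorers.append(name)
        # Assumption: the rest of each test is carried over unchanged.
        cases.append({k: v for k, v in test.items() if k != "assert"})

    return {
        # Only the first provider is used; its ID is passed through as-is here.
        "model": providers[0] if providers else None,
        "scorers": scorers,
        "cases": cases,
    }


example = {
    "providers": ["openai:gpt-4o-mini", "openai:gpt-4o"],
    "tests": [
        {"vars": {"input": "What is 2+2?"},
         "assert": [{"type": "contains", "value": "4"}]},
        {"vars": {"input": "Capital of France?"},
         "assert": [{"type": "llm-rubric", "value": "Names Paris"},
                    {"type": "contains", "value": "Paris"}]},
    ],
}
converted = convert(example)
```

In this sketch the second provider is dropped and the two `contains` assertions collapse into a single scorer entry. How the real CLI normalizes provider IDs or assertion values is not covered here.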
Assertions that need manual handling
A few Promptfoo assertion types run arbitrary code or use reference-overlap metrics that don't have a built-in EvalGuard scorer. eval:local skips these with a warning (it never silently drops them) and import:promptfoo lists them in its summary. Here's what to use instead:
| Promptfoo assertion | EvalGuard equivalent |
|---|---|
| javascript / python | Write a custom scorer — see /docs/scorers/custom |
| webhook | Use the EvalGuard webhook post-eval action |
| rouge | Use semantic-similarity (closest match) or embedding-similarity |
| bleu | Use semantic-similarity |
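If you want to audit a config before migrating, the table above can be turned into a small pre-flight check. The Python sketch below is an assumption-laden illustration: `triage` and `MANUAL_ALTERNATIVES` are hypothetical names, and the warning text is invented for this example rather than copied from the CLI's output.

```python
# Hypothetical pre-flight check: split assertion types into those that can be
# mapped automatically and those that need manual handling. The suggestions
# mirror the table above; the message wording is made up for this sketch.
MANUAL_ALTERNATIVES = {
    "javascript": "a custom scorer (see /docs/scorers/custom)",
    "python": "a custom scorer (see /docs/scorers/custom)",
    "webhook": "the EvalGuard webhook post-eval action",
    "rouge": "semantic-similarity or embedding-similarity",
    "bleu": "semantic-similarity",
}


def triage(assert_types):
    """Return (automatic, warnings) for a list of Promptfoo assertion types."""
    automatic = []
    warnings = []
    for assert_type in assert_types:
        if assert_type in MANUAL_ALTERNATIVES:
            # Never drop silently: every skipped type produces a warning.
            suggestion = MANUAL_ALTERNATIVES[assert_type]
            warnings.append(f"skipped '{assert_type}': use {suggestion}")
        else:
            automatic.append(assert_type)
    return automatic, warnings


automatic, warnings = triage(["contains", "python", "is-json", "bleu"])
for line in warnings:
    print(line)
```

Every input type ends up in exactly one of the two lists, which matches the "skip with a warning, never silently drop" behavior described above.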
What you gain by switching
Ready to migrate?
Start with your existing Promptfoo config. No rewrite needed.