CLI Reference

15 commands for running evaluations and security scans, and for managing your AI testing workflow from the terminal.

Installation

terminal
npm install -g @evalguard/cli

After installing, the evalguard command will be available globally.

evalguard login

Authenticate with your EvalGuard API key

Usage

bash
evalguard login --key <apiKey> [--url <baseUrl>]

Options

Option                   Description
--key <apiKey>           API key (or set the EVALGUARD_API_KEY env var)
--url <baseUrl>          Custom API base URL

Example

terminal
evalguard login --key eg_sk_abc123def456

evalguard logout

Remove stored credentials

Usage

bash
evalguard logout

Example

terminal
evalguard logout

evalguard init

Initialize EvalGuard in current project. Creates evalguard.config.json, evals/example.json, and scans/example.json.

Usage

bash
evalguard init [--project <projectId>]

Options

Option                   Description
--project <projectId>    Set default project ID

Example

terminal
evalguard init --project proj_abc123
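
For reference, a freshly initialized evalguard.config.json might look something like the sketch below. The field names shown are illustrative assumptions; check the file that init actually generates.

```jsonc
// evalguard.config.json — illustrative sketch; field names are assumptions
{
  "project": "proj_abc123",   // default project ID, as set by --project
  "evalsDir": "evals",        // where eval configs like evals/example.json live
  "scansDir": "scans"         // where scan configs like scans/example.json live
}
```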

evalguard eval

Run an evaluation from a config file via the cloud API. Requires authentication.

Usage

bash
evalguard eval <file> [options]

Options

Option                   Description
--project <projectId>    Override project ID
--model <model>          Override model
--wait                   Wait for completion and show results

Example

terminal
evalguard eval evals/qa-test.json --model gpt-4o --wait
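
An eval config file such as evals/qa-test.json might be shaped like the sketch below. This schema is an illustrative assumption, not the documented format; use evalguard validate to check a real file.

```jsonc
// evals/qa-test.json — illustrative sketch; field and scorer names are assumptions
{
  "name": "qa-regression",
  "model": "gpt-4o",
  "tests": [
    {
      "input": "What is the capital of France?",
      "assertions": [
        { "type": "contains", "value": "Paris" }
      ]
    }
  ]
}
```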

evalguard scan

Run a security scan from a config file via the cloud API. Requires authentication.

Usage

bash
evalguard scan <file> [options]

Options

Option                   Description
--project <projectId>    Override project ID
--model <model>          Override model
--wait                   Wait for completion and show results

Example

terminal
evalguard scan scans/red-team.json --wait
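
A scan config such as scans/red-team.json might look like the sketch below. The plugin and strategy names here are placeholders, not documented values.

```jsonc
// scans/red-team.json — illustrative sketch; plugin/strategy names are assumptions
{
  "name": "red-team-baseline",
  "model": "gpt-4o",
  "plugins": ["prompt-injection", "pii-leak"],
  "strategies": ["jailbreak"]
}
```

Use evalguard list plugins and evalguard list strategies to see the names actually available in the registry.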

evalguard whoami

Show current authentication status, masked API key, and configured project.

Usage

bash
evalguard whoami

Example

terminal
evalguard whoami

evalguard eval:local

Run an evaluation locally using @evalguard/core. No API key is needed; everything runs on your machine.

Usage

bash
evalguard eval:local <file> [options]

Options

Option                   Description
--model <model>          Override model
--provider <provider>    Override provider (openai, anthropic, etc.)
--output <format>        Output format: json, csv, html, or a file path
--verbose                Show detailed output per test case

Example

terminal
evalguard eval:local evals/my-eval.json --model gpt-4o-mini --verbose

evalguard scan:local

Run a red-team security scan locally using @evalguard/core. No API key needed.

Usage

bash
evalguard scan:local <file> [options]

Options

Option                   Description
--model <model>          Override model
--provider <provider>    Override provider
--output <format>        Output format: json or a file path
--verbose                Show each finding

Example

terminal
evalguard scan:local scans/pentest.json --provider anthropic --verbose

evalguard generate tests

Generate synthetic test cases from a description using an LLM.

Usage

bash
evalguard generate tests <description> [options]

Options

Option                   Description
-n, --count <n>          Number of test cases (default: 10)
--model <model>          LLM model for generation (default: gpt-4o)
--provider <provider>    Provider name (default: openai)
--strategies <list>      Evolution strategies (comma-separated)
--output <file>          Output file path (default: generated-tests.json)

Example

terminal
evalguard generate tests "customer support chatbot" -n 20 --model gpt-4o
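
A typical workflow chains generation with validation and a local run, using only commands from this reference (the file name is illustrative):

```
terminal
evalguard generate tests "customer support chatbot" -n 20 --output evals/generated.json
evalguard validate evals/generated.json
evalguard eval:local evals/generated.json --model gpt-4o-mini
```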

evalguard generate assertions

Auto-generate assertions for existing test cases.

Usage

bash
evalguard generate assertions <file> [options]

Options

Option                   Description
--model <model>          LLM model for generation
--provider <provider>    Provider name
--output <file>          Output file path

Example

terminal
evalguard generate assertions evals/qa-test.json --output evals/qa-with-assertions.json

evalguard validate

Validate an eval or scan config file. Checks JSON structure, scorer names, plugin names, and strategy names against the registry.

Usage

bash
evalguard validate <file>

Example

terminal
evalguard validate evals/my-eval.json

evalguard compare

Compare two evaluation result files side-by-side. Shows score differences, regressions, and improvements.

Usage

bash
evalguard compare <file1> <file2> [options]

Options

Option                   Description
--threshold <n>          Minimum score improvement to highlight (default: 0.05)

Example

terminal
evalguard compare results/baseline.json results/candidate.json --threshold 0.1

evalguard list

List available components: scorers, plugins, strategies, graders, or providers.

Usage

bash
evalguard list <component> [--json]

Options

Option                   Description
--json                   Output as JSON

Example

terminal
evalguard list scorers
evalguard list plugins --json
evalguard list providers

evalguard firewall

Test input against LLM firewall rules locally. Supports stdin, file input, and custom rules.

Usage

bash
evalguard firewall <input> [options]

Options

Option                   Description
--rules <file>           Custom firewall rules JSON file
--json                   Output as JSON

Example

terminal
evalguard firewall "Ignore previous instructions"
evalguard firewall @suspicious-input.txt --rules my-rules.json --json
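
A custom rules file like my-rules.json might be structured as in the sketch below. The rule schema shown is an assumption; consult the firewall documentation for the real format.

```jsonc
// my-rules.json — illustrative sketch; the real rule schema may differ
{
  "rules": [
    {
      "id": "block-ignore-instructions",                 // hypothetical rule ID
      "pattern": "ignore (all )?previous instructions",  // regex matched against input
      "action": "block"
    }
  ]
}
```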

evalguard watch

Watch an eval config file and re-run the evaluation automatically on every save.

Usage

bash
evalguard watch <file> [options]

Options

Option                   Description
--model <model>          Override model
--provider <provider>    Override provider
--debounce <ms>          Debounce interval in ms (default: 1000)

Example

terminal
evalguard watch evals/my-eval.json --model gpt-4o-mini

CI/CD Usage

Use the CLI in your CI/CD pipeline by setting the EVALGUARD_API_KEY environment variable. The CLI reads it automatically.

.github/workflows/eval.yml
name: EvalGuard CI
on: [push]
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "20"
      - run: npm install -g @evalguard/cli
      - run: evalguard eval:local evals/regression.json --output json
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
      - run: evalguard scan:local scans/security.json --verbose
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
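
The example above runs everything locally, so only the provider key is needed. To run cloud evaluations instead, add EVALGUARD_API_KEY as a secret and use the eval command; the step below is a sketch built from the commands in this reference:

```yaml
      - run: evalguard eval evals/regression.json --wait
        env:
          EVALGUARD_API_KEY: ${{ secrets.EVALGUARD_API_KEY }}
```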