Documentation

Everything you need to evaluate, secure, and monitor your AI applications with EvalGuard.

Get started

Install, migrate, and run your first eval.

Getting Started

Install the SDK and run your first eval in under 5 minutes.

Migrating from Promptfoo

Drop-in replacement guide — keep your YAML, gain a dashboard, security, and team features.

Migrating from Helicone

Swap the base URL to EvalGuard's gateway proxy — keep logging, cost tracking, and caching, plus firewall, eval, and red team.

Migrating from Humanloop

Export your Humanloop project, then convert it with `evalguard import:humanloop` — prompts, datasets, and evaluators in one command.

Migrating from LangSmith

Import LangSmith run traces and eval datasets with the CLI — then add red team, guardrails, and a SOC 2 evidence engine on the same data.

Migrating from Braintrust

Import Braintrust log spans and eval datasets with the CLI — then add red team, guardrails, and a SOC 2 evidence engine on the same data.

Migrating from DeepEval

Import DeepEval evaluate() results and golden datasets with the CLI — keep your GEval-style metrics, then add hosted red team, guardrails, and a SOC 2 evidence engine on the same data.

Migrating from Ragas

Import Ragas evaluate() results and evaluation datasets with the CLI, keep your RAG metrics (faithfulness, answer relevancy, context recall), then add hosted red team, guardrails, and a SOC 2 evidence engine on the same data.

Migrating from Giskard

Import Giskard SuiteResult / vulnerability-scan results and Scenario test-sets with the CLI — keep your conformity and semantic checks, then add a hosted red-team platform, a runtime firewall, an AI gateway, and a SOC 2 evidence engine on the same data.

Migrating from Arize Phoenix

Import Arize Phoenix OpenInference spans with the CLI — model, provider, token counts, and input/output all carry over — then add hosted red team, a runtime firewall, an AI gateway, and a SOC 2 evidence engine on the same data.

Migrating from MLflow

Import MLflow Trace JSON (2.x + 3.x) with the CLI — spans, model, provider, token usage, and inputs/outputs all carry over — then add hosted red team, guardrails, an AI gateway, and a SOC 2 evidence engine on the same data.

Migrating from W&B Weave

Import Weights & Biases Weave calls with the CLI — op names, token usage, per-model cost, and exceptions all carry over — then add hosted red team, guardrails, an AI gateway, and a SOC 2 evidence engine on the same data.

Migrating from TruLens

Import TruLens records + feedback scores with the CLI — inputs, outputs, token cost, and every feedback function score carry over — then add hosted red team, guardrails, an AI gateway, and a SOC 2 evidence engine on the same data.

Migrating from Comet Opik

Import Comet Opik traces + spans with the CLI — model, provider, token usage, cost, and feedback scores all carry over — then add hosted red team, guardrails, an AI gateway, and a SOC 2 evidence engine on the same data.

Migrating from Langfuse

Import Langfuse traces + observations with the CLI, then add hosted red team, guardrails, an AI gateway, and a SOC 2 evidence engine on the same data.

Migrating from Portkey

Import Portkey logs with the CLI, then add hosted red team, guardrails, a governed AI gateway, and a SOC 2 evidence engine on the same data.

Migrating from HuggingFace

Import HuggingFace Datasets / Inference logs with the CLI, then add hosted red team, guardrails, an AI gateway, and a SOC 2 evidence engine on the same data.

Migrating from Vellum

Import Vellum executions with the CLI, then add hosted red team, guardrails, an AI gateway, and a SOC 2 evidence engine on the same data.

Migrating from Athina

Import Athina inference logs with the CLI, then add hosted red team, guardrails, an AI gateway, and a SOC 2 evidence engine on the same data.

Migrating from Maxim

Import Maxim logs with the CLI, then add hosted red team, guardrails, an AI gateway, and a SOC 2 evidence engine on the same data.

API & SDKs

Reference for the REST API, CLI, SDKs, and Terraform provider.

API Reference

685 REST endpoints for evals, security, gateway, traces, datasets, and more.

CLI Reference

87 commands: eval, scan, firewall, gateway-config, model-scan, shadow-ai, siem, debug, BYOK, budget, and more.

TypeScript SDK

257+ methods covering evals, security, traces, gateway, cost, compliance, and more.

Python SDK

180+ methods covering evals, scans, traces, OTLP, Shadow AI, and AI-SPM.

Go SDK

146+ methods for Go backends — evals, security, gateway, monitoring, compliance.

Java SDK

Typed JVM client + LangChain4j content filter — live on Maven Central.

Terraform provider

Manage projects, API keys, firewall rules, eval schedules, gateway policies, guardrails & agent-memory governance as code (v1.4.0).

gRPC Gateway

Connect-RPC service over HTTP/1.1 + JSON. Same authoritative gateway logic with a strongly-typed contract.

Catalogs

What ships in the box — scorers, attack plugins, and provider adapters.

Scorers

200+ built-in scorers across 12 categories: text matching, semantic, LLM-based, JSON & structured, NLP metrics, MCP & agentic, conversation, RAG, safety, multimodal, performance, custom.

Attack Plugins

300+ red-team plugins across 100+ adversarial strategies.

Attack Strategies

100+ adversarial transformations — encodings, obfuscation, multi-turn and agentic attacks applied on top of the plugin corpus.

Providers

77+ LLM providers including OpenAI, Anthropic, Gemini, Bedrock, Azure, Vertex, Groq, and more.

Content Moderation

Image, video, and deepfake moderation over a bring-your-own-model engine — fail-closed, with TypeScript/Python SDK + CLI.

Dataset Versioning

Immutable per-dataset snapshots. Pin experiments to a frozen version for bit-perfect reproducible re-runs.

Fine-Tuning

Cross-provider fine-tuning ledger. Pin training to immutable dataset snapshots; track jobs across OpenAI, Anthropic, Vertex, and more.

Integrations & Ops

Wire EvalGuard into your stack and deploy on your terms.

Governance

Map findings to compliance frameworks.

Compliance

50 frameworks — OWASP LLM Top 10, NIST AI RMF, MITRE ATLAS, EU AI Act, ISO 42001, SOC 2, GDPR, HIPAA, India DPDP, and more.

Quick Start

Get running with EvalGuard in three commands.

npm install -g @evalguard/cli
evalguard login --key eg_your_api_key
evalguard eval my-eval.json
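
For CI or a setup script, the three commands above could be wrapped in a small fail-fast function. This is a minimal sketch rather than an official EvalGuard script: the `EVALGUARD_API_KEY` and `DRY_RUN` variable names and the `run` and `quick_start` helpers are assumptions made for this example, and only the three commands themselves come from the Quick Start.

```shell
#!/usr/bin/env bash
# Sketch: the Quick Start commands wrapped in a fail-fast function.
# EVALGUARD_API_KEY and DRY_RUN are hypothetical names chosen for this
# example; they are not documented EvalGuard settings.
set -euo pipefail

# Run a command, or only print it when DRY_RUN=1.
# Note: dry-run output includes the key, so keep it out of shared logs.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

quick_start() {
  local eval_file="${1:-my-eval.json}"
  # Stop with a clear message if the key is missing.
  : "${EVALGUARD_API_KEY:?set EVALGUARD_API_KEY to your API key}"
  run npm install -g @evalguard/cli
  run evalguard login --key "$EVALGUARD_API_KEY"
  run evalguard eval "$eval_file"
}
```

Calling `quick_start my-eval.json` with `DRY_RUN=1` set prints the commands without executing them, which is a cheap way to check the wiring before anything is installed. Reading the key from the environment keeps it out of the script file itself.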

Full getting started guide