Security testing, evaluation, compliance, cost tracking, and observability in one platform. No duct tape required.
Describe your AI application in natural language. Our proprietary pipeline analyzes your app profile, maps domain-specific risks, generates targeted test cases, and assembles a production-ready evaluation config — powered by EvalGuard's multi-model orchestration engine.
An AI attacker probes your LLM across multiple conversation turns, adapts its strategy based on resistance, and finds vulnerabilities that static tests miss. Powered by UCB1 bandit optimization.
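UCB1 is a standard multi-armed-bandit rule: it balances exploiting attack strategies that have succeeded with exploring under-tried ones. EvalGuard's actual implementation isn't shown here, but the core selection step can be sketched in a few lines (function and variable names are ours, purely illustrative):

```python
import math

def ucb1_select(counts, rewards, total):
    """Pick the attack-strategy arm with the highest UCB1 score.

    counts[i]  - times strategy i has been tried
    rewards[i] - cumulative success signal for strategy i
    total      - total selections made so far
    """
    best, best_score = 0, float("-inf")
    for i, n in enumerate(counts):
        if n == 0:
            return i  # try every strategy at least once
        score = rewards[i] / n + math.sqrt(2 * math.log(total) / n)
        if score > best_score:
            best, best_score = i, score
    return best
```

The exploration bonus shrinks as a strategy accumulates trials, so the attacker naturally shifts effort toward whatever the target model resists least.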
The industry's most comprehensive red teaming suite. Test your AI systems against prompt injection, jailbreaks, data exfiltration, and 228 more attack vectors across 43 unique strategies.
166 built-in scorers covering faithfulness, relevance, toxicity, hallucination, coherence, and more. Create custom assertions with LLM-as-judge rubrics or deterministic rules.
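A deterministic rule is the simplest kind of custom assertion: no judge model, just a pass/fail check. As a hypothetical sketch (this helper is ours, not EvalGuard's API):

```python
def contains_all(response: str, required: list[str]) -> dict:
    """Deterministic assertion: pass only if every required phrase
    appears in the response (case-insensitive)."""
    missing = [r for r in required if r.lower() not in response.lower()]
    return {"pass": not missing, "missing": missing}
```

LLM-as-judge rubrics cover the fuzzier qualities (faithfulness, coherence); deterministic rules like this handle the checks where exact matching is enough.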
Connect to OpenAI, Anthropic, Google Gemini, Meta Llama, Mistral, Cohere, and 80 more providers through a single unified interface. Switch models with one line of config.
Track spend per model, per prompt, and per team. Set budget alerts, forecast costs, and identify optimization opportunities before they become invoice surprises.
Map your AI systems to OWASP LLM Top 10, NIST AI RMF, MITRE ATLAS, EU AI Act, ISO 42001, India DPDP, and HIPAA. Generate audit-ready reports, track compliance drift, and maintain documentation automatically.
Visual rule builder for real-time prompt and response filtering. Semantic matching, regex patterns, and PII detection. Published p95: 2.57ms across 20K runs (see /trust/latency).
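To make the PII-detection layer concrete, here is a minimal regex-based sketch (the patterns are illustrative only; production firewalls need far broader coverage than two categories):

```python
import re

# Illustrative PII patterns; real deployments need many more categories.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def detect_pii(text: str) -> list[str]:
    """Return the PII categories found in a prompt or response."""
    return [name for name, pat in PII_PATTERNS.items() if pat.search(text)]
```

Pattern rules like these run in microseconds; it's the semantic-matching layer that dominates the pipeline's published p95.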
Version control your prompts like code. View diffs, roll back to any version, require team approvals, and deploy to production with confidence.
Automated compliance assessment across 33 regulatory frameworks. EU AI Act risk classification, gap analysis, remediation plans, and audit-ready documentation — all generated automatically.
Board-level view of your entire AI governance posture. Track compliance scores, active incidents, vendor risks, and cost analytics in a single CISO-ready dashboard.
When AI goes wrong, EvalGuard provides structured incident response with root cause analysis templates for 11 AI failure categories — from hallucinations to compliance violations.
Track third-party AI vendor risks, DPA status, lock-in scores, and policy changes. Pre-built profiles for OpenAI, Anthropic, Google, and Mistral.
Define and enforce SLA targets per model and endpoint. Track availability, latency percentiles, error rates, and throughput with automatic violation alerts.
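A latency-percentile SLA check reduces to two small steps: compute the percentile over a window of samples, then compare it to the target. A minimal nearest-rank sketch (function names are ours, for illustration):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: p=95 gives the p95 latency."""
    ranked = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(ranked)) - 1)
    return ranked[k]

def check_sla(latencies_ms, target_p95_ms):
    """Flag a violation when observed p95 exceeds the SLA target."""
    p95 = percentile(latencies_ms, 95)
    return {"p95_ms": p95, "violated": p95 > target_p95_ms}
```

The same pattern extends to availability, error rate, and throughput: aggregate a window, compare to the target, alert on breach.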
When model behavior drifts, EvalGuard automatically triggers re-evaluation. Configurable thresholds, cooldowns, and daily limits prevent alert fatigue.
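The threshold-plus-cooldown-plus-daily-cap pattern can be sketched in a few lines (a hypothetical illustration, not EvalGuard's actual trigger code):

```python
import time

class DriftTrigger:
    """Fire a re-evaluation when drift crosses a threshold, but
    respect a cooldown and a daily cap to avoid alert fatigue."""

    def __init__(self, threshold, cooldown_s, daily_limit):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.daily_limit = daily_limit
        self.last_fired = float("-inf")
        self.fired_today = 0  # assume an external daily reset

    def should_fire(self, drift_score, now=None):
        now = time.time() if now is None else now
        if drift_score < self.threshold:
            return False          # no meaningful drift
        if now - self.last_fired < self.cooldown_s:
            return False          # still cooling down
        if self.fired_today >= self.daily_limit:
            return False          # daily cap reached
        self.last_fired = now
        self.fired_today += 1
        return True
```

All three guards must pass before a re-evaluation is triggered, which is what keeps a noisy drift signal from paging the team every few minutes.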
Scan pickle files, PyTorch checkpoints, and ONNX models for backdoors, trojans, and malicious payloads before deploying them to production.
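The key idea behind pickle scanning is static disassembly: certain opcodes (GLOBAL, REDUCE, and friends) can invoke arbitrary callables at load time, so their mere presence is a signal. A minimal sketch using Python's standard pickletools module (the function and opcode list are our illustration, not EvalGuard's scanner):

```python
import pickletools

# Opcodes that can invoke arbitrary callables during unpickling.
SUSPICIOUS_OPS = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ"}

def scan_pickle(data: bytes) -> list[str]:
    """Statically disassemble a pickle and report dangerous opcodes,
    without ever executing it via pickle.loads."""
    return [
        f"{op.name} at byte {pos}"
        for op, arg, pos in pickletools.genops(data)
        if op.name in SUSPICIOUS_OPS
    ]
```

Because the scan never calls pickle.loads, a booby-trapped checkpoint is flagged without giving its payload a chance to run.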
Detect unauthorized AI usage across your organization. Monitor which models employees use, flag PII in outbound prompts, and block unapproved providers — all through your AI gateway.
Discover every AI model in your organization, map data flows, detect misconfigurations, and get a unified posture score. Like Palo Alto Prisma — but for AI.
Paste your scan or eval results — the copilot auto-analyzes findings, prioritizes by risk, suggests step-by-step fixes with code examples, and maps compliance impact.
Route all LLM traffic through EvalGuard's gateway for automatic security scanning, cost tracking, rate limiting, and observability. One-line base-URL change.
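As a sketch of what that one-line change looks like with the standard OpenAI Python SDK (the gateway URL below is a placeholder, not a real endpoint):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.evalguard.example/v1",  # the one-line change (placeholder URL)
    api_key="sk-...",
)
# Every subsequent client.chat.completions.create(...) call now
# transits the gateway for scanning, cost tracking, and rate limiting.
```

The application code above and below this line stays untouched; only the client's base URL points at the gateway instead of the provider.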
Full distributed tracing for AI agents. Visualize multi-step agent trajectories, track token usage per step, analyze RAG pipeline quality, and detect hallucinations.
Scan pickle, safetensors, GGUF, and ONNX model files for malicious opcodes, tampered tensors, and smuggled payloads before loading them into production. Pure-TS implementation — never executes attacker code.
Agentless discovery of every AI resource in your cloud — foundation models, custom models, endpoints, Bedrock agents. Risk-scored with deprecated-model flags and network-exposure factors.
Audit every tool call flowing through MCP. Detect unauthorized tool use, memory poisoning, and policy violations in real time. Risk-scored log of agent behavior.
Published, reproducible latency numbers for the full firewall pipeline (pattern + token + semantic + output layers) — measured, not claimed.
Start free with 10,000 evaluations per month, or get a personalized demo from our team.