Full Platform

Everything You Need to Ship Safe AI

Security testing, evaluation, compliance, cost tracking, and observability in one platform. No duct tape required.

249
Attack Plugins
166
Scorers
87
LLM Providers
43
Strategies
13
Benchmarks
NL→Eval Pipeline
Adaptive Red Team
166
Scorers Available

NL→Eval Pipeline

Describe your AI application in natural language. Our proprietary pipeline analyzes your app profile, maps domain-specific risks, generates targeted test cases, and assembles a production-ready evaluation config — powered by EvalGuard's multi-model orchestration engine.

  • One sentence in, full eval suite out
  • Supports all domains (healthcare, finance, legal, etc.)
  • Multi-model orchestration across 87 LLM providers
  • Intelligent risk mapping with compliance detection
15
Max Turns

Adaptive Red Teaming

An AI attacker probes your LLM across multiple conversation turns, adapts its strategy based on resistance, and finds vulnerabilities that static tests miss. Powered by UCB1 bandit optimization.

  • Multi-turn conversations (up to 15 turns)
  • Parallel attack sessions with cross-session memory
  • 43 strategies × 14 categories
  • Real-time resistance profiling
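
To make the UCB1 idea above concrete, here is a minimal sketch of how a bandit could choose which attack strategy to try on each turn. It is an illustration only, not EvalGuard's implementation: the class name, the strategy names, and the 0-to-1 reward signal are all assumptions.

```python
import math
import random


class UCB1Selector:
    """Hypothetical strategy picker using the standard UCB1 rule."""

    def __init__(self, strategies):
        self.strategies = list(strategies)
        self.pulls = {s: 0 for s in self.strategies}
        self.total_reward = {s: 0.0 for s in self.strategies}

    def select(self):
        # Try every strategy once before applying the UCB1 formula.
        for s in self.strategies:
            if self.pulls[s] == 0:
                return s
        total = sum(self.pulls.values())

        def ucb(s):
            mean = self.total_reward[s] / self.pulls[s]
            bonus = math.sqrt(2.0 * math.log(total) / self.pulls[s])
            return mean + bonus  # exploit (mean) plus explore (bonus)

        return max(self.strategies, key=ucb)

    def update(self, strategy, reward):
        # Assumed reward scale: 1.0 if the turn broke through, 0.0 if resisted.
        self.pulls[strategy] += 1
        self.total_reward[strategy] += reward


def simulate(success_rates, turns=15, seed=0):
    """Run one simulated session against made-up per-strategy success rates."""
    rng = random.Random(seed)
    selector = UCB1Selector(success_rates)
    for _ in range(turns):
        s = selector.select()
        reward = 1.0 if rng.random() < success_rates[s] else 0.0
        selector.update(s, reward)
    return selector.pulls
```

Calling simulate({"roleplay": 0.6, "encoding": 0.1}, turns=15) returns how often each (hypothetical) strategy was tried. Strategies that meet less resistance tend to be tried more often, while the exploration bonus keeps rarely tried ones in play.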
249
Attack Plugins

Security Testing

The industry's most comprehensive red teaming suite. Test your AI systems against prompt injection, jailbreaks, data exfiltration, and 228 more attack vectors across 43 unique strategies.

  • 43 attack strategies
  • Prompt injection detection
  • Jailbreak resistance testing
  • PII and data exfiltration scans
166
Scorers

Evaluation Engine

166 built-in scorers covering faithfulness, relevance, toxicity, hallucination, coherence, and more. Create custom assertions with LLM-as-judge rubrics or deterministic rules.

  • 23 benchmark suites
  • Custom LLM-as-judge assertions
  • Side-by-side model comparison
  • Regression tracking over time
87
Providers

87 LLM Providers + 121-model catalog

Connect to OpenAI, Anthropic, Google Gemini, Meta Llama, Mistral, Cohere, and 81 more providers through a single unified interface. Switch models with one line of config.

  • OpenAI, Anthropic, Gemini
  • Open-source models (Llama, Mistral)
  • Custom endpoint support
  • Unified provider interface
$0
Surprise Bills

Cost Intelligence

Track spend per model, per prompt, and per team. Set budget alerts, forecast costs, and identify optimization opportunities before they become invoice surprises.

  • Per-model cost tracking
  • Team budget controls
  • Cost-per-evaluation metrics
  • Spend alerts and forecasting
7
Frameworks

Compliance

Map your AI systems to OWASP LLM Top 10, NIST AI RMF, MITRE ATLAS, EU AI Act, ISO 42001, India DPDP, and HIPAA. Generate audit-ready reports, track compliance drift, and maintain documentation automatically.

  • OWASP LLM Top 10 coverage
  • EU AI Act & ISO 42001 compliance
  • NIST AI RMF & MITRE ATLAS alignment
  • Audit-ready report generation
2.57ms
p95 (measured)

LLM Firewall

Visual rule builder for real-time prompt and response filtering. Rules combine semantic matching, regex patterns, and PII detection. Published p95: 2.57ms across 20K runs (see /trust/latency).

  • Visual rule builder
  • Semantic matching engine
  • PII detection and redaction
  • Edge-deployed for low latency
100%
Version History

Prompt Registry

Version control your prompts like code. View diffs, roll back to any version, require team approvals, and deploy to production with confidence.

  • Full version control
  • Side-by-side diff view
  • One-click rollback
  • Team approval workflows
33
Frameworks

AI Compliance Engine

Automated compliance assessment across 33 regulatory frameworks. EU AI Act risk classification, gap analysis, remediation plans, and audit-ready documentation — all generated automatically.

  • EU AI Act, GDPR, HIPAA, SOC 2, ISO 42001, NIST AI RMF
  • India DPDP, Colorado AI Act, NYC LL144, BIPA
  • Auto risk classification + PDF export
  • Regulatory change tracking with alerts
1
Single Pane of Glass

Executive Risk Dashboard

Board-level view of your entire AI governance posture. Track compliance scores, active incidents, vendor risks, and cost analytics in a single CISO-ready dashboard.

  • Risk scoring across all AI systems
  • Compliance framework status overview
  • Active incident tracking with MTTR
  • Vendor risk + DPA status at a glance
11
RCA Templates

Incident Management & RCA

When AI goes wrong, EvalGuard provides structured incident response with root cause analysis templates for 11 AI failure categories — from hallucinations to compliance violations.

  • Auto severity classification
  • Investigation guides per incident type
  • Remediation plan tracking
  • Regulatory notification workflows
4
Pre-built Profiles

Vendor Risk Management

Track third-party AI vendor risks, DPA status, lock-in scores, and policy changes. Pre-built profiles for OpenAI, Anthropic, Google, and Mistral.

  • 6-dimension risk assessment per vendor
  • DPA tracking with expiry alerts
  • Vendor lock-in scoring (0-100)
  • Policy change detection + notifications
5
SLA Metrics

SLA Monitoring

Define and enforce SLA targets per model and endpoint. Track availability, latency percentiles, error rates, and throughput with automatic violation alerts.

  • Availability, latency P95/P99, error rate, throughput
  • Per-model SLA tracking
  • Compliance reporting with sub-window analysis
  • Automatic alerts on SLA violations
Auto
Re-evaluation

Auto Re-evaluation on Drift

When model behavior drifts, EvalGuard automatically triggers re-evaluation. Configurable thresholds, cooldowns, and daily limits prevent alert fatigue.

  • Z-score drift detection triggers eval runs
  • Configurable severity thresholds
  • Cooldown + daily limit protections
  • Full trigger history and stats
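
As a rough illustration of the bullets above, the sketch below combines a z-score check with a cooldown and a daily limit. The class name, default thresholds, and the choice of a per-metric baseline window are assumptions made for the example, not documented EvalGuard settings.

```python
import statistics


class DriftTrigger:
    """Hypothetical trigger: fire a re-evaluation when a metric drifts."""

    def __init__(self, threshold=3.0, cooldown_s=3600, daily_limit=5):
        self.threshold = threshold      # minimum |z| that counts as drift
        self.cooldown_s = cooldown_s    # minimum seconds between triggers
        self.daily_limit = daily_limit  # maximum triggers per day
        self.last_fired = None
        self.fired_today = 0
        self.current_day = None

    def check(self, baseline, value, now_s):
        """Return True if a re-evaluation should start at time now_s."""
        stdev = statistics.stdev(baseline)
        if stdev == 0:
            return False  # flat baseline: z-score is undefined
        z = abs(value - statistics.mean(baseline)) / stdev
        if z < self.threshold:
            return False
        day = now_s // 86400
        if day != self.current_day:
            # New day: reset the daily counter.
            self.current_day = day
            self.fired_today = 0
        if self.last_fired is not None and now_s - self.last_fired < self.cooldown_s:
            return False  # still cooling down
        if self.fired_today >= self.daily_limit:
            return False  # daily budget exhausted
        self.last_fired = now_s
        self.fired_today += 1
        return True
```

With a baseline of recent scores around 0.90, a new score of 0.70 trips the trigger once, and repeated readings inside the cooldown window are ignored rather than raising a fresh alert each time.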
3
Model Formats

Model Audit

Scan pickle files, PyTorch checkpoints, and ONNX models for backdoors, trojans, and malicious payloads before deploying them to production.

  • Pickle deserialization scan
  • PyTorch checkpoint analysis
  • ONNX model validation
  • Supply chain threat detection
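
A pickle scan of the kind listed above can work on the opcode stream alone, without ever unpickling the file. The sketch below uses Python's standard pickletools module to show the general technique. The set of opcodes treated as suspicious and the function name are assumptions for illustration, and this is not EvalGuard's scanner.

```python
import pickletools

# Opcodes that can import or call arbitrary objects during unpickling.
# Which ones to flag is a policy choice; this set is an assumption.
SUSPICIOUS = {
    "GLOBAL", "STACK_GLOBAL", "REDUCE", "BUILD",
    "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX",
}


def scan_pickle(data: bytes):
    """Return (byte offset, opcode name, argument) for each flagged opcode.

    pickletools.genops only parses the stream; it does not execute it.
    """
    findings = []
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name in SUSPICIOUS:
            findings.append((pos, opcode.name, arg))
    return findings
```

A pickle of plain dicts, lists, and numbers yields no findings, while a payload built with a custom __reduce__ shows up as an import opcode followed by REDUCE. Legitimate model checkpoints also use some of these opcodes, so a real scanner would need an allowlist of trusted modules on top of this.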
18
AI Providers Tracked

Shadow AI Detection

Detect unauthorized AI usage across your organization. Monitor which models employees use, flag PII in outbound prompts, and block unapproved providers — all through your AI gateway.

  • PII detection in prompts (SSN, credit cards, emails)
  • Credential leak detection (AWS keys, DB connection strings)
  • Unauthorized model/provider blocking
  • Real-time risk scoring per request
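
The detection bullets above can be approximated with a few regular expressions plus a Luhn checksum to cut false positives on card numbers. This is a simplified sketch under assumed patterns and names, and production detectors typically cover many more formats.

```python
import re

# Illustrative patterns only; real-world coverage would be broader.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "aws_access_key": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
}
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(prompt: str):
    """Return (kind, matched text) pairs found in an outbound prompt."""
    hits = [
        (kind, m.group())
        for kind, rx in PATTERNS.items()
        for m in rx.finditer(prompt)
    ]
    for m in CARD_CANDIDATE.finditer(prompt):
        digits = re.sub(r"\D", "", m.group())
        if luhn_ok(digits):
            hits.append(("credit_card", m.group()))
    return hits
```

A gateway could run a check like find_sensitive on each request body and feed the number and kind of hits into a per-request risk score before deciding to redact, block, or allow.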
12
Misconfig Checks

AI Security Posture (AI-SPM)

Discover every AI model in your organization, map data flows, detect misconfigurations, and get a unified posture score. Like Palo Alto Prisma — but for AI.

  • Automatic model discovery from gateway logs
  • Data flow mapping with cross-border detection
  • Misconfiguration detection with auto-fix suggestions
  • Overall posture score (0-100) with risk distribution
3
Analysis Modes

Smart AI Copilot

Paste your scan or eval results — the copilot auto-analyzes findings, prioritizes by risk, suggests step-by-step fixes with code examples, and maps compliance impact.

  • Security scan analysis with fix prioritization
  • Eval results analysis with weak area identification
  • GDPR, HIPAA, OWASP compliance impact assessment
  • Estimated fix time and code examples per finding
8
Providers Proxied

AI Gateway

Route all LLM traffic through EvalGuard's gateway for automatic security scanning, cost tracking, rate limiting, and observability. Setup is a one-line base URL change.

  • Input/output firewall with PII + injection detection
  • Per-key rate limiting and cost tracking
  • SSRF protection with DNS rebinding checks
  • Streaming support for all providers
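
One common way to implement SSRF protection with a DNS rebinding check is to resolve the upstream host once, reject the request if any resolved address is internal, and then connect to the validated IP rather than resolving the name a second time. The sketch below shows that pattern with Python's standard library. The function names are hypothetical, and the gateway's actual checks are not described here.

```python
import ipaddress
import socket


def resolve(host):
    """Resolve a hostname to a sorted list of unique IP address strings."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def is_public(ip_text):
    """True if the address is not internal, loopback, or otherwise special."""
    ip = ipaddress.ip_address(ip_text)
    return not (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_reserved or ip.is_multicast or ip.is_unspecified
    )


def pick_safe_address(host, resolver=resolve):
    """Return one validated IP to connect to, or raise ValueError.

    The caller should connect to the returned IP directly. Resolving the
    hostname again at connect time would reopen the rebinding window.
    """
    addresses = resolver(host)
    if not addresses:
        raise ValueError(f"{host}: no addresses resolved")
    blocked = [a for a in addresses if not is_public(a)]
    if blocked:
        raise ValueError(f"{host}: resolves to non-public address {blocked[0]}")
    return addresses[0]
```

The resolver argument is injectable so the check can be exercised without network access. Rejecting the host when any single record is internal is a deliberately strict choice for this sketch.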
OTLP
Protocol

Agent Observability

Full distributed tracing for AI agents. Visualize multi-step agent trajectories, track token usage per step, analyze RAG pipeline quality, and detect hallucinations.

  • OpenTelemetry-native trace ingestion
  • Agent trajectory visualization (thought/action/observation)
  • RAG diagnostics with chunk attribution
  • Cost tracking per trace with model breakdown
4
Formats

Model-File Scanner

Scan pickle, safetensors, GGUF, and ONNX model files for malicious opcodes, tampered tensors, and smuggled payloads before loading them into production. Pure-TS implementation — never executes attacker code.

  • Pickle opcode analysis (GLOBAL / REDUCE / BUILD detection)
  • Safetensors structural validation (header, tensor bounds)
  • Hugging Face repo download + scan endpoint
  • Direct file upload + SHA256 fingerprint tracking
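
Structural validation of a safetensors file can be done by reading only its header: an 8-byte little-endian length, then a JSON object whose entries give each tensor's byte range within the data section. The sketch below checks the header and tensor bounds in that spirit. The function name, the header size limit, and the wording of the problem messages are assumptions for the example.

```python
import json
import struct


def validate_safetensors(blob: bytes, max_header_bytes: int = 100_000_000):
    """Return a list of structural problems; an empty list means none found."""
    if len(blob) < 8:
        return ["file shorter than the 8-byte header length prefix"]
    (header_len,) = struct.unpack("<Q", blob[:8])
    if header_len > max_header_bytes or 8 + header_len > len(blob):
        return ["declared header length exceeds file size or limit"]
    try:
        header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    except ValueError:  # covers both bad UTF-8 and bad JSON
        return ["header is not valid UTF-8 JSON"]
    if not isinstance(header, dict):
        return ["header is not a JSON object"]

    problems = []
    data_len = len(blob) - 8 - header_len  # bytes available for tensor data
    for name, info in header.items():
        if name == "__metadata__":
            continue  # free-form metadata entry, not a tensor
        offsets = info.get("data_offsets") if isinstance(info, dict) else None
        if (not isinstance(offsets, list) or len(offsets) != 2
                or not all(isinstance(o, int) for o in offsets)):
            problems.append(f"{name}: missing or malformed data_offsets")
            continue
        begin, end = offsets
        if not 0 <= begin <= end <= data_len:
            problems.append(f"{name}: data_offsets {offsets} outside data section")
    return problems
```

Because nothing beyond the header is parsed and no tensor data is interpreted, a check like this stays cheap even on very large files. A fuller validator would also compare each byte range against the declared dtype and shape and look for overlapping ranges.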
3
Clouds

AI Bill of Materials (AI-BOM)

Agentless discovery of every AI resource in your cloud — foundation models, custom models, endpoints, Bedrock agents. Risk-scored with deprecated-model flags and network-exposure factors.

  • AWS Bedrock (foundation + custom models + agents)
  • GCP Vertex AI (Gemini, custom, endpoints, reasoning engines)
  • Azure OpenAI (deployments, Cognitive Services accounts)
  • Deprecated-model + public-network-access risk factors
Real-time
Audit

MCP Traffic Inspection

Audit every tool call flowing through MCP. Detect unauthorized tool use, memory poisoning, and policy violations in real time, with a risk-scored log of agent behavior.

  • Per-call: server, tool, agent, status, risk score
  • Filter by status / min risk / tool name
  • RLS-scoped per project, redacted arguments
  • Cisco AI Defense parity for MCP introspection
2.57ms
p95 (full pipeline, 20K runs)

Gateway Latency Benchmarks

Published, reproducible latency numbers for the full firewall pipeline (pattern + token + semantic + output layers) — measured, not claimed.

  • Full pipeline p50 2.11ms · p95 2.57ms · p99 3.19ms across 20K runs
  • Per-layer breakdown: pattern 0.16ms, token 0.03ms, semantic 2.36ms, output 0ms
  • ~498 req/s single-thread sustained, no errors
  • Reproducible: pnpm bench:firewall-latency · published at /trust/latency
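
For readers unfamiliar with how figures like p50, p95, and p99 are derived from raw timings, here is a small sketch using the nearest-rank method. It assumes a plain list of per-request latencies in milliseconds. How the published benchmark actually computes its percentiles is not stated here, so treat the method as one reasonable choice.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p percent
    of all samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples_ms):
    """Return the p50, p95, and p99 latency of a run."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}
```

Passing the per-request timings from a run to summarize gives the three headline numbers. Tail percentiles such as p99 depend on only a small share of the samples, which is why the size of the run matters when comparing them.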

Plus Everything Else

OpenTelemetry
BYOK Encryption
Real-time Traces
CI/CD Templates
Self-hosted Docker
Usage Analytics
Python SDK
Swagger API Docs

See it all in action

Start free with 10,000 evaluations per month, or get a personalized demo from our team.
