Everything you need to ship safe AI.
Security testing, evaluation, compliance, cost tracking, and observability — all in one platform. No duct tape required.
Proof, not promises
The widest coverage in the category — measured.
Every number is sourced from the drift-checked registry and the public firewall benchmark, not a slide.
Eval scorers vs the field
Built-in, production-ready scorers — no custom authoring required.
Red-team plugins vs the field
Attack coverage across 40+ strategies and 30 categories.
Firewall p95 stays flat under load
Full pipeline (pattern · token · semantic · output), measured across 20K runs.
Platform breadth
One platform across the full lifecycle — where point tools cover a slice.
NL→Eval Pipeline
Describe your AI application in natural language. Our proprietary pipeline analyzes your app profile, maps domain-specific risks, generates targeted test cases, and assembles a production-ready evaluation config — powered by EvalGuard's multi-model orchestration engine.
- One sentence in, full eval suite out
- Supports all domains (healthcare, finance, legal, etc.)
- Multi-model orchestration across 90+ providers (77 LLM)
- Intelligent risk mapping with compliance detection
Adaptive Red Teaming
An AI attacker probes your LLM across multiple conversation turns, adapts its strategy based on resistance, and finds vulnerabilities that static tests miss. Powered by UCB1 bandit optimization.
- Multi-turn conversations (up to 15 turns)
- Parallel attack sessions with cross-session memory
- 40+ strategies × 30 categories
- Real-time resistance profiling
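As a rough illustration of the UCB1 rule named above, the sketch below treats each attack strategy as a bandit arm and picks the arm with the best mean reward plus exploration bonus. The function names, the binary reward, and the simulated success probabilities are assumptions made for this example; they are not taken from EvalGuard's implementation.

```python
import math
import random


def ucb1_select(counts, rewards, total_pulls):
    """Return the index of the arm with the highest UCB1 score."""
    # Every arm is tried once before the formula applies.
    for arm, pulls in enumerate(counts):
        if pulls == 0:
            return arm
    scores = [
        rewards[arm] / counts[arm]
        + math.sqrt(2 * math.log(total_pulls) / counts[arm])
        for arm in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def run_bandit(success_probs, rounds, seed=0):
    """Simulate strategy selection; reward 1.0 stands for a successful attack."""
    rng = random.Random(seed)
    counts = [0] * len(success_probs)
    rewards = [0.0] * len(success_probs)
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, rewards, t)
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        counts[arm] += 1
        rewards[arm] += reward
    return counts
```

Calling `run_bandit([0.1, 0.8, 0.3], rounds=2000)` returns how often each hypothetical strategy was chosen. Under these assumptions the strategy with the highest success rate ends up with most of the pulls, while the others are still sampled occasionally.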
Security Testing
The industry's most comprehensive red teaming suite. Test your AI systems against prompt injection, jailbreaks, data exfiltration, and 254 more attack vectors across 40+ unique strategies.
- 40+ attack strategies
- Prompt injection detection
- Jailbreak resistance testing
- PII and data exfiltration scans
Evaluation Engine
200+ built-in scorers covering faithfulness, relevance, toxicity, hallucination, coherence, and more. Create custom assertions with LLM-as-judge rubrics or deterministic rules.
- 28 benchmark suites
- Custom LLM-as-judge assertions
- Side-by-side model comparison
- Regression tracking over time
90+ providers + 121-model catalog
Connect to OpenAI, Anthropic, Google Gemini, Meta Llama, Mistral, Cohere, and 85 more providers through a single unified interface. Switch models with one line of config.
- OpenAI, Anthropic, Gemini
- Open-source models (Llama, Mistral)
- Custom endpoint support
- Unified provider interface
Cost Intelligence
Track spend per model, per prompt, and per team. Set budget alerts, forecast costs, and identify optimization opportunities before they become invoice surprises.
- Per-model cost tracking
- Team budget controls
- Cost-per-evaluation metrics
- Spend alerts and forecasting
Compliance
Map your AI systems to OWASP LLM Top 10, NIST AI RMF, MITRE ATLAS, EU AI Act, ISO 42001, India DPDP, HIPAA, and more. Generate audit-ready reports, track compliance drift, and maintain documentation automatically.
- OWASP LLM Top 10 coverage
- EU AI Act & ISO 42001 compliance
- NIST AI RMF & MITRE ATLAS alignment
- Audit-ready report generation
LLM Firewall
Visual rule builder for real-time prompt and response filtering. Semantic matching, regex patterns, and PII detection. Published p95: 2.57ms across 20K runs (see /trust/latency).
- Visual rule builder
- Semantic matching engine
- PII detection and redaction
- Edge-deployed for low latency
Prompt Registry
Version control your prompts like code. View diffs, roll back to any version, require team approvals, and deploy to production with full audit history.
- Full version control
- Side-by-side diff view
- One-click rollback
- Team approval workflows
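To make the diff view concrete, here is a minimal sketch that compares two prompt versions with Python's standard `difflib` module. The function name and version labels are hypothetical; the registry's own diff format is not described in this page.

```python
import difflib


def prompt_diff(old, new, old_label="v1", new_label="v2"):
    """Return unified-diff lines between two prompt versions."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",
        )
    )
```

Two identical prompts produce an empty list. A changed line appears once with a leading `-` (old text) and once with a leading `+` (new text).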
AI Compliance Engine
Automated compliance assessment across 33 regulatory frameworks. EU AI Act risk classification, gap analysis, remediation plans, and audit-ready documentation — all generated automatically.
- EU AI Act, GDPR, HIPAA, SOC 2, ISO 42001, NIST AI RMF
- India DPDP, Colorado AI Act, NYC LL144, BIPA
- Auto risk classification + PDF export
- Regulatory change tracking with alerts
Executive Risk Dashboard
Board-level view of your entire AI governance posture. Track compliance scores, active incidents, vendor risks, and cost analytics in a single CISO-ready dashboard.
- Risk scoring across all AI systems
- Compliance framework status overview
- Active incident tracking with MTTR
- Vendor risk + DPA status at a glance
Incident Management & RCA
When AI goes wrong, EvalGuard provides structured incident response with root cause analysis templates for 11 AI failure categories — from hallucinations to compliance violations.
- Auto severity classification
- Investigation guides per incident type
- Remediation plan tracking
- Regulatory notification workflows
Vendor Risk Management
Track third-party AI vendor risks, DPA status, lock-in scores, and policy changes. Pre-built profiles for OpenAI, Anthropic, Google, and Mistral.
- 6-dimension risk assessment per vendor
- DPA tracking with expiry alerts
- Vendor lock-in scoring (0-100)
- Policy change detection + notifications
SLA Monitoring
Define and enforce SLA targets per model and endpoint. Track availability, latency percentiles, error rates, and throughput with automatic violation alerts.
- Availability, latency p95/p99, error rate, throughput
- Per-model SLA tracking
- Compliance reporting with sub-window analysis
- Automatic alerts on SLA violations
Auto Re-evaluation on Drift
When model behavior drifts, EvalGuard automatically triggers re-evaluation. Configurable thresholds, cooldowns, and daily limits prevent alert fatigue.
- Z-score drift detection triggers eval runs
- Configurable severity thresholds
- Cooldown + daily limit protections
- Full trigger history and stats
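The interplay of z-score threshold, cooldown, and daily limit can be sketched as follows. The class name, default values, and the choice of a sample standard deviation over a baseline window are assumptions for illustration, not EvalGuard's actual configuration.

```python
from dataclasses import dataclass, field
from statistics import mean, stdev


@dataclass
class DriftTrigger:
    z_threshold: float = 3.0     # hypothetical default
    cooldown_s: float = 3600.0   # minimum seconds between triggers
    daily_limit: int = 5         # maximum triggers per rolling 24 hours
    fired_at: list = field(default_factory=list)  # trigger history

    def should_trigger(self, baseline, value, now):
        """Decide whether a new metric value should start a re-evaluation."""
        if len(baseline) < 2:
            return False
        sigma = stdev(baseline)
        if sigma == 0:
            return False
        z = abs(value - mean(baseline)) / sigma
        if z < self.z_threshold:
            return False  # within normal variation
        if self.fired_at and now - self.fired_at[-1] < self.cooldown_s:
            return False  # still cooling down
        recent = [t for t in self.fired_at if now - t < 86400]
        if len(recent) >= self.daily_limit:
            return False  # daily budget exhausted
        self.fired_at.append(now)
        return True
```

Here `baseline` is a list of recent scores for one metric and `now` is a timestamp in seconds. A large deviation fires once, then is suppressed until the cooldown passes and the daily budget allows another run.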
Model Audit
Scan pickle files, PyTorch checkpoints, and ONNX models for backdoors, trojans, and malicious payloads before deploying them to production.
- Pickle deserialization scan
- PyTorch checkpoint analysis
- ONNX model validation
- Supply chain threat detection
Shadow AI Detection
Detect unauthorized AI usage across your organization. Monitor which models employees use, flag PII in outbound prompts, and block unapproved providers — all through your AI gateway.
- PII detection in prompts (SSN, credit cards, emails)
- Credential leak detection (AWS keys, DB connection strings)
- Unauthorized model/provider blocking
- Real-time risk scoring per request
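A simplified sketch of pattern-based detection for the items listed above: SSNs, emails, AWS access key IDs, and card numbers. The regular expressions are deliberately loose, the Luhn check is used here only to cut false positives on card numbers, and all names are assumptions for this example rather than EvalGuard's detectors.

```python
import re

PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "aws_access_key": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    # 13 to 19 digits, optionally separated by single spaces or hyphens.
    "card": re.compile(r"\b(?:\d[ -]?){12,18}\d\b"),
}


def luhn_ok(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_prompt(text):
    """Return (kind, matched_text) pairs for suspected PII or credentials."""
    findings = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            value = match.group()
            if kind == "card" and not luhn_ok(re.sub(r"\D", "", value)):
                continue  # digit run that is not a plausible card number
            findings.append((kind, value))
    return findings
```

A gateway could call `scan_prompt` on each outbound prompt and redact or block on any finding. A production detector would need broader patterns and context checks than this sketch.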
AI Security Posture (AI-SPM)
Discover every AI model in your organization, map data flows, detect misconfigurations, and get a unified posture score. Like Palo Alto Prisma — but for AI.
- Automatic model discovery from gateway logs
- Data flow mapping with cross-border detection
- Misconfiguration detection with auto-fix suggestions
- Overall posture score (0-100) with risk distribution
Smart AI Copilot
Paste your scan or eval results — the copilot auto-analyzes findings, prioritizes by risk, suggests step-by-step fixes with code examples, and maps compliance impact.
- Security scan analysis with fix prioritization
- Eval results analysis with weak area identification
- GDPR, HIPAA, OWASP compliance impact assessment
- Estimated fix time and code examples per finding
AI Gateway
Route all LLM traffic through EvalGuard's gateway for automatic security scanning, cost tracking, rate limiting, and observability. A one-line base URL change.
- Input/output firewall with PII + injection detection
- Per-key rate limiting and cost tracking
- SSRF protection with DNS rebinding checks
- Streaming support for all providers
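One common way to combine SSRF protection with a DNS rebinding check is to resolve the target host once, reject any non-public address, and then connect to the pinned IP instead of resolving again. The sketch below shows that idea with Python's standard library. It is an assumption about the general technique, not a description of how EvalGuard's gateway implements it, and the function names are hypothetical.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_public_ip(ip_text):
    """Return True only for addresses that are safe to call from a gateway."""
    ip = ipaddress.ip_address(ip_text)
    # Unwrap IPv4-mapped IPv6 (e.g. ::ffff:10.0.0.1) before classifying.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )


def resolve_and_pin(url, resolver=socket.getaddrinfo):
    """Resolve the URL's host once and return a vetted IP to connect to."""
    host = urlparse(url).hostname
    if host is None:
        raise ValueError("URL has no host")
    addresses = sorted({info[4][0] for info in resolver(host, None)})
    if not addresses or not all(is_public_ip(a) for a in addresses):
        raise ValueError(f"refusing to call {host}: non-public address")
    # The caller should connect to this IP directly so a second DNS
    # lookup cannot swap in an internal address (DNS rebinding).
    return addresses[0]
```

The `resolver` parameter exists so the check can be exercised without network access. By default it uses `socket.getaddrinfo`.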
Agent Observability
Full distributed tracing for AI agents. Visualize multi-step agent trajectories, track token usage per step, analyze RAG pipeline quality, and detect hallucinations.
- OpenTelemetry-native trace ingestion
- Agent trajectory visualization (thought/action/observation)
- RAG diagnostics with chunk attribution
- Cost tracking per trace with model breakdown
Model-File Scanner
Scan pickle, safetensors, GGUF, and ONNX model files for malicious opcodes, tampered tensors, and smuggled payloads before loading them into production. The pure-TypeScript implementation never executes attacker code.
- Pickle opcode analysis (GLOBAL / REDUCE / BUILD detection)
- Safetensors structural validation (header, tensor bounds)
- Hugging Face repo download + scan endpoint
- Direct file upload + SHA256 fingerprint tracking
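The idea behind pickle opcode analysis can be shown with Python's standard `pickletools` module, which parses the opcode stream without unpickling anything. This is a Python sketch of the general technique only. The scanner described above is stated to be a TypeScript implementation, and the opcode set and function name below are assumptions.

```python
import pickle
import pickletools

# Opcodes that import callables or construct/mutate arbitrary objects.
SUSPICIOUS = {
    "GLOBAL", "STACK_GLOBAL", "REDUCE", "BUILD",
    "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX",
}


def scan_pickle(data):
    """Return (offset, opcode_name, argument) for each suspicious opcode."""
    findings = []
    # genops only decodes the stream; nothing is imported or called.
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name in SUSPICIOUS:
            findings.append((pos, opcode.name, arg))
    return findings


# Plain containers and numbers need none of the flagged opcodes.
benign = pickle.dumps({"weights": [1, 2, 3]})

# Hand-written protocol-0 stream that would call builtins.print if it
# were ever unpickled. It is only parsed here, never loaded.
crafted = b"cbuiltins\nprint\n(S'hi'\ntR."
```

`scan_pickle(benign)` finds nothing, while `scan_pickle(crafted)` reports the `GLOBAL` import and the `REDUCE` call. Legitimate checkpoints also use these opcodes to rebuild tensors, so a real scanner would pair this with an allowlist of expected imports.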
AI Bill of Materials (AI-BOM)
Agentless discovery of every AI resource in your cloud — foundation models, custom models, endpoints, Bedrock agents. Risk-scored with deprecated-model flags and network-exposure factors.
- AWS Bedrock (foundation + custom models + agents)
- GCP Vertex AI (Gemini, custom, endpoints, reasoning engines)
- Azure OpenAI (deployments, Cognitive Services accounts)
- Deprecated-model + public-network-access risk factors
MCP Traffic Inspection
Audit every tool call flowing through MCP. Detect unauthorized tool use, memory poisoning, and policy violations in real time. Risk-scored log of agent behavior.
- Per-call: server, tool, agent, status, risk score
- Filter by status / min risk / tool name
- RLS-scoped per project, redacted arguments
- Cisco AI Defense parity for MCP introspection
Gateway Latency Benchmarks
Published, reproducible latency numbers for the full firewall pipeline (pattern + token + semantic + output layers) — measured, not claimed.
- Full pipeline p50 2.11ms · p95 2.57ms · p99 3.19ms across 20K runs
- Per-layer breakdown: pattern 0.16ms, token 0.03ms, semantic 2.36ms, output 0ms
- ~498 req/s single-thread sustained, no errors
- Reproducible: pnpm bench:firewall-latency · published at /trust/latency
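For readers checking the numbers themselves, this is one way to turn raw per-request latencies into p50/p95/p99 figures. It uses the nearest-rank definition, which is an assumption: the page does not state which percentile method the benchmark applies. The function names are hypothetical.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Return the p50, p95, and p99 of a list of latencies in milliseconds."""
    return {
        name: percentile(latencies_ms, p)
        for name, p in (("p50", 50), ("p95", 95), ("p99", 99))
    }
```

Feeding the per-run timings from a benchmark into `summarize` yields the same three headline statistics reported above, up to differences in percentile method.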
See it all in action
Start free with 50,000 traces per month, or get a personalized demo from our team.