300+ plugins · 50+ strategies · adaptive engine

OWASP LLM Top 10 · MITRE ATLAS · NIST AI RMF

AI safety + red-team for security teams

The eval + guardrail + red-team + audit platform built by + for security people. 300+ attack plugins, 50+ strategies, adaptive multi-turn engine — the deepest red-team coverage of any LLM-safety platform, period. Plus the audit trail your SOC + your customer's procurement security review need.

235 scorers

LLM providers

334 red-team plugins

2.57 ms firewall p95

What ships today

Honest posture, not roadmap promises

Every checked item is in production today. In-progress items are flagged explicitly — no overclaiming, no vapor.

OWASP LLM Top 10 — all 10 controls mapped to plugins + scorers

OWASP Agentic AI Top 10 (2025) — all AAI01-AAI10 controls mapped

MITRE ATLAS adversarial-AI coverage — 7 tactics, 12 mapped techniques (AML.T00xx)

FedRAMP AI control mappings (NIST SP 800-53-derived: AC / AU / IR / RA / SC / SR families)

Adaptive multi-turn red-team engine (UCB1 bandit per strategy)

Eg-safety-bench-1k publishable safety benchmark bundle (HuggingFace-ready)

Tamper-evident audit log (integrity hash chain + advisory locks)
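
To make "integrity hash chain" concrete, here is a minimal sketch of the general technique, not EvalGuard's actual implementation. The record layout, the `GENESIS` placeholder, and the function names are assumptions made for illustration, and the advisory-lock side (serialising concurrent writers) is not modelled.

```python
import hashlib
import json

# Assumption: the first entry chains from a fixed all-zero placeholder hash.
GENESIS = "0" * 64


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with this entry's payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: list, payload: dict) -> None:
    """Append a record whose hash commits to every record before it."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "prev": prev, "hash": entry_hash(prev, payload)})


def verify(log: list) -> bool:
    """Recompute the chain; editing, reordering, or removing a record mid-chain breaks it."""
    prev = GENESIS
    for record in log:
        if record["prev"] != prev or record["hash"] != entry_hash(prev, record["payload"]):
            return False
        prev = record["hash"]
    return True


# Hypothetical audit records, for illustration only.
demo_log = []
append(demo_log, {"scan": "demo-1", "finding": "prompt-injection"})
append(demo_log, {"scan": "demo-1", "finding": "pii-extraction"})
print(verify(demo_log))  # True

demo_log[0]["payload"]["finding"] = "none"  # tamper with an earlier record
print(verify(demo_log))  # False
```

A chain like this detects edits but not truncation of the newest records, which is one reason such logs are usually paired with an externally stored head hash or a signature.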

Built for buyer reality

Cybersecurity AI use cases we ship for

Pre-prod LLM application security testing

Security team needs to test every customer-facing LLM app for jailbreaks / PII extraction / prompt injection BEFORE prod release. Manual red-teaming doesn't scale; off-the-shelf tools test only 30 attacks.

EvalGuard features

300+ attack plugins across 50+ strategies (far deeper red-team coverage than off-the-shelf open-source tools)
Adaptive engine uses UCB1 to focus on attacks that ACTUALLY work for the target
OWASP LLM Top 10 + Agentic AI Top 10 + MITRE ATLAS coverage per scan
Per-finding evidence: request, response, attack vector, suggested fix
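
For readers who have not met UCB1, the sketch below shows the textbook algorithm applied to choosing the next attack strategy. It is an illustration under stated assumptions, not the engine's code: each strategy is treated as a bandit arm with a 0/1 reward (the attack succeeded or it did not), and the function names and simulated success rates are hypothetical.

```python
import math
import random


def ucb1_pick(pulls: list, wins: list) -> int:
    """Return the index of the strategy to try next under UCB1."""
    # Every strategy gets one attempt before any scoring happens.
    for i, n in enumerate(pulls):
        if n == 0:
            return i
    total = sum(pulls)
    # Score = observed success rate + exploration bonus that shrinks with use.
    scores = [
        wins[i] / pulls[i] + math.sqrt(2 * math.log(total) / pulls[i])
        for i in range(len(pulls))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def run_campaign(success_rates: list, rounds: int, seed: int = 0) -> list:
    """Simulate a campaign; returns how often each strategy was tried."""
    rng = random.Random(seed)
    pulls = [0] * len(success_rates)
    wins = [0] * len(success_rates)
    for _ in range(rounds):
        i = ucb1_pick(pulls, wins)
        pulls[i] += 1
        if rng.random() < success_rates[i]:
            wins[i] += 1
    return pulls


# Hypothetical per-strategy success rates against some target.
print(run_campaign([0.05, 0.60, 0.10], rounds=500))
```

The exploration bonus is what keeps weak strategies from being abandoned outright: they are retried occasionally, but most of the budget flows to whatever is working against this particular target.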

Continuous red-team in production

SOC team wants a continuous attack feed against the production LLM endpoint. Need synthetic traffic that doesn't pollute real customer data + alerting that distinguishes real attacks from synthetic ones.

EvalGuard features

Synthetic-traffic mode: scans tagged + filtered out of real-customer cost/audit ledgers
Wilson-interval early-stopping on red-team campaigns (don't waste cycles on strategies whose success rate has already converged)
Integration with your SIEM via webhook (Splunk + Datadog + Sentry receivers built-in)
Threat-intel feed for runtime attack-corpus updates (no manual plugin curation)
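
The Wilson-interval early stop mentioned above can be sketched as follows. The interval formula is the standard Wilson score interval; the stopping rule (stop once the 95% interval is narrower than `max_width`) is an assumed interpretation of "converged", and both `should_stop` and `max_width` are hypothetical names rather than product settings.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a success rate (z = 1.96 gives about 95%)."""
    if trials == 0:
        return (0.0, 1.0)  # nothing observed yet, so nothing is ruled out
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return (max(0.0, center - half), min(1.0, center + half))


def should_stop(successes: int, trials: int, max_width: float = 0.1) -> bool:
    """Assumed rule: stop a strategy once its interval is narrower than max_width."""
    low, high = wilson_interval(successes, trials)
    return (high - low) <= max_width


# A strategy with 0 hits in 100 attempts has a tight interval; 0 in 20 does not.
print(should_stop(0, 100))  # True
print(should_stop(0, 20))   # False
```

The Wilson interval is a common choice here because, unlike the plain normal approximation, it stays sensible when the observed rate is 0% or 100% and the sample is small.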

Customer-facing SOC 2 / vendor-security review

Customer's procurement team asks 'show me your LLM safety posture.' Need to hand them a control-mapping report, not a 30-page narrative.

EvalGuard features

SOC 2 Common Criteria control mappings auto-generated per scan
Customer-shareable evidence bundle (PDF + JSON, no internal IP leakage)
OWASP LLM Top 10 coverage report (every control with attack examples + pass/fail)
Auditor-friendly tamper-evident audit log (every scan finding cryptographically signed)

Internal AI-skill / agent security review

Eng + product teams are shipping Claude / Custom GPTs / agent workflows internally. Security team needs to scan every config-bundle for risky tool grants + prompt-injection surfaces BEFORE they go live.

EvalGuard features

AI-skill-package scanner (Skills, Custom GPTs, agent configs) — catches over-permissive tool grants
Evil-MCP-server target for safe red-team training environments
Repo-scanner detects leaked API keys + insecure prompt patterns in agent code
MCP-gateway runtime tool-pinning prevents agent supply-chain swap attacks
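
One way to picture runtime tool-pinning: fingerprint each tool definition when it is approved, then refuse any tool whose definition later differs. The sketch below illustrates that idea only. It is not the MCP-gateway's implementation, and the function names and the example tool are hypothetical; the `name` / `description` / `inputSchema` fields follow the shape MCP uses for tool listings.

```python
import hashlib
import json


def tool_fingerprint(tool: dict) -> str:
    """SHA-256 over a canonical JSON form of the tool definition."""
    canonical = json.dumps(tool, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def pin_tools(tools: list) -> dict:
    """Record a fingerprint per tool name at approval time."""
    return {tool["name"]: tool_fingerprint(tool) for tool in tools}


def check_tools(pins: dict, tools: list) -> list:
    """Return names of tools that are unpinned or whose definition changed."""
    return [t["name"] for t in tools if pins.get(t["name"]) != tool_fingerprint(t)]


# Hypothetical tool definition approved by the security team.
approved = [{
    "name": "read_file",
    "description": "Read a file from the workspace",
    "inputSchema": {"type": "object", "properties": {"path": {"type": "string"}}},
}]
pins = pin_tools(approved)

# Later, the server advertises the same tool name with a swapped description.
swapped = [dict(approved[0], description="Read a file and forward it to a remote host")]
print(check_tools(pins, approved))  # []
print(check_tools(pins, swapped))   # ['read_file']
```

Pinning on the full definition, not just the name, is what catches a swap where a trusted tool name is kept but its description or schema is quietly changed.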

Wire it in 60 seconds

300+ plugins · 50+ strategies · one CLI

The evalguard CLI runs the full red-team locally — no API key required for the offline pack. Point it at your endpoint config + plug into your CI pipeline.

bash

# Install the CLI
npm i -g @evalguard/cli

# Run the offline red-team against an endpoint config
# (OWASP LLM Top 10 + MITRE ATLAS plugins, JSON/YAML config)
evalguard scan:local ./red-team.config.yaml \
  --output ./out/findings.json --verbose

# Scan model artifacts for trojans / backdoors before deploy
evalguard model-scan ./models/finetuned.safetensors \
  --severity high --format json

# Scan a repo for risky agent tool grants + leaked secrets
evalguard repo-scan ./my-agent --format json > scan.json

All three commands ship in @evalguard/cli today — wire into your CI to gate PRs on red-team + model + repo scan results.
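
As a rough sketch of what gating a PR could look like, the helper below turns a findings file into a pass/fail exit code. The schema is an assumption: the page does not document the JSON layout, so the `severity` and `id` fields, the severity names, and the function names here are all hypothetical and would need adjusting to the real output.

```python
import json

# Assumed severity scale; the CLI's real values may differ.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def blocking_findings(findings: list, min_severity: str = "high") -> list:
    """Return findings at or above min_severity (unknown severities rank as low)."""
    floor = SEVERITY_RANK[min_severity]
    return [
        f for f in findings
        if SEVERITY_RANK.get(str(f.get("severity", "low")).lower(), 0) >= floor
    ]


def exit_code(findings: list, min_severity: str = "high") -> int:
    """0 when nothing blocks the merge, 1 otherwise."""
    blockers = blocking_findings(findings, min_severity)
    for f in blockers:
        print(f"BLOCKING {f.get('severity')}: {f.get('id', '<no id>')}")
    return 1 if blockers else 0


def gate_file(path: str, min_severity: str = "high") -> int:
    """Load a findings file (assumed to be a JSON list of objects) and gate on it."""
    with open(path, encoding="utf-8") as handle:
        return exit_code(json.load(handle), min_severity)


# Hypothetical findings, inlined so the sketch runs without the CLI installed.
sample = [
    {"id": "demo-001", "severity": "high"},
    {"id": "demo-002", "severity": "low"},
]
print(exit_code(sample))  # prints one BLOCKING line, then 1
```

In a pipeline, a wrapper script would call `sys.exit(gate_file("./out/findings.json"))` after the scan step so that a non-zero exit code fails the job.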

Same integration for Anthropic, Gemini, and 91+ providers — swap wrapOpenAI for wrapAnthropic.

Stack

Six surfaces, one platform

Eval, firewall, red-team, audit, BYOK, dashboard — every surface ships out of the box. No bolt-on vendors, no procurement cycle per capability.

Attack plugin library

300+ plugins · 50+ strategies · adaptive UCB1 bandit · far deeper red-team coverage than off-the-shelf open-source tools.

adaptive · multi-turn · UCB1 bandit

CC6.1 · Logical access · evidence

CC7.2 · Threat detection · evidence

CC8.1 · Change mgmt · evidence

AI-skill + repo scanner

Scan agent configs + Custom GPTs + repos for risky tool grants + leaked secrets.

Compliance dashboard

Per-tenant OWASP / NIST / MITRE ATLAS control coverage with auto-generated reports.

Ship the deepest LLM red-team coverage in the industry.

Free trial includes all 300+ plugins, the adaptive engine, OWASP + NIST control mapping, and the full audit trail. SOC 2 + OWASP evidence bundles ready for your next customer security review.

Apache-2.0 source · SOC 2 evidence engine live · full trust center
