OWASP LLM Top 10 · MITRE ATLAS · NIST 800-53

AI safety + red-team for security teams

The eval + guardrail + red-team + audit platform built by + for security people. 249 attack plugins, 42 strategies, adaptive multi-turn engine — the deepest red-team coverage of any LLM-safety platform, period. Plus the audit trail your SOC + your customer's procurement security review need.

0
Scorers
0
LLM providers
0
Red-team plugins
0.00ms
Firewall p95

What ships today

Honest posture, not roadmap promises

Every checked item is in production today. In-progress items are flagged explicitly — no overclaiming, no vapor.

OWASP LLM Top 10 — all 10 controls mapped to plugins + scorers
OWASP Agentic AI Top 10 (2025) — all AAI01-AAI10 controls mapped
MITRE ATLAS adversarial-AI techniques covered (T1-T15)
NIST SP 800-53 AI control mappings (SA / SI / SC families)
Adaptive multi-turn red-team engine (UCB1 bandit per strategy)
Eg-safety-bench-1k published dataset on HuggingFace
Tamper-evident audit log (integrity hash chain + advisory locks)

Built for buyer reality

Cybersecurity AI use cases we ship for

Pre-prod LLM application security testing

Security team needs to test every customer-facing LLM app for jailbreak / PII-extraction / prompt-injection BEFORE prod release. Manual red-teaming doesn't scale; off-the-shelf tools test 30 attacks.

EvalGuard features

  • 249 attack plugins across 42 strategies (5× the nearest open-source competitor)
  • Adaptive engine uses UCB1 to focus on attacks that ACTUALLY work for the target
  • OWASP LLM Top 10 + Agentic AI Top 10 + MITRE ATLAS coverage per scan
  • Per-finding evidence: request, response, attack vector, suggested fix

Continuous red-team in production

SOC team wants a continuous attack feed against the production LLM endpoint. Need synthetic-traffic that doesn't pollute real customer data + alerting that distinguishes real attacks from synthetic.

EvalGuard features

  • Synthetic-traffic mode: scans tagged + filtered out of real-customer cost/audit ledgers
  • Wilson-interval early-stopping on red-team campaigns (don't waste cycles on convergent strategies)
  • Integration with your SIEM via webhook (Splunk + Datadog + Sentry receivers built-in)
  • Threat-intel feed for runtime attack-corpus updates (no manual plugin curation)

Customer-facing SOC-2 / vendor-security review

Customer's procurement team asks 'show me your LLM safety posture.' Need to hand them a control-mapping report, not a 30-page narrative.

EvalGuard features

  • SOC 2 Common Criteria control mappings auto-generated per scan
  • Customer-shareable evidence bundle (PDF + JSON, no internal IP leakage)
  • OWASP LLM Top 10 coverage report (every control with attack examples + pass/fail)
  • Auditor-friendly tamper-evident audit log (every scan finding cryptographically signed)

Internal AI-skill / agent security review

Eng + product teams are shipping Claude / Custom GPTs / agent workflows internally. Security team needs to scan every config-bundle for risky tool grants + prompt-injection surfaces BEFORE they go live.

EvalGuard features

  • AI-skill-package scanner (Skills, Custom GPTs, agent configs) — catches over-permissive tool grants
  • Evil-MCP-server target for safe red-team training environments
  • Repo-scanner detects leaked API keys + insecure prompt patterns in agent code
  • MCP-gateway runtime tool-pinning prevents agent supply-chain swap attacks

Wire it in 60 seconds

249 plugins · 42 strategies · one CLI

The evalguard CLI runs the full red-team locally — no API key required for the offline pack. Point it at your endpoint config + plug into your CI pipeline.

bash
# Install the CLI
npm i -g @evalguard/cli

# Run the offline red-team against an endpoint config
# (OWASP LLM Top 10 + MITRE ATLAS plugins, JSON/YAML config)
evalguard scan-local ./red-team.config.yaml \
  --output ./out/findings.json --verbose

# Scan model artifacts for trojans / backdoors before deploy
evalguard model-scan ./models/finetuned.safetensors \
  --severity high --format json

# Scan a repo for risky agent tool grants + leaked secrets
evalguard repo-scan ./my-agent --format json > scan.json
All three commands ship in @evalguard/cli today — wire into your CI to gate PRs on red-team + model + repo scan results.
Same integration for Anthropic, Gemini, and 91+ providers — swap wrapOpenAI for wrapAnthropic.

Ship the deepest LLM red-team coverage in the industry.

Free trial includes all 249 plugins, the adaptive engine, OWASP + NIST control mapping, and the full audit trail. SOC2 + OWASP evidence bundles ready for your next customer security review.

Apache-2.0 source · SOC 2 Type II in progress · full trust center