Changelog

New features, improvements, and fixes shipped to EvalGuard

v1.1.0 (March 2026)

NL Pipeline & Adaptive Red Teaming

Two industry-first features. Describe your app in plain English to generate a complete eval suite, and let an AI attacker adapt in real time to find vulnerabilities that static tests miss.

NL→Eval Pipeline

Describe your AI app in natural language. EvalGuard's proprietary pipeline analyzes your app profile, maps domain-specific risks, generates targeted test cases, and assembles a production-ready evaluation config, powered by multi-model orchestration across 87 providers.

Adaptive Multi-Turn Red Teaming

An AI-powered attacker that adapts in real time using the UCB1 bandit algorithm. It runs parallel sessions across 43 strategies × 14 categories, learns from each response, and builds a complete resistance profile.
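
As a rough illustration of how a UCB1 bandit might choose among attack strategies, the sketch below treats each strategy as an arm and a successful attack as a reward of 1. It follows the textbook UCB1 rule and is not EvalGuard's implementation; the function names, the reward definition, and the simulated success rates are all assumptions made for the example.

```python
import math
import random


def ucb1_select(counts, rewards):
    """Pick the arm with the highest UCB1 score.

    counts[i]  -- how many times arm i has been tried
    rewards[i] -- total reward collected from arm i
    """
    # Try every arm once before the formula applies.
    for arm, pulls in enumerate(counts):
        if pulls == 0:
            return arm

    total = sum(counts)

    def score(arm):
        mean = rewards[arm] / counts[arm]
        # Exploration bonus shrinks as an arm is tried more often.
        bonus = math.sqrt(2.0 * math.log(total) / counts[arm])
        return mean + bonus

    return max(range(len(counts)), key=score)


def simulate(success_probs, rounds, seed=0):
    """Simulate hypothetical strategies with fixed success rates."""
    rng = random.Random(seed)
    counts = [0] * len(success_probs)
    rewards = [0.0] * len(success_probs)
    for _ in range(rounds):
        arm = ucb1_select(counts, rewards)
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        counts[arm] += 1
        rewards[arm] += reward
    return counts
```

Calling `simulate([0.1, 0.8, 0.3], 500)` returns how often each hypothetical strategy was tried. Over time, UCB1 concentrates attempts on the strategies that succeed most while still occasionally revisiting the others.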

25,000+ test blocks across 457 files

Comprehensive test coverage across all 6 products with end-to-end, integration, and unit tests ensuring production reliability

10 Security Audit Fixes

Hardened authentication, authorization, input validation, and API security based on comprehensive security audit findings

Added
  • NL→Eval Pipeline — describe your AI app in plain English, get a complete evaluation suite in seconds
  • Adaptive Multi-Turn Red Teaming with UCB1 bandit optimization and parallel attack sessions
  • Swagger API Documentation covering all 307 API endpoints
  • Cross-session memory for red teaming attack strategies
  • Real-time resistance profiling dashboard
Improved
  • Test suite expanded to 25,000+ describe/it blocks across 457 test files
  • Red teaming now supports up to 15 conversation turns per session
  • Coverage of 43 attack strategies × 14 vulnerability categories
  • Support for 87 LLM providers with intelligent orchestration for the NL pipeline
Security
  • 10 security audit fixes across authentication and authorization
  • Hardened input validation on all API endpoints
  • Improved API key scoping and permission enforcement
  • Enhanced CSRF and rate limiting protections
v1.0.0 (March 2026)

Launch Release

The most comprehensive AI evaluation, security, and governance platform. Six products, one platform, zero blind spots.

138 Evaluation Scorers

Accuracy, faithfulness, hallucination, bias, toxicity, coherence, and more — the most comprehensive scorer library available

249 Security Attack Plugins

Prompt injection, jailbreaking, PII extraction, data exfiltration, and 245 more adversarial test types

43 Attack Strategies

Multi-turn, crescendo, tree-of-attacks, semantic variations, and other sophisticated red teaming strategies

87 LLM Providers

OpenAI, Anthropic, Google, AWS Bedrock, Azure, Mistral, Cohere, and 80 more — all through a unified API

Added
  • Eval Engine — Run evaluations with 166 scorers across accuracy, safety, bias, compliance, and custom metrics
  • LLM Gateway — Centralized AI traffic management with policy enforcement, rate limiting, and automatic failover
  • FinOps Dashboard — Real-time cost tracking, budget alerts, and optimization recommendations across all providers
  • Observability Platform — Production monitoring with real-time dashboards, alerting, and distributed tracing
  • Prompt IDE — Version-controlled prompt engineering with A/B testing, diff views, and deployment pipelines
  • Red Teaming-as-a-Service — Automated adversarial testing with 249 attack plugins and 43 strategies
  • EU AI Act auto-risk classification with compliance dashboard and evidence collection
  • ISO 42001 and SOC 2 readiness tracking with automated controls mapping
  • Scheduled continuous evaluations with cron-based automation
  • Enterprise SSO/SAML integration for single sign-on
  • Feature flags system for gradual rollouts and A/B testing
  • Webhook delivery with HMAC-SHA256 signing and automatic retries
  • Full API and SDK support (TypeScript, Python) for CI/CD integration
  • Self-hostable via Docker Compose and Helm charts
  • Annotation workflows with human-in-the-loop evaluation queues
  • Dataset versioning with diff tracking and lineage
  • Benchmark suites for standardized model comparison
  • Audit logging for compliance and security monitoring
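
The HMAC-SHA256 webhook signing mentioned in the list above could be checked on the receiving side roughly as follows. This is a minimal sketch using only the Python standard library. It assumes the signature is a hex digest computed over the raw request body with a shared secret; the actual encoding, header name, and signed content used by EvalGuard are not specified here, so check them against the product documentation.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, payload: bytes) -> str:
    """Return the hex HMAC-SHA256 of the raw payload (assumed format)."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, payload: bytes, received: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_payload(secret, payload)
    # compare_digest avoids leaking timing information about the match.
    return hmac.compare_digest(expected.encode(), received.encode())
```

A receiver would read the raw body bytes before any JSON parsing, pass them to `verify_signature` together with the signature value from the request, and reject the delivery if the check returns `False`.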
Security
  • Row-Level Security (RLS) isolation across all database tables
  • AES-256 encryption at rest for all sensitive data including API keys
  • TLS 1.2+ encryption for all data in transit
  • CSRF protection on all state-changing endpoints
  • Rate limiting with configurable per-endpoint thresholds
  • API key scoping with granular permission controls
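
One common way to implement rate limiting with per-endpoint thresholds, as listed above, is a token bucket per endpoint. The sketch below is illustrative only: the class names, the endpoint paths, and the choice of a token bucket are assumptions, not a description of how EvalGuard enforces its limits.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = float(capacity)
        self.rate = float(rate)
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill according to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class EndpointRateLimiter:
    """Keeps one bucket per endpoint, with a default for unlisted endpoints."""

    def __init__(self, thresholds, default, clock=time.monotonic):
        self.thresholds = thresholds  # {path: (capacity, rate)}
        self.default = default        # (capacity, rate)
        self.clock = clock
        self.buckets = {}

    def allow(self, path):
        if path not in self.buckets:
            capacity, rate = self.thresholds.get(path, self.default)
            self.buckets[path] = TokenBucket(capacity, rate, self.clock)
        return self.buckets[path].allow()
```

For example, `EndpointRateLimiter({"/v1/evals": (2, 1.0)}, default=(1, 1.0))` would allow a burst of two requests to the hypothetical `/v1/evals` path and then one more per second. A production deployment would also key buckets by API key or tenant and store them in shared state.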

