What's new

Shipping every week.

New features, improvements, and fixes shipped to EvalGuard. The weekly cadence is verified in the git-derived Engineering velocity section below — one row per week, straight from commit history.

Curated

Milestone releases

v1.1.0 (March 2026)

NL Pipeline & Adaptive Red Teaming

Two features we haven't found on any platform in our tracked competitor set (as of March 2026). Describe your app in plain English to generate a complete eval suite, and let an AI attacker adapt in real time to find vulnerabilities that static tests miss.

NL→Eval Pipeline

Describe your AI app in natural language. EvalGuard's proprietary pipeline analyzes your app profile, maps domain-specific risks, generates targeted test cases, and assembles a production-ready evaluation config, powered by multi-model orchestration across 90+ providers.

Adaptive Multi-Turn Red Teaming

An AI-powered attacker that adapts in real time using the UCB1 bandit algorithm. It runs parallel sessions across 50+ strategies × 30 categories, learns from each response, and builds a complete resistance profile.

25,000+ test blocks across 2,400+ files

Comprehensive test coverage across all products with end-to-end, integration, and unit tests ensuring production reliability

10 Security Audit Fixes

Hardened authentication, authorization, input validation, and API security based on comprehensive security audit findings

Added

NL→Eval Pipeline — describe your AI app in plain English, get a complete evaluation suite in seconds
Adaptive Multi-Turn Red Teaming with UCB1 bandit optimization and parallel attack sessions
Swagger API Documentation covering the full API surface
Cross-session memory for red teaming attack strategies
Real-time resistance profiling dashboard
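
To make the UCB1 mention above concrete, here is a minimal sketch of the textbook UCB1 rule applied to picking an attack strategy. It is not EvalGuard's implementation: the class name `UCB1Selector`, the strategy labels, and the reward definition (a score in [0, 1], for example 1.0 when an attack succeeds) are all assumptions made for illustration.

```python
import math


class UCB1Selector:
    """Choose the next strategy with the standard UCB1 bandit rule.

    Hypothetical helper for illustration; rewards are assumed to lie in [0, 1].
    """

    def __init__(self, strategies):
        self.strategies = list(strategies)
        self.pulls = {s: 0 for s in self.strategies}
        self.total_reward = {s: 0.0 for s in self.strategies}

    def select(self):
        # Try every strategy once before the UCB1 formula is defined.
        for s in self.strategies:
            if self.pulls[s] == 0:
                return s

        total = sum(self.pulls.values())

        def score(s):
            mean = self.total_reward[s] / self.pulls[s]
            # Exploration bonus shrinks as a strategy is tried more often.
            bonus = math.sqrt(2 * math.log(total) / self.pulls[s])
            return mean + bonus

        return max(self.strategies, key=score)

    def update(self, strategy, reward):
        # Record the observed outcome of one attack attempt.
        self.pulls[strategy] += 1
        self.total_reward[strategy] += reward
```

A caller would alternate `select()` and `update(strategy, reward)` once per attack turn. Strategies that keep succeeding are chosen more often, while rarely tried ones still get occasional attempts through the exploration bonus.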

Improved

Test suite expanded to 25,000+ describe/it blocks across 2,400+ test files
Red teaming now supports up to 10 conversation turns per session
50+ attack strategies × 30 vulnerability categories coverage
90+ LLM provider support with intelligent orchestration for NL pipeline

Security

10 security audit fixes across authentication and authorization
Hardened input validation on all API endpoints
Improved API key scoping and permission enforcement
Enhanced CSRF and rate limiting protections

v1.0.0 (March 2026)

Launch Release

AI evaluation, security, and governance in one platform. Six products — one provable, audit-ready evidence trail.

200+ Evaluation Scorers

Accuracy, faithfulness, hallucination, bias, toxicity, coherence, and more, all in one scorer library

300+ Security Attack Plugins

Prompt injection, jailbreaking, PII extraction, data exfiltration, and 341 more adversarial test types

50+ Attack Strategies

Multi-turn, crescendo, tree-of-attacks, semantic variations, and more sophisticated red teaming strategies

90+ LLM Providers

OpenAI, Anthropic, Google, AWS Bedrock, Azure, Mistral, Cohere, and 84 more — all through a unified API

Added

Eval Engine — Run evaluations with 200+ scorers across accuracy, safety, bias, compliance, and custom metrics
LLM Gateway — Centralized AI traffic management with policy enforcement, rate limiting, and automatic failover
FinOps Dashboard — Real-time cost tracking, budget alerts, and optimization recommendations across all providers
Observability Platform — Production monitoring with real-time dashboards, alerting, and distributed tracing
Prompt IDE — Version-controlled prompt engineering with A/B testing, diff views, and deployment pipelines
Red Teaming-as-a-Service — Automated adversarial testing with 300+ attack plugins and 50+ strategies
EU AI Act auto-risk classification with compliance dashboard and evidence collection
ISO 42001 and SOC 2 readiness tracking with automated controls mapping
Scheduled continuous evaluations with cron-based automation
Enterprise SSO/SAML integration for single sign-on
Feature flags system for gradual rollouts and A/B testing
Webhook delivery with HMAC-SHA256 signing and automatic retries
Full API and SDK support (TypeScript, Python) for CI/CD integration
Self-hostable via Docker Compose and Helm charts
Annotation workflows with human-in-the-loop evaluation queues
Dataset versioning with diff tracking and lineage
Benchmark suites for standardized model comparison
Audit logging for compliance and security monitoring
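
For the HMAC-SHA256 webhook signing listed above, a receiver typically recomputes the signature over the raw request body and compares it in constant time. The sketch below shows that general pattern with Python's standard library only. The function names, the hex encoding of the signature, and the idea that it arrives in a request header are assumptions; the source does not specify EvalGuard's header name or encoding.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Return the hex HMAC-SHA256 of the raw body (hex encoding is an assumption)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Check a received signature against one recomputed from the raw body."""
    expected = sign_payload(secret, body)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(
        expected.encode("utf-8"), received_signature.encode("utf-8")
    )
```

Verification should run on the exact bytes received, before any JSON parsing, because re-serializing the payload can change whitespace or key order and invalidate the signature.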

Security

Row-Level Security (RLS) isolation across all database tables
AES-256 encryption at rest for all sensitive data including API keys
TLS 1.2+ encryption for all data in transit
CSRF protection on all state-changing endpoints
Rate limiting with configurable per-endpoint thresholds
API key scoping with granular permission controls

From git

Engineering velocity

One row per week, auto-generated from git — expand any week for the highlights. 2,722 changes across 19 weeks.
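
As a rough illustration of how a per-week table like this can be derived from commit history, the sketch below groups commits into Monday-to-Sunday weeks and counts them by Conventional Commits prefix. It is an assumption about the general approach, not the generator EvalGuard uses. The function names and the rule that any prefix outside `feat`, `fix`, `perf`, and `security` counts as "other" are hypothetical. Input pairs could come from something like `git log --date=short --pretty=format:%ad%x09%s`.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta

# Prefixes counted separately; everything else is bucketed as "other" (assumption).
KNOWN_TYPES = ("feat", "fix", "perf", "security")


def commit_type(subject: str) -> str:
    """Extract the type from subjects like 'feat(api): ...' or 'fix: ...'."""
    prefix = subject.split(":", 1)[0]
    kind = prefix.split("(", 1)[0].strip().rstrip("!").lower()
    return kind if kind in KNOWN_TYPES else "other"


def week_start(day: date) -> date:
    """Return the Monday of the week containing the given day."""
    return day - timedelta(days=day.weekday())


def weekly_velocity(commits):
    """Group (date, subject) pairs by week, newest week first."""
    weeks = defaultdict(Counter)
    for day, subject in commits:
        weeks[week_start(day)][commit_type(subject)] += 1
    return dict(sorted(weeks.items(), reverse=True))
```

Each value is a `Counter`, so the total for a week is `sum(counter.values())`, which corresponds to the "changes" figure in each row.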

Jul 13 – Jul 19, 2026: 68 features, 57 fixes, 40 other (165 changes)

feat: EU AI Act Art. 50(2) OUTPUT-labelling evidence control (Gap-L5)
feat: gap-L1 regulatory frameworks — Singapore IMDA Agentic (+L1-L4 crosswalk), India RBI/SEBI/IRDAI, Vietnam Law 134/2025, China GB 45438 labeling
fix: pin org-scoped API keys to their own org_id (KEY_ORG_MISMATCH) (#1082)
fix: integrate 4 data-validity lanes — api_key expiry persist, CLI login, Postman, SDK/wrapper/docs drift (#1071)
fix: sub-cap explain input to prevent authenticated event-loop DoS
fix: remove dead Canada AIDA report template + correct EU Annex-III date

+ 159 more changes this week

Jul 6 – Jul 12, 2026: 268 features, 54 fixes, 26 other (348 changes)

feat: unified control harmonization — assess once, attest across N (W2-L4)
feat: governance-integrity benchmark harness + adversarial fixtures
feat: forensic integrity verification over persisted entry sets
feat: wire 3 shipped-but-dead capabilities to enterprise depth + close DCR SSRF
feat: self-hardening firewall foundation — learned-attack corpus + canary exports
fix: correct scan-verdict grader logic + regen openapi + SSO-error test

+ 342 more changes this week

Jun 29 – Jul 5, 2026: 33 features, 53 fixes, 29 other (115 changes)

feat: add OMB Federal AI Memoranda + NAIC AI Bulletin frameworks (#963)
feat: Autopatch deploy endpoint — persist firewall-verified guardrails (#953)
feat: close the auto-guardrail Autopatch loop — deploy bridge + verification (#947)
feat: Ed25519-sign evidence bundles + verify issuance (deepen "signed compliance evidence") (#908)
fix: deliverWebhook → service-role client, then revoke anon on increment_webhook_failures (#932)
fix: claim online-eval samplers atomically to stop cross-pod double-billing (#923)

+ 109 more changes this week

Jun 22 – Jun 28, 2026: 82 features, 30 fixes, 3 perf, 5 other (120 changes)

feat: continuous red-team — auto-trigger regression reruns on change events (#888)
feat: B7 per-principal behavioral-anomaly detection (compromised-key drift) (#853)
feat: WebAuthn / FIDO2 passkeys — enroll + assertion ceremony (A4) (#825)
feat: LDAP/Active Directory authentication (governance-compliance-3) (#819)
feat: BYOK Phase 1 — per-org envelope encryption + crypto-shred (#817)
feat: SDK trace-span signing — verify Ed25519 signatures at ingest (governance-compliance-2) (#818)

+ 114 more changes this week

Jun 15 – Jun 21, 2026: 55 features, 38 fixes, 1 perf, 21 other (115 changes)

feat: cross-session attacker reputation inline gate (RB-2/P1-2) (#739)
fix: revoke EXECUTE on server-only SECURITY DEFINER functions (#732)
fix: pin function search_path + flip service-map views to security_invoker (#727)
fix: lock chat_messages + legacy benchmarks to service-role-only (#726)
fix: convert bug-bounty page to a recognition-only VDP (no cash promised) (#669)
fix: enforce account deactivation per-request (banned users kept live sessions) (#660)

+ 109 more changes this week

Jun 8 – Jun 14, 2026: 28 features, 51 fixes, 3 perf, 20 other (102 changes)

fix: vscode SecretStorage + correct API base, refresh model pricing, hygiene (#614)
fix: SSRF redirect re-validation, canary CSPRNG, firewall+queue DoS caps, signup-trigger migration (#613)
fix: eliminate React #418 hydration mismatch on 3 compliance pages (#611)
fix: resolve default signing key LAZILY so the warning stops firing on every SDK/CLI import (#564)

+ 98 more changes this week

Jun 1 – Jun 7, 2026: 55 features, 26 fixes, 9 other (90 changes)

feat: per-org jwt_trusted_issuers DB config + CRUD (deep-read gap #5 Stage 2c) (#489)
feat: inbound trusted-issuer JWT on the main API via the membership bridge (deep-read gap #5, Stage 2b) (#488)
feat: wire inbound trusted-issuer JWT into OTLP ingest auth (deep-read gap #5, Stage 2a) (#483)
feat: hardened inbound JWT/OIDC verifier (deep-read gap #5, Stage 1) (#480)
feat: POST /api/v1/security/red-team-plan — capability-driven attack planner as a live API (beyond parity) (#473)
feat: agent red-team planner — capability-driven attack selection (MLflow auto-tester parity) (#468)

+ 84 more changes this week

May 25 – May 31, 2026: 138 features, 62 fixes, 1 perf, 44 other (245 changes)

feat: 1-μ — tenant evidence-bundle aggregator (FedRAMP / HIPAA / SOC 2 + every framework catalog)
feat: 0-E — MCP tool-call firewall
feat: T5 — OWASP Agentic AI Top-10 severity-density scoring
fix: AUTH-EXEMPT comments on 4 round-6 routes
fix: reposition CROSS-TENANT-EXEMPT comments on T7/T3 routes
fix: drop judge-bias dup + document prompt_tags/labels coexistence

+ 239 more changes this week

May 18 – May 24, 2026: 129 features, 65 fixes, 58 other (252 changes)

feat: extend pillar mapping to 168 scorers + draft DPDP legal-vet brief
feat: Phase E — P0/P1/P2 hardening (e2e creds + CLI dry-run + lib hardening) (#358)
feat: Phase D — depth fixes (policy keys, worker race, Terraform/Java/Semantic, trace-id) (#357)
feat: Phase C — distribution wedge (GitHub App + CLI + pricing DB + scan-model) (#356)
feat: Phase B — namesake clearance, 8 stub→real conversions (#355)
feat: Phase A — P0 cross-tenant, SSRF, postgrest-safe, idempotency hardening (#354)

+ 246 more changes this week

May 11 – May 17, 2026: 53 features, 38 fixes, 32 other (123 changes)

fix: restore a11y attributes lost in PR #283 auth-page refresh (#288)

+ 122 more changes this week

May 4 – May 10, 2026: 45 features, 54 fixes, 2 perf, 115 other (216 changes)

feat: OWASP Agentic AI Top 10 (2025) framework (#140)
feat: SBOM workflow + security.txt + RFC 9116 ratchet
fix: provider-keys GET defense-in-depth field whitelist
docs: starter pack — vendor comparison + control map + gap list

+ 212 more changes this week

Apr 27 – May 3, 2026: 45 features, 58 fixes, 1 security, 336 other (440 changes)

feat: scoreboard view across all 33 frameworks (TrojAI parity)

+ 439 more changes this week
