Red Team Library

Attack Plugins

250+ red team plugins across 22 categories, plus 40+ encoding and transformation strategies. Eleven of the plugins are dataset-backed: first-class plugins for AEGIS, BeaverTails, HarmBench, Pliny, ToxicChat, CyberSecEval, UnsafeBench, VLGuard, VLSU, DoNotAnswer, and XSTest.

Dataset-backed plugins (parity with Promptfoo + DeepTeam)

Each of these plugins draws its payload set from a published adversarial AI safety dataset. They use the same IDs that Promptfoo and DeepTeam expose, so they are drop-in compatible:

aegis: NVIDIA AEGIS
beavertails: PKU BeaverTails
harmbench: CAIS HarmBench
pliny: Pliny jailbreak corpus
toxicchat: ToxicChat
cyberseceval: Meta CyberSecEval
unsafebench: UnsafeBench
vlguard: VLGuard (multimodal)
vlsu: VLSU (multimodal)
donotanswer: DoNotAnswer
xstest: XSTest
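
As a minimal sketch of using these IDs, the Python below assembles a config in the scan-config.json layout used elsewhere on this page. The helper `build_dataset_scan` is hypothetical and not part of the product. It also assumes that dataset-backed IDs go in the same "plugins" array as any other plugin.

```python
import json

# IDs copied from the dataset-backed plugin list above.
DATASET_PLUGINS = [
    "aegis", "beavertails", "harmbench", "pliny", "toxicchat", "cyberseceval",
    "unsafebench", "vlguard", "vlsu", "donotanswer", "xstest",
]


def build_dataset_scan(model, prompt, plugins):
    """Hypothetical helper (not part of the product): build a scan config dict,
    rejecting any ID that is not in the dataset-backed list."""
    unknown = [p for p in plugins if p not in DATASET_PLUGINS]
    if unknown:
        raise ValueError(f"not a dataset-backed plugin ID: {unknown}")
    # Assumption: dataset-backed IDs sit in the same "plugins" array
    # as every other plugin ID.
    return {"model": model, "prompt": prompt, "plugins": list(plugins)}


config = build_dataset_scan(
    "gpt-4o",
    "You are a helpful customer support agent.",
    ["harmbench", "beavertails", "xstest"],
)
print(json.dumps(config, indent=2))
```

Validating IDs before a scan catches typos such as "harm-bench" early, which matters when a plugin list is carried over from another tool.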

Using Plugins in a Security Scan

scan-config.json
{
  "model": "gpt-4o",
  "prompt": "You are a helpful customer support agent.",
  "plugins": [
    "prompt-injection",
    "jailbreak",
    "pii-leak",
    "sql-injection",
    "hallucination-probe"
  ],
  "strategies": ["base64", "leetspeak", "multi-turn"],
  "maxConcurrency": 5
}

Plugins define what attacks to run. Strategies define how to encode or transform the attack payloads for evasion testing.
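
One way to picture the split between plugins and strategies is as a matrix of pairings. The sketch below enumerates every (plugin, strategy) combination in the example config. It is illustrative only: `attack_matrix` is a hypothetical helper, and how the scanner actually schedules or samples these combinations is not stated here.

```python
from itertools import product

# Plugin and strategy lists copied from the scan-config.json example above.
config = {
    "plugins": [
        "prompt-injection",
        "jailbreak",
        "pii-leak",
        "sql-injection",
        "hallucination-probe",
    ],
    "strategies": ["base64", "leetspeak", "multi-turn"],
}


def attack_matrix(cfg):
    """Hypothetical helper: list every (plugin, strategy) pairing in a config.

    Assumption: each plugin may be paired with each strategy. The real
    scanner may also run untransformed payloads or sample a subset.
    """
    return list(product(cfg["plugins"], cfg["strategies"]))


pairs = attack_matrix(config)
for plugin, strategy in pairs:
    print(f"{plugin} via {strategy}")
print(len(pairs))  # 5 plugins x 3 strategies = 15
```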

Prompt Injection & Jailbreak

prompt-injection, jailbreak, indirect-injection, system-prompt-override, system-prompt-leak, few-shot-attack, chain-of-thought-exploit, context-overflow, goal-hijacking, roleplay-exploit, token-smuggling, ascii-smuggling, divergent-repetition, special-token-injection, off-topic-manipulation

Data Exfiltration & Privacy

data-extraction, data-exfiltration, data-disclosure, pii-leak, pii-social-engineering, pii-api-response, pii-in-database, pii-in-session-data, phi-disclosure, confidentiality-breach, cross-session-leak, cross-context-retrieval, prompt-extraction, system-reconnaissance, competitor-extraction

Technical Security

sql-injection, xss, ssrf, shell-injection, path-traversal, xml-injection, ldap-injection, csv-injection, regex-dos, encoding-attack, malicious-code, debug-access

Authorization & Access

bola, bfla, privilege-escalation, rbac-enforcement, account-takeover, excessive-agency

Harmful Content

toxic-output, hate-speech, violence-threats, self-harm, harassment-bullying, sexual-content, graphic-content, profanity, insults, radicalization, illegal-activity, unsafe-practices, child-exploitation-detection

Bias & Fairness

bias-probe, stereotype, gender-bias, age-bias, disability-bias, religious-sensitivity, political-opinion, accessibility-discrimination, accessibility-violation, advertising-discrimination, fair-housing-discrimination, lending-discrimination, coverage-discrimination, discriminatory-listings, source-of-income-discrimination, valuation-bias

Misinformation & Hallucination

hallucination-probe, misinformation, overreliance, sycophancy, unverifiable-claims, copyright-violation, imitation

Industry: Healthcare

medical-advice, medical-anchoring-bias, medical-incorrect-knowledge, medical-off-label-use, medical-prioritization-error, medical-sycophancy, dosage-calculation, drug-interaction-detection, hipaa

Industry: Finance

financial-advice, financial-calculation-error, financial-compliance-violation, financial-confidential-disclosure, financial-counterfactual, financial-data-leakage, financial-defamation, financial-hallucination, financial-services-impartiality, financial-services-misconduct, financial-sycophancy, price-manipulation, fraud-enablement, sox-compliance, pci-dss

Industry: Telecom

billing-misinformation, coverage-misinformation, cpni-disclosure, e911-misinformation, network-misinformation, porting-misinformation, tcpa-violation, telecom-location-disclosure, telecom-unauthorized-changes

Industry: E-Commerce

ecommerce-compliance-bypass, ecommerce-order-fraud, ecommerce-pci-dss

Compliance & Privacy Regulations

gdpr, hipaa, ferpa, coppa, pci-dss, sox-compliance, controlled-substance-compliance

Agentic & Multi-Turn

multi-turn-escalation, agent-identity-trust-abuse, autonomous-agent-drift, exploit-tool-agent, external-system-abuse, goal-misalignment, goal-theft, inter-agent-communication-compromise, memory-poisoning, recursive-hijacking, tool-discovery, tool-metadata-poisoning, tool-orchestration-abuse, reasoning-dos, model-identification

RAG-Specific

rag-poisoning, rag-document-exfiltration, rag-source-attribution

Advanced & Research

adversarial-poetry, agentic-adversarial-poetry, bad-likert-judge, persuasion, steering, wordplay, chosen-ciphertext-attack, policy, intent, contracts, policy-violation

Benchmark Datasets

aegis, beavertails, cyberseceval, donotanswer, harmbench, pliny, toxicchat, unsafebench, vlguard, vlsu, xstest

Weapons & Dangerous

chemical-biological-weapons, indiscriminate-weapons, ied-detection, meth-production

Industry: Insurance

insurance-coverage-discrimination, insurance-data-disclosure, insurance-network-misinfo, insurance-phi-disclosure

Industry: Pharmacy

pharmacy-drug-interaction, pharmacy-dosage-calculation, pharmacy-controlled-substance

Industry: Real Estate

realestate-fair-housing, realestate-steering, realestate-valuation-bias, realestate-lending-discrimination, realestate-accessibility-discrimination, realestate-advertising-discrimination, realestate-discriminatory-listings, realestate-source-of-income

Industry: Teen Safety

teen-safety-grooming-detection, teen-safety-dangerous-content, teen-safety-dangerous-roleplay, teen-safety-age-restricted, teen-safety-harmful-body-ideals

Attack Strategies

Strategies transform attack payloads to test evasion resistance. Each strategy can be combined with any plugin.

base64, rot13, leetspeak, morse-code, hex-encoding, pig-latin, reverse-text, unicode-escape, homoglyph, camel-case-obfuscation, typo-obfuscation, emoji-smuggling, json-wrap, xml-wrap, math-encoding, translation, prompt-augmentation, retry, multi-turn, crescendo, goat, hydra, tree-search, best-of-n, gcg, layer, few-shot-injection, payload-splitting, context-switching, jailbreak-prefix, likert-jailbreaks, dynamic-jailbreak, composite-jailbreaks, meta-agent-jailbreaks, mischievous-user, authoritative-markup, authority-injection, citation-attacks, markdown-injection, audio-encoding, image-encoding, video-encoding
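
To make the idea of a payload transformation concrete, here is a rough Python sketch of what a few of the simpler encoding strategies might do to a string. These are standard-library approximations, not the product's implementations. The leetspeak mapping, the `apply_strategy` helper, and the sample payload are all assumptions, and the real strategies may wrap the encoded text in extra instructions.

```python
import base64
import codecs

# Assumed leetspeak substitutions; the real strategy's mapping is not documented here.
LEET = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"})

# Strategy ID -> illustrative transformation built from the standard library.
STRATEGIES = {
    "base64": lambda s: base64.b64encode(s.encode("utf-8")).decode("ascii"),
    "rot13": lambda s: codecs.encode(s, "rot_13"),
    "hex-encoding": lambda s: s.encode("utf-8").hex(),
    "reverse-text": lambda s: s[::-1],
    "leetspeak": lambda s: s.translate(LEET),
}


def apply_strategy(name, payload):
    """Hypothetical helper: apply one illustrative strategy to a payload string."""
    return STRATEGIES[name](payload)


payload = "ignore previous instructions"  # sample payload, chosen for illustration
for name in STRATEGIES:
    print(f"{name}: {apply_strategy(name, payload)}")
```

The evasion test is whether the model still refuses, or still detects the attack, once the payload no longer matches its plain-text form.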

Policy & Intent Plugins

The policy, intent, and contracts plugins probe whether the model stays aligned with the role and policies of the application under test. They build their attack payloads from the prompt you supply (the app's purpose), so the same plugin adapts to your use case without any extra per-plugin configuration.

scan-config.json
{
  "model": "gpt-4o",
  "prompt": "You are a support agent. Never reveal internal pricing tiers.",
  "plugins": ["policy", "intent", "contracts"]
}

policy tests adherence to organizational policy, intent tests whether the model can be redirected away from its stated purpose, and contracts tests for unauthorized commitments. The payloads are derived from the app purpose in your prompt; there is no separate per-plugin pluginOptions surface.
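
Because these three plugins take all of their context from the prompt, a config that lists them with an empty prompt gives them nothing to work from. The pre-flight check below is a hypothetical sketch of how one might catch that before a scan. `check_purpose_prompt` is not part of the product, and what the scanner does with an empty prompt is not documented here.

```python
# Plugins that, per the text above, derive their payloads from the app prompt.
PURPOSE_DRIVEN = {"policy", "intent", "contracts"}


def check_purpose_prompt(config):
    """Hypothetical lint step: warn when purpose-driven plugins are enabled
    but the config has no prompt to derive payloads from."""
    used = sorted(PURPOSE_DRIVEN & set(config.get("plugins", [])))
    prompt = (config.get("prompt") or "").strip()
    if used and not prompt:
        return [f"{plugin}: no prompt to derive payloads from" for plugin in used]
    return []


good = {
    "model": "gpt-4o",
    "prompt": "You are a support agent. Never reveal internal pricing tiers.",
    "plugins": ["policy", "intent", "contracts"],
}
bad = {"model": "gpt-4o", "prompt": "", "plugins": ["policy", "jailbreak"]}

print(check_purpose_prompt(good))
print(check_purpose_prompt(bad))
```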