Skip to main content

neural-forge.io

Learn Tracks Models AI Explorer Compare

Sign inStartStart learning

Tendril

Tendril neural-forge.io

Free AI literacy for everyone, supported by trust-safe partners.

Learn

Curriculum
Tracks
For you
Preferences

Resources

Glossary
In the Wild
Newsroom
Community
Partners
Send Feedback
Changelog
About
New to AI?

Schools & Orgs

Schools
Libraries
Tech Teams
Free Access
Sponsor
Sign Up
Support the Mission

Trust

Privacy
Terms
COPPA
Accessibility

Legal

Privacy
Terms
COPPA
Accessibility

© 2026 Tendril·Privacy·Terms·Contact

Built with Claude

Loading lesson…

Tendril

Safety & Governance0%

Time on lesson

0s

← Safety & Governance

0 of 182 complete

○Lesson 1035Bias Auditing in LLM Outputs: Seeing What the Model Can't
○Lesson 1036Copyright and Training Data: What Deployers Actually Need to Know
○Lesson 1037Deepfake Detection: What Works, What Doesn't, and Why It Matters
○Lesson 1038Prompt Injection Defense: Protecting AI Systems From Malicious Inputs
Lesson 1039Jailbreaks and Red-Teaming: Testing Your AI Before Adversaries Do
○Lesson 1040AI Consent in Workplaces: What Employees Deserve to Know
○Lesson 1041Model Cards and Transparency Reports: Reading the Fine Print
○Lesson 1042EU AI Act and Global Regulation: What Deployers Must Track
○Lesson 1043Dual-Use Research Disclosure: When Publishing AI Capabilities Creates Risk
○Lesson 1044Environmental Cost of AI Inference: What the Numbers Actually Mean
○Lesson 21501Bias Audits That Catch Problems Before Deployment: A Production Audit Pipeline
○Lesson 21502Red Team Exercises for AI Systems: Beyond Adversarial Prompts
○Lesson 21504Jailbreak Resistance Testing: A Methodology That Improves Over Time
○Lesson 21505Data Poisoning Detection: Why Your Fine-Tuning Pipeline Needs Provenance Controls
○Lesson 21506AI System Incident Response: Building the Runbook Before the Headline
○Lesson 21507Cross-Border AI Data Compliance: Navigating GDPR, China PIPL, and the State Patchwork
○Lesson 21508AI Vendor Due Diligence: The Questions That Reveal Real Safety Practice
○Lesson 23800AI Supply Chain Attestation: Knowing What's Actually In Your Stack
○Lesson 23801Public Benchmarks vs Private Evals: Why You Need Both
○Lesson 23802AI Incident Public Disclosure: When and How to Tell the World
○Lesson 23803Beyond Accuracy: Evaluating AI Classifiers for Fairness Across Subgroups
○Lesson 23804AI Content Watermarking: Current State of the Art
○Lesson 24500AI Employee Monitoring: Where Surveillance Becomes Counterproductive
○Lesson 24501When Your AI Vendor Has an Incident: What You Owe Your Users
○Lesson 24502Deploying AI Where Children Are Users: COPPA and Beyond
○Lesson 24503AI Medical Decisions: Where Liability Actually Sits
○Lesson 24504Board-Level AI Risk Reporting: What Directors Actually Need
○Lesson 25000AI in Public Sector Procurement: Higher Bars Than Private
○Lesson 25001AI Recommendation Systems: When Engagement Optimization Harms Users
○Lesson 25002AI in Elder Care: Dignity Considerations
○Lesson 25003AI in News Media: Preserving Trust While Using the Tools
○Lesson 25004AI in Housing Decisions: Fair Housing Act Compliance
○Lesson 26400AI in Political Advertising: New Disclosure Requirements
○Lesson 26401Shadow AI Deployments: Inventorying What You Don't Know You Have
○Lesson 26402Explainability for High-Stakes Recommendations
○Lesson 26403AI Vendor Incident History: Due Diligence Before You Sign
○Lesson 26404Employee Protected Speech and AI Monitoring
○Lesson 27500Content Moderation AI Bias: Patterns and Fixes
○Lesson 27501AI Mental Health Tools: Disclosure and Crisis Handling Standards
○Lesson 27502AI Research Ethics: IRB Adaptation
○Lesson 27503EU AI Act: Compliance for US Companies Doing Business in Europe
○Lesson 27504Navigating the US State AI Law Patchwork
○Lesson 29300AI API Rate Limit Abuse: Prevention and Response
○Lesson 29301Preventing Internal AI Tool Misuse
○Lesson 29302Responding to AI Vendor Policy Changes
○Lesson 29304Government AI Procurement: Public Interest Requirements
○Lesson 30600AI Product Launch Ethics Review
○Lesson 30601AI Incident Postmortems: Learning Without Blame
○Lesson 30602Bias Considerations in AI Vendor Selection
○Lesson 30603Employee Rights Around Workplace AI
○Lesson 30604Customer Consent for AI Interactions
○Lesson 32000Acceptable Use Policies for Internal AI
○Lesson 32001Customer-Facing AI Disclosure Patterns
○Lesson 32002Vendor AI Act Compliance Verification
○Lesson 32003Establishing AI Governance Boards
○Lesson 32004AI Ethics Training That Sticks
○Lesson 33600Establishing an AI Ethics Board
○Lesson 33601Engaging Red Teams for AI Safety Testing
○Lesson 33602Public AI Incident Disclosure
○Lesson 33603Engaging Civil Society on AI
○Lesson 33604Engaging Academic Researchers on AI Safety
○Lesson 35600AI Incident Mock Drills
○Lesson 35601Third-Party AI Audits
○Lesson 35602AI Bug Bounty Programs
○Lesson 35603Content Moderation Appeal Processes
○Lesson 35604AI Product Deprecation Ethics
○Lesson 41700AI and content licensing disputes: drafting evidence packets
○Lesson 41701AI and synthetic voice consent: scoping and revocation
○Lesson 41702AI and deepfake takedown workflow: triage and escalation
○Lesson 41703AI and creator attribution policy: what to credit and how
○Lesson 41704AI and style mimicry policy: living artists and ethics review
○Lesson 41705AI and watermark strategy: visible, invisible, and limits
○Lesson 41707AI and children's likeness policy: stricter defaults
○Lesson 41708AI and fan content derivatives: rights, safety, and policy
○Lesson 41709AI and political figure likeness: election-period rules
○Lesson 41710AI and medical likeness policy: patient images and synthesis
○Lesson 41711AI and news deepfake newsroom policy: verification ladder
○Lesson 41712AI and music voice replica policy: artist control rights
○Lesson 41713AI and platform trust and safety staffing: AI cannot fully replace humans
○Lesson 41714AI and incident public comms: transparency without admission
○Lesson 43600AI Grief-Tech Consent: Building Posthumous-Likeness Policies
○Lesson 43601AI Emotion Recognition: Auditing for Banned Use Cases
○Lesson 43602AI Chatbot Suicide-Safety Routing: Designing Escalation Paths
○Lesson 43603AI Child-Safety Classifier Tuning: NCMEC Reporting Workflows
○Lesson 43604AI Stock-Photo Disclosure: Marketplace Provenance Standards
○Lesson 43605AI Academic-Integrity Policy: Drafting Faculty Guidance
○Lesson 43606AI Newsroom Synthesis Disclosure: Bylines and Reader Trust
○Lesson 43607AI Ad-Targeting Audits: Catching Sensitive-Category Inferences
○Lesson 43608AI Research IRB Protocols: Drafting Human-Subject Submissions
○Lesson 43609AI Recommender Radicalization Audits: Trajectory Testing
○Lesson 43610AI Vendor Risk Questionnaires: What to Actually Ask
○Lesson 43611AI Facial Recognition Purpose Limitation: Drafting Internal Controls
○Lesson 43612AI Medical Translation: Disclaimer and Liability Scoping
○Lesson 43613AI Synthetic-Evidence Detection: Litigation-Ready Workflows
○Lesson 43614AI Product Incident Postmortems: Causal Chains for Model Behavior
○Lesson 45502AI and Hiring Video Analysis: Where the Bans Apply
○Lesson 45503AI and Credit Decisions: Adverse-Action Notices That Hold Up
○Lesson 45504AI and Tenant Screening: Bias Audits Before Procurement
○Lesson 45505AI and Classroom Proctoring: Where the Harm Outweighs the Catch
○Lesson 45506AI and Clinical Trial Recruitment: Equitable Outreach Targeting
○Lesson 45507AI and Government Benefits Eligibility: Due-Process Floors
○Lesson 45509AI and Charity Fundraising: Personalization Without Manipulation
○Lesson 45510AI and Religious-Content Classifiers: Avoiding Theological Bias
○Lesson 45511AI and Disability Accommodation: When AI Use Is the Accommodation
○Lesson 45512AI and Immigration Document Translation: Stakes and Verification
○Lesson 45514AI and Citizen Journalism: Verifying User-Submitted Footage
○Lesson 47500AI and Medical Imaging: When the Second Opinion Becomes the First
○Lesson 47501AI and Suicide-Risk Flagging in EdTech: Escalation That Actually Helps
○Lesson 47502AI and Immigration Enforcement: When Your Data Pipeline Becomes a Targeting List
○Lesson 47503AI and Livestream Deepfake Detection: The 30-Second Window
○Lesson 47504AI and Grief-Tech Chatbots: Memorial Bots Without Manipulation
○Lesson 47506AI and Child Influencer Likeness: Consent That Outlives the Childhood
○Lesson 47507AI and Court-Filing Fabrications: Sanctions Are Now Routine
○Lesson 47508AI and Faith Community Impersonation: Synthetic Sermons, Real Harm
○Lesson 47509AI and Disability Accommodation Screening: ADA Risk in Resume Filters
○Lesson 47510AI and Jury Research Deepfakes: Mock Juries Are Becoming Synthetic
○Lesson 47511AI and Foster Care Risk Scoring: Allegheny's Lessons Generalized
○Lesson 47512AI and Public Defender Caseload Triage: Equity Without Abandonment
○Lesson 47513AI and Research Paper Fabrication: Detecting Synthetic Citations and Figures
○Lesson 49500AI Synthetic Media Disclosure Policies: Labeling What You Generate
○Lesson 49501AI-Assisted Election Integrity Content Review: Triage Without Censorship
○Lesson 49502AI Incident Disclosure Letters: Telling Affected Users Honestly
○Lesson 49503AI Model Deprecation User-Impact Memos: Sunsetting Without Surprise
○Lesson 49504AI High-Stakes Recommendation Audits: Reviewing What the Model Suggested
○Lesson 49505AI Vendor Procurement Due-Diligence Briefs: Asking the Right Questions
○Lesson 49507AI Safety Case Narratives: Arguing Why Deployment Is Acceptable
○Lesson 49508AI Feature Consent-Flow Rewrites: Plain-Language User Choices
○Lesson 49509AI Automated-Decision Explanation Letters: Why Was I Denied?
○Lesson 49510AI Responsible Disclosure Policies: Inviting Researchers Without Chaos
○Lesson 49511AI Impact Assessment Summaries: Compressing 60 Pages to 2
○Lesson 49512AI Bias Bounty Program Briefs: Paying People to Find Your Blind Spots
○Lesson 49514AI Policy Exception Request Memos: Asking for a Carve-Out Honestly
○Lesson 51500AI Incident Disclosure Timing: When to Tell Whom About an AI Failure
○Lesson 51501AI Vendor Subprocessor Review: Mapping Who Else Sees Your Data
○Lesson 51504AI Customer Consent Flows: Rewriting Pop-Ups That Actually Inform
○Lesson 51505AI Bug Bounty Scope Documents: Inviting Researchers Without Inviting Lawsuits
○Lesson 51506AI Dataset Provenance Statements: Explaining Where Training Data Came From
○Lesson 51507AI Model Deprecation Notices: Sunsetting Without Stranding Users
○Lesson 51508AI Prompt Injection Postmortems: Writing Up an Attack Without Blame
○Lesson 51510AI Content Moderation Appeals: Building a Path Back for Wrong Decisions
○Lesson 51511AI Academic Integrity Policies: Writing Rules Students Can Actually Follow
○Lesson 51512AI Political Ad Disclosures: Labeling Synthetic Content in Campaigns
○Lesson 51513AI Mental Health Chatbot Guardrails: Drafting Crisis Routing Rules
○Lesson 51514AI Government Procurement Checklists: Asking Vendors the Right Questions
○Lesson 53501AI Synthetic Witness Testimony: Why Bans Exist
○Lesson 53502AI Child-Safety Grooming Detection: Hard Limits
○Lesson 53503AI Disability Benefits: Denial Bias Audits
○Lesson 53504AI Asylum Credibility Scoring: Why It Fails
○Lesson 53505AI Tenant Screening: FCRA Compliance Gaps
○Lesson 53506AI Predictive Policing: Feedback Loop Risk
○Lesson 53507AI Medical Triage: Life-or-Death Limits
○Lesson 53508AI Genomic Data: Reidentification Risk
○Lesson 53509AI Elder-Abuse Monitoring: Consent and Dignity
○Lesson 53510AI Religious Content Translation: Trust Boundaries
○Lesson 53511AI Newsroom Tools: Protecting Confidential Sources
○Lesson 53512AI Union Organizing Surveillance: Legal Ban
○Lesson 53513AI Suicide Hotline Handoff: Mandatory Protocol
○Lesson 53514AI Veterans' Disability Claims: Audit Duties
○Lesson 55500AI and Deepfake Consent Policy: Drafting a Likeness-Use Standard
○Lesson 55503AI and Synthetic Voice Clone Ethics: Guardrails for Voice Talent
○Lesson 55504AI and Content Moderation Appeals: Drafting Defensible Responses
○Lesson 55505AI and Minor Likeness Protection: Creator Workflows for Kids on Camera
○Lesson 55506AI and Monetized Misinformation Risk: Pre-Publish Fact Triage
○Lesson 55507AI and Creator Data Handling Policy: Subscriber Lists and PII
○Lesson 55508AI and Paid Promotion Disclosure: FTC-Safe Ad Labels
○Lesson 55509AI and Fan Harassment Response: Drafting an Escalation Playbook
○Lesson 55510AI and Collab Credit Attribution: Splitting Authorship Fairly
○Lesson 55511AI and Pseudonymous Creator OpSec: Identity Hygiene Audit
○Lesson 55512AI and Archived Content Takedown: Pruning Old Work Safely
○Lesson 55514AI and Sponsorship Vetting Checklist: Filtering Risky Brand Deals
○Lesson 59500AI and Doxx Prevention Audits: What Strangers Can Find About You
○Lesson 59501AI and Mental Load Throttling: Capping Comments You Read
○Lesson 59502AI and Stalker Pattern Detection: Spotting Repeat Offenders Across Aliases
○Lesson 59503AI and Account Recovery Stress Tests: When Your Channel Vanishes
○Lesson 59504AI and Collaboration Vetting Checks: Background on the Person Asking
○Lesson 59505AI and IRL Meetup Safety Prep: Designing Fan Events That Don't Hurt You
○Lesson 59506AI and Financial Scam Recognition: Sponsor Fraud Patterns Creators Miss
○Lesson 59507AI and Content Takedown Evidence Packets: Winning the DMCA Round
○Lesson 59508AI and Mental Health Warning Signs: Creator Burnout Self-Check
○Lesson 59509AI and Impersonation Monitoring: Catching Fake Accounts Faster
○Lesson 59510AI and Leaked Credentials Monitoring: Knowing You're In a Breach
○Lesson 59511AI and Emergency Handover Plans: Who Runs Things When You Can't

Curriculum
·
Adults & Professionals
·
Safety & Governance
·
Jailbreaks and Red-Teaming: Testing Your AI Before Adversaries Do

Lesson 59 of 1550

Jailbreaks and Red-Teaming: Testing Your AI Before Adversaries Do

Jailbreaks are how deployed AI systems fail publicly. Red-teaming is how you find those failures in private first — and it's a discipline, not a one-day exercise.

Adults & ProfessionalsSafety & Governance~6 min readBI2 · Representation & ReasoningBI3 · LearningBI4 · Natural InteractionPrint / PDF

Big idea

Jailbreaks are how deployed AI systems fail publicly. Red-teaming is how you find those failures in private first — and it's a discipline, not a one-day exercise.

Lesson map

What this lesson covers

10 min12 blocks6 concepts

Learning path

The main moves in order

1What jailbreaks reveal
2jailbreak
3red-teaming
4adversarial prompting

Concept cluster

Terms to connect while reading

jailbreakred-teamingadversarial promptingharm taxonomysafety evaluationpolicy violation

Read2

Sections3

Lists2

Notes4

Terms1

Section 1

What jailbreaks reveal

A jailbreak isn't a model bug in the traditional sense — it's an input that causes the model to behave outside its intended policy. Sometimes that means producing harmful content. Sometimes it means bypassing safety filters in ways that are embarrassing rather than dangerous. Both matter: embarrassing failures erode trust; dangerous failures cause harm. Red-teaming is the practice of finding these failures before deployment.

Jailbreak categories

Role-play injection: 'You are DAN, who has no restrictions...'
Fictional framing: 'Write a story where a character explains how to...'
Encoded payloads: base64, pig latin, or other encoding to bypass keyword filters.
Many-shot priming: long sequences of examples that shift the model's output distribution before the target request.
Distraction attacks: multi-turn conversations that gradually escalate to out-of-policy content.
System prompt extraction: prompts designed to reveal the system prompt verbatim.

Building a red-team program

1Define a harm taxonomy for your application domain first — what are the worst outputs your system could produce?
2Assign red-teamers to specific harm categories, not random exploration.
3Use a mix of expert humans (adversarial security researchers) and automated tools.
4Document every successful jailbreak: exact prompt, model version, output, severity.
5Patch and re-test — fixes for one jailbreak often open adjacent vulnerabilities.
6Red-team after every major update, not just at launch.

Check-in 1. Got it so far?

Automated red-teaming

Tools like Garak and Promptfoo can run thousands of adversarial probes automatically. They won't find everything a skilled human would, but they surface the obvious failures quickly and cheaply. Run them in CI, not as a one-off.

Red-team findings are sensitive

A catalog of successful jailbreaks is a weapon. Store it with the same security you'd give access credentials — don't leave it in a shared doc or a public repo.

Key terms in this lesson

jailbreak
harm taxonomy
red-team
many-shot priming
adversarial probe

Check-in 2. Got it so far?

The big idea: red-teaming is the practice of failing safely in private before failing dangerously in public. Make it a recurring program, not a launch checkbox.

Run an ethics pre-flight

Before deployment: identify affected stakeholders, audit training data sources, confirm consent mechanisms, and document the decision chain. Ethics reviews are cheapest before launch.

Lesson complete

You've completed "Jailbreaks and Red-Teaming: Testing Your AI Before Adversaries Do". Mark this lesson done and keep going — every lesson builds on the last.

End-of-lesson quiz

Check what stuck

15 questions · Score saves to your progress.

Tutor

Curious about “Jailbreaks and Red-Teaming: Testing Your AI Before Adversaries Do”?

Ask anything about this lesson. I’ll answer using just what you’re reading — short, friendly, grounded.

Your question

Try one:

Progress saved locally in this browser. Sign in to sync across devices.

Related lessons

Keep going

Adults & Professionals · 10 min
Bias Auditing in LLM Outputs: Seeing What the Model Can't
LLMs inherit the skews of their training data and RLHF feedback. Auditing for bias isn't a one-time test — it's an ongoing practice that belongs in every deployment.
Adults & Professionals · 10 min
Jailbreak Resistance Testing: A Methodology That Improves Over Time
Jailbreak techniques evolve weekly. A jailbreak test suite that doesn't update is fossilized within months. Here's how to design a testing methodology that learns from the public attack landscape.
Adults & Professionals · 11 min
AI Recommender Radicalization Audits: Trajectory Testing
Recommender systems can drift users toward harmful content — design trajectory audits that test journeys, not just individual recommendations.

Previous: Prompt Injection Defense: Protecting AI Systems From Malicious Inputs

AI Consent in Workplaces: What Employees Deserve to Know: Next

Report an error

Reading mode