AI and spotting jailbreak prompts: when a 'fun trick' is actually shady

Learn to recognize jailbreak prompts your friends paste so you don't help break the rules.

26 min · Reviewed 2026

The big idea

A jailbreak prompt is a sneaky prompt that tries to trick AI into ignoring its safety rules. Sometimes it's silly, sometimes it's getting weapons info or harmful content. Knowing the patterns keeps you and the model safer.

How to use it

Spot the 'pretend you have no rules' opener
Notice nested role-plays inside role-plays
Watch for 'just for educational purposes' as a cover phrase
Ask AI to explain why a specific prompt counts as a jailbreak

Try it

Have AI show you 3 common jailbreak patterns (without doing them) so you recognize them when a friend texts one.

Why Sharing 'Jailbreak' Prompts Can Get You Banned (or Worse)

The big idea

Jailbreak prompts trick an AI into ignoring its safety rules. They feel anonymous, but every prompt is logged with your account ID, IP address, and device fingerprint. OpenAI, Anthropic, and Google ban accounts permanently for repeat jailbreaking, and in 2024 the UK started prosecuting people who used jailbroken AI to generate CSAM under the Online Safety Act.

Some examples

OpenAI bans roughly 250,000 accounts per quarter for safety-policy violations (their own transparency report).
A 'grandma jailbreak' that gets ChatGPT to recite napalm instructions still logs that you asked — and the log is what gets handed to law enforcement on request.
Discord servers that share jailbreaks get reported by other users and shut down by Trust & Safety teams within days.
If a jailbroken model outputs CSAM, malware, or bioweapon synthesis, federal law treats the prompter as the originator, not the AI company.

Try it!

If you actually want to study how AI safety works, look up the academic field of 'red teaming' — Anthropic, OpenAI, and DeepMind all hire teenagers as paid red teamers through programs like Apart Research. That's the legal version of the same skill.

End-of-lesson check

15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-builders-ethics-safety-AI-and-spotting-jailbreak-prompts-r7a10-teen

What is the core idea behind "AI and spotting jailbreak prompts: when a 'fun trick' is actually shady"?
1. Learn to recognize jailbreak prompts your friends paste so you don't help break the rules.
2. federal crime
3. Settle the validity of crime statistics
4. Save your draft history (Google Docs version history)
Which term best describes a foundational idea in "AI and spotting jailbreak prompts: when a 'fun trick' is actually shady"?
1. prompt injection
2. jailbreak
3. guardrails
4. federal crime
A learner studying AI and spotting jailbreak prompts: when a 'fun trick' is actually shady would need to understand which concept?
1. jailbreak
2. guardrails
3. prompt injection
4. federal crime
Which of these is directly relevant to AI and spotting jailbreak prompts: when a 'fun trick' is actually shady?
1. jailbreak
2. prompt injection
3. federal crime
4. guardrails
Which of the following is a key point about AI and spotting jailbreak prompts: when a 'fun trick' is actually shady?
1. Spot the 'pretend you have no rules' opener
2. Notice nested role-plays inside role-plays
3. Watch for 'just for educational purposes' as a cover phrase
4. Ask AI to explain why a specific prompt counts as a jailbreak
Which of these does NOT belong in a discussion of AI and spotting jailbreak prompts: when a 'fun trick' is actually shady?
1. Watch for 'just for educational purposes' as a cover phrase
2. Notice nested role-plays inside role-plays
3. federal crime
4. Spot the 'pretend you have no rules' opener
What is the key insight about "The rule" in the context of AI and spotting jailbreak prompts: when a 'fun trick' is actually shady?
1. federal crime
2. Settle the validity of crime statistics
3. If a prompt's whole point is to get past safety, that's the answer about whether it's okay.
4. Save your draft history (Google Docs version history)
Which statement accurately describes an aspect of AI and spotting jailbreak prompts: when a 'fun trick' is actually shady?
1. federal crime
2. Settle the validity of crime statistics
3. Save your draft history (Google Docs version history)
4. A jailbreak prompt is a sneaky prompt that tries to trick AI into ignoring its safety rules.
What does working with AI and spotting jailbreak prompts: when a 'fun trick' is actually shady typically involve?
1. Have AI show you 3 common jailbreak patterns (without doing them) so you recognize them when a friend texts one.
2. federal crime
3. Settle the validity of crime statistics
4. Save your draft history (Google Docs version history)
Which best describes the scope of "AI and spotting jailbreak prompts: when a 'fun trick' is actually shady"?
1. It is unrelated to ethics-safety workflows
2. It focuses on Learn to recognize jailbreak prompts your friends paste so you don't help break the rules.
3. It applies only to the opposite beginner tier
4. It was deprecated in 2024 and no longer relevant
Which section heading best belongs in a lesson about AI and spotting jailbreak prompts: when a 'fun trick' is actually shady?
1. federal crime
2. Settle the validity of crime statistics
3. How to use it
4. Save your draft history (Google Docs version history)
Which section heading best belongs in a lesson about AI and spotting jailbreak prompts: when a 'fun trick' is actually shady?
1. federal crime
2. Settle the validity of crime statistics
3. Save your draft history (Google Docs version history)
4. Try it
Which of the following is a concept covered in AI and spotting jailbreak prompts: when a 'fun trick' is actually shady?
1. jailbreak
2. prompt injection
3. guardrails
4. federal crime
Which of the following is a concept covered in AI and spotting jailbreak prompts: when a 'fun trick' is actually shady?
1. jailbreak
2. prompt injection
3. guardrails
4. federal crime
Which of the following is a concept covered in AI and spotting jailbreak prompts: when a 'fun trick' is actually shady?
1. jailbreak
2. prompt injection
3. guardrails
4. federal crime

← Back to interactive lesson

Tendril · Builders · Safety & Governance

AI and spotting jailbreak prompts: when a 'fun trick' is actually shady

Learn to recognize jailbreak prompts your friends paste so you don't help break the rules.

26 min · Reviewed 2026

The big idea

How to use it

Spot the 'pretend you have no rules' opener
Notice nested role-plays inside role-plays
Watch for 'just for educational purposes' as a cover phrase
Ask AI to explain why a specific prompt counts as a jailbreak

Try it

Have AI show you 3 common jailbreak patterns (without doing them) so you recognize them when a friend texts one.

Why Sharing 'Jailbreak' Prompts Can Get You Banned (or Worse)

The big idea

Some examples

OpenAI bans roughly 250,000 accounts per quarter for safety-policy violations (their own transparency report).
A 'grandma jailbreak' that gets ChatGPT to recite napalm instructions still logs that you asked — and the log is what gets handed to law enforcement on request.
Discord servers that share jailbreaks get reported by other users and shut down by Trust & Safety teams within days.
If a jailbroken model outputs CSAM, malware, or bioweapon synthesis, federal law treats the prompter as the originator, not the AI company.

Try it!

End-of-lesson check

15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-builders-ethics-safety-AI-and-spotting-jailbreak-prompts-r7a10-teen

What is the core idea behind "AI and spotting jailbreak prompts: when a 'fun trick' is actually shady"?
1. Learn to recognize jailbreak prompts your friends paste so you don't help break the rules.
2. federal crime
3. Settle the validity of crime statistics
4. Save your draft history (Google Docs version history)
Which term best describes a foundational idea in "AI and spotting jailbreak prompts: when a 'fun trick' is actually shady"?
1. prompt injection
2. jailbreak
3. guardrails
4. federal crime
A learner studying AI and spotting jailbreak prompts: when a 'fun trick' is actually shady would need to understand which concept?
1. jailbreak
2. guardrails
3. prompt injection
4. federal crime
Which of these is directly relevant to AI and spotting jailbreak prompts: when a 'fun trick' is actually shady?
1. jailbreak
2. prompt injection
3. federal crime
4. guardrails
Which of the following is a key point about AI and spotting jailbreak prompts: when a 'fun trick' is actually shady?
1. Spot the 'pretend you have no rules' opener
2. Notice nested role-plays inside role-plays
3. Watch for 'just for educational purposes' as a cover phrase
4. Ask AI to explain why a specific prompt counts as a jailbreak
Which of these does NOT belong in a discussion of AI and spotting jailbreak prompts: when a 'fun trick' is actually shady?
1. Watch for 'just for educational purposes' as a cover phrase
2. Notice nested role-plays inside role-plays
3. federal crime
4. Spot the 'pretend you have no rules' opener
What is the key insight about "The rule" in the context of AI and spotting jailbreak prompts: when a 'fun trick' is actually shady?
1. federal crime
2. Settle the validity of crime statistics
3. If a prompt's whole point is to get past safety, that's the answer about whether it's okay.
4. Save your draft history (Google Docs version history)
Which statement accurately describes an aspect of AI and spotting jailbreak prompts: when a 'fun trick' is actually shady?
1. federal crime
2. Settle the validity of crime statistics
3. Save your draft history (Google Docs version history)
4. A jailbreak prompt is a sneaky prompt that tries to trick AI into ignoring its safety rules.
What does working with AI and spotting jailbreak prompts: when a 'fun trick' is actually shady typically involve?
1. Have AI show you 3 common jailbreak patterns (without doing them) so you recognize them when a friend texts one.
2. federal crime
3. Settle the validity of crime statistics
4. Save your draft history (Google Docs version history)
Which best describes the scope of "AI and spotting jailbreak prompts: when a 'fun trick' is actually shady"?
1. It is unrelated to ethics-safety workflows
2. It focuses on Learn to recognize jailbreak prompts your friends paste so you don't help break the rules.
3. It applies only to the opposite beginner tier
4. It was deprecated in 2024 and no longer relevant
Which section heading best belongs in a lesson about AI and spotting jailbreak prompts: when a 'fun trick' is actually shady?
1. federal crime
2. Settle the validity of crime statistics
3. How to use it
4. Save your draft history (Google Docs version history)
Which section heading best belongs in a lesson about AI and spotting jailbreak prompts: when a 'fun trick' is actually shady?
1. federal crime
2. Settle the validity of crime statistics
3. Save your draft history (Google Docs version history)
4. Try it
Which of the following is a concept covered in AI and spotting jailbreak prompts: when a 'fun trick' is actually shady?
1. jailbreak
2. prompt injection
3. guardrails
4. federal crime
Which of the following is a concept covered in AI and spotting jailbreak prompts: when a 'fun trick' is actually shady?
1. jailbreak
2. prompt injection
3. guardrails
4. federal crime
Which of the following is a concept covered in AI and spotting jailbreak prompts: when a 'fun trick' is actually shady?
1. jailbreak
2. prompt injection
3. guardrails
4. federal crime

← Back to interactive lesson