AI Red Teamer in 2026: Breaking Models for a Living

A real job now: adversarially probing LLMs and multimodal systems for jailbreaks, prompt injection, data exfiltration, and harm.

CreatorsCareers & Pathways~18 min readAdvancedProfessionalBI2 · Representation & ReasoningBI5 · Societal ImpactPrint / PDF

Lesson map

What this lesson covers

30 min11 blocks5 concepts

Learning path

The main moves in order

1What AI red teamers do
2Specialized tools
3jailbreaks
4prompt injection

Concept cluster

Terms to connect while reading

jailbreaksprompt injectionevaluationharm taxonomyresponsible disclosure

Sections2

Lists2

Notes3

Compare1

Terms1

Sam starts a bug-bash sprint Monday on a new agent release. The team has a harm taxonomy — CSAM, weapons, cyber, self-harm, privacy leaks, autonomous-action harms — and a list of new attack patterns from this quarter's research. By Friday Sam has filed 34 confirmed bypasses, 12 of them novel enough to write up for internal distribution. The model ships Tuesday with patches for 28 of them. The other six are scoped in the system card as known limitations.

Section 1

What AI red teamers do

Prompt injection — indirect and direct, across tools and browsing.
Jailbreaks — roleplay, encoding tricks, low-resource languages.
Data exfiltration — from tool use, memory, and system prompts.
Agent harm testing — do agents take harmful real-world actions?
Multimodal attacks — image and audio payloads.
Evaluation design — building automated evals to catch regressions.
Responsible disclosure — writing up findings for mitigation.

Section 2

Specialized tools

Tools like PyRIT (Microsoft) and Garak — open-source LLM red-team frameworks.
Promptfoo, Inspect (UK AISI), and Anthropic's evals for reproducible testing.
HarmBench and JailbreakBench for benchmarks.
Agentic sandboxes for browsing/tool agents.
MITRE ATLAS — adversarial ML threat framework.
Internal red-team platforms at Anthropic, OpenAI, Google DeepMind.

Check-in 1. Got it so far?

Compare the options

Task	Before AI (2020)	Now (2026)
Finding jailbreaks	Not a job category.	Full-time teams at every frontier lab.
Evaluation	Static benchmarks.	Dynamic, adversarial, continuously rotating.
Disclosure	Ad hoc.	Formal process mirroring infosec CVDs.

Key terms in this lesson

Check-in 2. Got it so far?

If you want to be an AI red teamer: Background in security (offensive security, bug bounty), ML engineering, or adversarial ML research. A CS degree helps; so does a linguistics or psychology background for prompt craft. Read the OpenAI, Anthropic, and DeepMind system cards and model cards cover to cover. Contribute to open red-team tooling. Write up your findings publicly within safe limits. Frontier labs and consultancies hire hard in this space.

End-of-lesson quiz

Check what stuck

15 questions · Score saves to your progress.

Tutor

Curious about “AI Red Teamer in 2026: Breaking Models for a Living”?

Ask anything about this lesson. I’ll answer using just what you’re reading — short, friendly, grounded.

Progress saved locally in this browser. Sign in to sync across devices.

Related lessons

AI Red Teamer in 2026: Breaking Models for a Living

What AI red teamers do

Specialized tools

Curious about “AI Red Teamer in 2026: Breaking Models for a Living”?

Keep going

AI Red Teamer in 2026: Breaking Models for a Living

What AI red teamers do

Specialized tools

Curious about “AI Red Teamer in 2026: Breaking Models for a Living”?

Keep going