Cyber Risk and Autonomous AI Attackers

AI agents can already find some software vulnerabilities and write exploits. What happens when those capabilities scale? A clear-eyed walk through the data.

40 min · Reviewed 2026

The Capability Curve

Offensive cyber has been an AI-relevant domain for years. What is new — since roughly 2024 — is agentic capability: models that take multi-step actions, use tools, and pursue goals across hours of operation. This has moved AI from an assistant for human hackers to a plausible operator.

What agents can do (as of ~2025-2026)

Solve low-to-medium difficulty Capture the Flag (CTF) challenges autonomously (per published METR and AISI evaluations)
Write working exploits for known, well-documented vulnerabilities
Perform recon: enumerate assets, identify plausible vulnerabilities, generate credential lists
Partially automate phishing campaigns with personalized content
Find some novel vulnerabilities in real open-source code (Google Project Zero, Anthropic demos)

Why defense is not symmetric

Attackers need one path. Defenders must close all paths.
AI helps both sides, but the gains may be uneven and favor attackers in specific domains (phishing at scale, exploit development)
AI-assisted defense — automated patching, anomaly detection, log analysis — is also accelerating
The net effect is not yet clear; 2024-2025 data suggest roughly offsetting gains, with high variance by sector
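
The "one path versus all paths" point above can be made concrete with a little arithmetic. The Python sketch below is illustrative only: it assumes attack paths fail independently, the numbers are made up (50 paths, each left open 2% or 1% of the time), and the function name breach_probability is hypothetical rather than taken from any library.

```python
from math import prod


def breach_probability(open_probs):
    """Chance that at least one attack path is open.

    Assumes each path is open independently of the others,
    which is a simplifying assumption for illustration.
    """
    # Probability that every path is closed, then take the complement.
    all_closed = prod(1.0 - p for p in open_probs)
    return 1.0 - all_closed


# Hypothetical estate: 50 possible paths, each left open 2% of the time.
baseline = breach_probability([0.02] * 50)

# Same estate after the defender halves the per-path miss rate to 1%.
improved = breach_probability([0.01] * 50)

print(f"baseline: {baseline:.2f}, improved: {improved:.2f}")
```

Under these assumptions, a defender who closes each path 98% of the time still leaves roughly a 64% chance that some path is open, and halving the per-path miss rate only brings that down to about 39%. Real attack paths are not independent, so treat this as intuition for the asymmetry, not as a measurement.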

What is being done

Frontier labs run cyber-specific pre-deployment evaluations (OpenAI's Preparedness Framework, Anthropic's Responsible Scaling Policy)
CISA, UK NCSC, and partners publish AI cyber guidance
DARPA AIxCC (AI Cyber Challenge) develops defensive AI
Bug bounty programs are being restructured for AI-driven findings

We're in the strange position of hoping the offense-defense balance stays close, because any big asymmetry either way breaks a lot of what holds the internet together.
— Heather Adkins, Google / board member commentary (paraphrased from public talks)

The big idea: AI in cyber is not science fiction. It is a real, scaling capability with measured progress on both sides. The question for the next several years is whether defense keeps up — and what policy levers help it.

End-of-lesson check

8 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-safety2-cyber-risk-ai-creators

What is the main idea of "Cyber Risk and Autonomous AI Attackers"?
1. AI agents can already find some software vulnerabilities and write exploits.
2. Use AI as the final authority for the whole decision
3. Avoid checking the answer once it sounds polished
4. Focus only on speed instead of judgment
Which concept is most central to "Cyber Risk and Autonomous AI Attackers"?
1. CTF
2. autonomous agent
3. vulnerability research
4. offense-defense balance
Which use of AI fits this topic best?
1. Let the AI decide what matters without your review
2. Use the answer before checking whether it fits the situation
3. Solve low-to-medium Capture the Flag challenges autonomously (published METR, AISI evaluations)
4. Treat the AI output as automatically correct
What should a careful learner remember about "What they cannot reliably do yet"?
1. Use AI to draft or organize ideas about autonomous agents, then verify before acting.
2. Skip the context so the tool can guess faster
3. Treat the output as private even after sharing it online
4. Use the answer without checking the source
You want to use AI after this lesson. What is the safest next step?
1. Act immediately because the AI answer is written clearly
2. Make the human values decision yourself; AI cannot make it for you.
3. Hide uncertainty so the final answer looks cleaner
4. Use private or sensitive details before checking permission
How should AI output about autonomous agents be treated?
1. As proof that no other source is needed
2. As a replacement for context, consent, or expert review
3. As a draft or helper output that still needs human judgment and verification
4. As something that becomes correct when it sounds confident
Name one way to verify an AI answer about autonomous agents.
Which action would help you apply "Cyber Risk and Autonomous AI Attackers" responsibly?
1. Use the tool to avoid thinking through the tradeoff
2. Keep going even if the output conflicts with a trusted source
3. Treat the AI output as automatically correct
4. Write working exploits for known, well-documented vulnerabilities


Tendril · Creators · Ethics & Society