Test-Driven AI Development
TDD was already the gold standard. Paired with an agent, it becomes the tightest feedback loop in software. Here's the full workflow and the pitfalls.
Lesson map
The main moves, in order:
1. The Tightest Feedback Loop in Software
2. TDD
3. Failing tests
4. Coverage
Section 1
The Tightest Feedback Loop in Software
Test-driven AI development pairs classical TDD with an agent. You write the test. The agent writes the code. The test runs. Green or red tells you immediately whether the agent delivered. No hand-waving, no vibes-based shipping.
The canonical loop
1. Write a failing test that describes the desired behavior.
2. Run the test and confirm it fails for the reason you expected.
3. Ask the agent to make the test pass, with permission to edit only the implementation file.
4. Run all tests. Green? Commit. Red? Paste the output back to the agent.
5. Refactor with the agent: tests stay green, structure improves.
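The red-to-green rhythm of steps 1 through 4 can be sketched in miniature without a test runner; the stub and names below are illustrative, not part of the lesson's code:

```typescript
type Item = { price: number };

// Steps 1-2: a deliberate stub makes the first run fail for the reason we
// expect (wrong value), not an import or syntax error.
let priceCart = (_items: Item[]): number => NaN;

const testPasses = (): boolean => priceCart([{ price: 10 }]) === 10;
console.log('red:', testPasses());   // false: the expected failure

// Steps 3-4, sketched by hand: the implementation changes, the test does not.
priceCart = (items) => items.reduce((sum, item) => sum + item.price, 0);
console.log('green:', testPasses()); // true: safe to commit
```

The point of the stub is step 2's check: if the test fails for any reason other than missing behavior, fix that first before involving the agent.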
A realistic Vitest example
Four tests describe the whole contract. The agent has zero room to invent requirements.
```typescript
// src/pricing.test.ts
import { describe, it, expect } from 'vitest';
import { priceCart } from './pricing';

describe('priceCart', () => {
  it('returns 0 for empty cart', () => {
    expect(priceCart([])).toBe(0);
  });

  it('sums item prices', () => {
    expect(priceCart([{ price: 10 }, { price: 5 }])).toBe(15);
  });

  it('applies 10% discount when total > 100', () => {
    expect(priceCart([{ price: 120 }])).toBe(108);
  });

  it('rounds to 2 decimals', () => {
    expect(priceCart([{ price: 10.005 }])).toBe(10.01);
  });
});

// Now say to the agent:
// "Implement src/pricing.ts so all tests in pricing.test.ts pass.
//  Only edit pricing.ts — do not modify the tests."
```

Property-based testing: the force multiplier
Property-based tests let you describe invariants instead of examples. The framework generates hundreds of random inputs and checks the property holds. Paired with an agent, you get code that survives inputs neither of you thought to write.
fast-check generates ~100 randomized carts per run. AI-written code that passes examples often fails properties — this catches it.
```typescript
import fc from 'fast-check';

it('is always non-negative', () => {
  fc.assert(
    fc.property(
      fc.array(fc.record({ price: fc.float({ min: 0, max: 1000 }) })),
      (items) => priceCart(items) >= 0
    )
  );
});
```

Mutation testing: testing the tests
Mutation testing deliberately breaks your implementation (flips a greater-than to a less-than, removes a plus-one) and checks whether your tests catch the mutant. If your tests still pass on broken code, your suite has holes. Tools like Stryker and MutPy automate this. Use them quarterly to audit AI-written test suites, which often miss edge cases humans would catch.
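A hand-rolled sketch of what a mutation tool automates, reusing the discount rule from the pricing example; the standalone functions here are for illustration, not part of the lesson's code:

```typescript
// Original rule from the example: 10% discount when total > 100.
const original = (total: number): number => (total > 100 ? total * 0.9 : total);

// A typical generated mutant: the boundary flips from > to >=.
const mutant = (total: number): number => (total >= 100 ? total * 0.9 : total);

// The example suite only exercises total = 120, where both agree,
// so this mutant would survive the tests...
console.log(original(120) === mutant(120)); // true: the suite can't tell them apart

// ...while the untested boundary total = 100 tells them apart.
console.log(original(100), mutant(100)); // 100 90: a missing test case
```

A surviving mutant like this is the cue to add a boundary test (say, a cart totalling exactly 100), not a reason to trust the green suite.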
Coverage is necessary, not sufficient
- Line coverage: did every line execute? (weakest)
- Branch coverage: did every if/else path execute? (better)
- Mutation score: do tests catch changes to the code? (strongest)
- Property coverage: do invariants hold across generated inputs? (complementary)
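The gap between the first two levels is easy to demonstrate; a minimal sketch, using a standalone version of the discount rule assumed for illustration:

```typescript
// A one-line function still has two branches, so a single test can
// execute every line yet leave a whole branch unvisited.
const discount = (total: number): number => (total > 100 ? total * 0.9 : total);

// This one assertion yields 100% line coverage: the entire function
// body is one line, and that line ran.
console.assert(discount(120) === 108);

// But the false branch (total <= 100) never executed, so branch
// coverage sits at 50%. A second case closes the gap:
console.assert(discount(50) === 50);
```

Mutation score then stacks on top: even with both branches covered, the boundary mutant from the previous section survives until a `total === 100` case exists.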
When TDD with AI is the wrong tool
- Exploratory scripts — testing overhead dominates the work
- Rapid UI prototyping — visual feedback is the real test
- Research code meant to be thrown away — tests rot as questions change
“Tests are the only specification that runs. Agents are the only implementation that listens.”
The big idea: the test is the contract, the agent is the contractor, and the suite is the inspector. Done right, TDD-with-AI is the fastest way to ship correct code that has ever existed.
Related lessons
Keep going
Creators · 45 min
Calling the Claude API With Streaming
Anthropic's SDK in 20 lines. Learn messages, streaming tokens, and basic error handling.
Creators · 50 min
Installing and Using Claude Code CLI
Claude Code is Anthropic's terminal-native coding agent. Let's install it, wire it to a project, and use the features most engineers miss on day one.
Creators · 50 min
AI-Assisted Code Review Workflows (for Teams)
Code review is the highest-leverage touchpoint in a team. Automating the noise with AI frees humans to focus on the irreducibly human parts. Let's design the workflow.
