Lesson 331 of 2116
Test-Driven Prompting — Failing Tests Are the Best Spec
Test-driven development meets AI: paste a failing test, ask the agent to make it green, iterate. Learn the discipline that makes AI code reliably correct because correctness is now executable.
Lesson map
What this lesson covers
Learning path
The main moves in order
1. The Spec That Runs
2. TDD
3. specification
4. red-green-refactor
Section 1
The Spec That Runs
A test is the most precise specification possible: "given this input, the function returns this output." There is no room for interpretation. AI thrives on room for interpretation. Take it away and the model gets dramatically more reliable.
The TDP loop with AI
The tests are the spec, the agent is the implementer, you are the architect.
1. RED: You write a failing test (or several).
2. PROMPT: Paste the test. Ask the AI to make it pass without breaking others.
3. AGENT: Edits source, runs the test suite, iterates until green.
4. GREEN: All tests pass. You read the diff.
5. REFACTOR: "Improve readability without changing behavior. Tests must stay green."
6. COMMIT: Test + implementation as one atomic change.

Why this defeats hallucination
- Hallucinated APIs fail to import — the test never even runs
- Off-by-one bugs fail edge-case tests immediately
- Wrong type returns fail assertion checks
- The agent has a hard signal (test status), not a soft one (your opinion)
- You can't ship code that fails tests, so wrong code can't escape
A real example in TypeScript
Five tests, five sentences of spec — including the edge cases AI usually skips.
// Step 1 — write the failing tests yourself
import { describe, it, expect } from "vitest";
import { parseDuration } from "./parse-duration";
describe("parseDuration", () => {
it("parses simple seconds", () => {
expect(parseDuration("30s")).toBe(30);
});
it("parses minutes", () => {
expect(parseDuration("5m")).toBe(300);
});
it("parses combined", () => {
expect(parseDuration("1h30m")).toBe(5400);
});
it("throws on bad input", () => {
expect(() => parseDuration("banana")).toThrow();
});
it("returns 0 for empty string", () => {
expect(parseDuration("")).toBe(0);
});
});

The agent now has executable success criteria. Iteration converges fast.
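To make the target concrete, here is one implementation that satisfies all five tests. This is a hand-written sketch, not the agent's guaranteed output; any implementation that keeps the suite green is equally valid.

```typescript
// One possible green implementation (a sketch; the agent's version may differ).
export function parseDuration(input: string): number {
  if (input === "") return 0; // spec: empty string is 0, not an error
  const unitSeconds: Record<string, number> = { h: 3600, m: 60, s: 1 };
  let total = 0;
  let matchedLength = 0;
  // Sum every "<number><unit>" chunk, e.g. "1h30m" -> 3600 + 1800.
  for (const match of input.matchAll(/(\d+)([hms])/g)) {
    total += Number(match[1]) * unitSeconds[match[2]];
    matchedLength += match[0].length;
  }
  // Anything left unmatched ("banana", "5x") means the input is invalid.
  if (matchedLength !== input.length) {
    throw new Error(`Cannot parse duration: "${input}"`);
  }
  return total;
}
```

The matched-length check is what makes the "banana" test pass; without it, unparseable input would silently return 0 instead of throwing.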
# Step 2 — the prompt
"Make all tests in parse-duration.test.ts pass without breaking
any other test in the suite. Run `pnpm test` after each edit.
Show me the final implementation only."
# Claude Code, Cursor Agent, and Codex CLI will all do this end-to-end
# without further intervention.

The five edge cases to always include
1. Empty input (string "", array [], object {})
2. Null and undefined
3. The boundary value (zero, max int, very long string)
4. Unicode (emojis, RTL text, combining characters)
5. Concurrent calls if the function is async — does it race?
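Three of the five apply to a hypothetical `truncate` helper, and the Unicode one catches a classic trap: `String.prototype.slice` counts UTF-16 code units and can cut an emoji's surrogate pair in half, while spreading the string into code points does not. A sketch, with plain assertions standing in for vitest cases:

```typescript
// Hypothetical helper: truncate a string to n characters, counting code
// points so multi-byte characters like emoji are never split mid-surrogate.
const truncate = (s: string, n: number): string => [...s].slice(0, n).join("");

// 1. Empty input
if (truncate("", 3) !== "") throw new Error("empty string");
// 3. Boundary value: zero length
if (truncate("abc", 0) !== "") throw new Error("zero boundary");
// 4. Unicode: a naive s.slice(0, 1) would return half of the rocket emoji
if (truncate("🚀🎉", 1) !== "🚀") throw new Error("unicode");
```

Cases 2 and 5 don't apply here (the signature forbids null, and the function is synchronous), which is itself worth noting in the test file: the checklist is a prompt for thought, not a quota.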
The TDP anti-pattern: AI writes tests that pass
If the AI controls both sides of the equation, both sides will agree even when wrong.
# DANGEROUS PROMPT:
"Write the function and the tests for it."
# What you'll get: tests written to match the implementation,
# not the spec. Tests pass. Code may still be wrong.
# The tests are tautologies.
# SAFE PROMPT:
"Here are the tests. Here is the function signature.
Implement the function so all tests pass. Do not add or modify
any test."

Pair TDP with mutation testing for confidence
Mutation testing tools (Stryker for JS, mutmut for Python) systematically mutate your source — flip a comparison, negate a condition, delete a statement — and check whether any test fails. If no test catches a mutation, your tests have holes — exactly the holes AI loves to hide bugs in. Run mutation tests once after a TDP session and you'll see how good your spec really is.
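You can see what a mutant is without any tooling. Below is a hypothetical fee function and the kind of mutant a tool like Stryker generates from it: a test that only probes far from the boundary lets the mutant survive, while a boundary test kills it.

```typescript
// Original rule (hypothetical example): orders of 100 or more pay a flat fee.
const fee = (total: number): number => (total >= 100 ? 5 : 0);

// A typical generated mutant: ">=" mutated to ">".
const feeMutant = (total: number): number => (total > 100 ? 5 : 0);

// This check passes for BOTH versions, so the mutant survives it:
if (fee(150) !== 5 || feeMutant(150) !== 5) throw new Error("far from boundary");

// A boundary check kills the mutant: the original passes, the mutant fails.
if (fee(100) !== 5) throw new Error("original should charge at 100");
if (feeMutant(100) !== 0) throw new Error("mutant skips the fee at exactly 100");
```

If all your TDP tests looked like `fee(150)`, a mutation run would flag the surviving `>` mutant; adding the `fee(100)` boundary case closes the hole.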
“The test is the only prompt the machine cannot misinterpret.”
The big idea: AI is unreliable in proportion to how much it can interpret your intent. Tests collapse interpretation to zero. Write the contract first, lock the tests, and let the agent grind to green. The result is the most reliable AI-generated code you will ever ship.
Related lessons
Keep going
Builders · 35 min
Tests as Prompts — an Unexpected Superpower
Writing a test first is not just good engineering. It is the clearest possible prompt for an AI. Let's use tests to make AI code reliable.
Creators · 50 min
Test-Driven AI Development
TDD was already the gold standard. Paired with an agent, it becomes the tightest feedback loop in software. Here's the full workflow and the pitfalls.
Creators · 11 min
Refactoring Legacy Code With AI in Small Steps
Use AI to break large refactors into small, verifiable diffs.
