Test-driven development meets AI: paste a failing test, ask the agent to make it green, iterate. Learn the discipline that makes AI-generated code reliably correct, because correctness becomes executable.
A test is the most precise specification possible: "given this input, the function returns this output." There is no room for interpretation. AI thrives on room for interpretation; take it away and the model becomes dramatically more reliable.
1. RED: You write a failing test (or several).
2. PROMPT: Paste the test. Ask the AI to make it pass without breaking others.
3. AGENT: Edits source, runs the test suite, iterates until green.
4. GREEN: All tests pass. You read the diff.
5. REFACTOR: "Improve readability without changing behavior. Tests must stay green."
6. COMMIT: Test + implementation as one atomic change.

The tests are the spec, the agent is the implementer, you are the architect.

// Step 1 — write the failing tests yourself
import { describe, it, expect } from "vitest";
import { parseDuration } from "./parse-duration";

describe("parseDuration", () => {
  it("parses simple seconds", () => {
    expect(parseDuration("30s")).toBe(30);
  });

  it("parses minutes", () => {
    expect(parseDuration("5m")).toBe(300);
  });

  it("parses combined", () => {
    expect(parseDuration("1h30m")).toBe(5400);
  });

  it("throws on bad input", () => {
    expect(() => parseDuration("banana")).toThrow();
  });

  it("returns 0 for empty string", () => {
    expect(parseDuration("")).toBe(0);
  });
});

Five tests, five sentences of spec — including the edge cases AI usually skips.

# Step 2 — the prompt
"Make all tests in parse-duration.test.ts pass without breaking
any other test in the suite. Run `pnpm test` after each edit.
Show me the final implementation only."
# Claude Code, Cursor Agent, and Codex CLI will all do this end-to-end
# without further intervention.

The agent now has executable success criteria. Iteration converges fast.

# DANGEROUS PROMPT:
"Write the function and the tests for it."
# What you'll get: tests written to match the implementation,
# not the spec. Tests pass. Code may still be wrong.
# The tests are tautologies.
# SAFE PROMPT:
"Here are the tests. Here is the function signature.
Implement the function so all tests pass. Do not add or modify
any test."

If the AI controls both sides of the equation, both sides will agree even when wrong.

Mutation testing tools (Stryker for JS, mutmut for Python) make small, systematic changes (mutants) to your source and check whether any test fails. If your tests don't catch a mutation, your tests have holes — exactly the holes AI loves to hide bugs in. Run mutation tests once after a test-driven-prompting session and you'll see how good your spec really is.
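For this project, a minimal Stryker setup might look like the following. This is a sketch, not a verified config: the file name `stryker.config.json`, the `src/parse-duration.ts` path, and the use of the `@stryker-mutator/vitest-runner` plugin are assumptions — check the Stryker docs for your own project layout.

```json
{
  "mutate": ["src/parse-duration.ts"],
  "testRunner": "vitest",
  "reporters": ["clear-text", "progress"],
  "coverageAnalysis": "perTest"
}
```

Run it with `npx stryker run`; any mutant that survives points at behavior your five tests never pin down.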
The test is the only prompt the machine cannot misinterpret.
— A staff engineer who learned the hard way
The big idea: AI is unreliable in proportion to how much it can interpret your intent. Tests collapse interpretation to zero. Write the contract first, lock the tests, and let the agent grind to green. The result is the most reliable AI-generated code you will ever ship.
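To make the loop concrete, here is one implementation that would turn the five tests above green. It is a sketch of what an agent might converge on, not the only correct answer; the regex tokenization and the error message are assumptions, since the tests only require that bad input throws.

```typescript
// parse-duration.ts: one possible implementation of the five-test spec.
// Seconds are the return unit, as the tests encode.
const UNIT_SECONDS: Record<string, number> = { h: 3600, m: 60, s: 1 };

export function parseDuration(input: string): number {
  if (input === "") return 0; // spec: empty string is zero seconds

  // Tokenize into <digits><unit> pairs, e.g. "1h30m" -> ["1h", "30m"]
  const tokens = input.match(/\d+[hms]/g);

  // Reject anything the token pattern does not fully consume ("banana", "5x")
  if (tokens === null || tokens.join("") !== input) {
    throw new Error(`Invalid duration: "${input}"`);
  }

  // Sum each token's value, converted to seconds
  return tokens.reduce((total, token) => {
    const value = Number(token.slice(0, -1));
    const unit = token.slice(-1);
    return total + value * UNIT_SECONDS[unit];
  }, 0);
}
```

The `join("") !== input` check is the part agents most often get wrong on the first pass: without it, `"30s junk"` would silently parse as 30.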
15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-coding-debug-test-driven-prompting-creators