Loading lesson…
Test-driven development meets AI: paste a failing test, ask the agent to make it green, iterate. Learn the discipline that makes AI code reliably correct because correctness is now executable.
A test is the most precise specification possible: "given this input, the function returns this output." There is no room for interpretation. AI loves rooms for interpretation. Take them away and the model gets dramatically more reliable.
1. RED: You write a failing test (or several). 2. PROMPT: Paste the test. Ask the AI to make it pass without breaking others. 3. AGENT: Edits source, runs the test suite, iterates until green. 4. GREEN: All tests pass. You read the diff. 5. REFACTOR: "Improve readability without changing behavior. Tests must stay green." 6. COMMIT: Test + implementation as one atomic change.The tests are the spec, the agent is the implementer, you are the architect.// Step 1 — write the failing tests yourself import { describe, it, expect } from "vitest"; import { parseDuration } from "./parse-duration"; describe("parseDuration", () => { it("parses simple seconds", () => { expect(parseDuration("30s")).toBe(30); }); it("parses minutes", () => { expect(parseDuration("5m")).toBe(300); }); it("parses combined", () => { expect(parseDuration("1h30m")).toBe(5400); }); it("throws on bad input", () => { expect(() => parseDuration("banana")).toThrow(); }); it("returns 0 for empty string", () => { expect(parseDuration("")).toBe(0); }); });Five tests, five sentences of spec — including the edge cases AI usually skips.# Step 2 — the prompt "Make all tests in parse-duration.test.ts pass without breaking any other test in the suite. Run `pnpm test` after each edit. Show me the final implementation only." # Claude Code, Cursor Agent, and Codex CLI will all do this end-to-end # without further intervention.The agent now has executable success criteria. Iteration converges fast.# DANGEROUS PROMPT: "Write the function and the tests for it." # What you'll get: tests written to match the implementation, # not the spec. Tests pass. Code may still be wrong. # The tests are tautologies. # SAFE PROMPT: "Here are the tests. Here is the function signature. Implement the function so all tests pass. Do not add or modify any test."If the AI controls both sides of the equation, both sides will agree even when wrong.Mutation testing tools (Stryker for JS, mutmut for Python) randomly mutate your source and check if any test fails. If your tests don't catch the mutation, your tests have holes — exactly the holes AI loves to hide bugs in. Run mutation tests once after a TDP session and you'll see how good your spec really is.
The test is the only prompt the machine cannot misinterpret.
— A staff engineer who learned the hard way
The big idea: AI is unreliable in proportion to how much it can interpret your intent. Tests collapse interpretation to zero. Write the contract first, lock the tests, and let the agent grind to green. The result is the most reliable AI-generated code you will ever ship.
8 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-coding-debug-test-driven-prompting-creators
What is the main idea of "Test-Driven Prompting — Failing Tests Are the Best Spec"?
Which concept is most central to "Test-Driven Prompting — Failing Tests Are the Best Spec"?
Which use of AI fits this topic best?
What should a careful learner remember about "Let the AI write tests too — but you write the FIRST one"?
You want to use AI after this lesson. What is the safest next step?
How should AI output about TDD be treated?
Name one way to verify an AI answer about TDD.
Which action would help you apply "Test-Driven Prompting — Failing Tests Are the Best Spec" responsibly?