The premise
When you hand AI a failing test as the spec, the success criterion is unambiguous and you can verify the output by running the suite.
What AI does well here
- Implement code that satisfies a precise failing test.
- Suggest additional edge-case tests once a base test exists.
What AI cannot do
- Decide what behavior is correct for your domain.
- Catch tests that pass for the wrong reason.
End-of-lesson check
15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-ai-coding-test-first-r12a1-creators
What advantage does a failing test give you when prompting AI to implement code?
- It looks more professional in the chat
- The success criterion is unambiguous and runnable
- It makes the model respond faster
- It removes the need for a code review
Which instruction belongs in a test-as-spec prompt?
- 'Modify the test until it passes.'
- 'Implement the minimum code to make this test pass; do not modify the test.'
- 'Add ten new features.'
- 'Delete the test if it bothers you.'
What does 'test-gaming' look like with AI-generated code?
- AI hardcodes return values to satisfy your one test
- AI writes documentation
- AI styles the code
- AI commits to git
How can you defend against test-gaming?
- Add a second, harder test before trusting the implementation
- Run only the first test forever
- Hide the test from the AI
- Disable assertions
Which task is AI well suited to once it has a precise failing test?
- Implementing the minimum code to make the test pass
- Choosing your team's roadmap
- Renegotiating contracts
- Recruiting interns
After a base test passes, how can AI extend the test suite usefully?
- Suggest additional edge-case tests once the base test exists
- Delete the original test
- Rename all variables to single letters
- Output only emojis
What is AI unable to decide for you, even with a great test?
- Whether the behavior described is correct for your domain
- How to syntax-check Python
- How to print a string
- How to declare a variable
Why is a test-as-spec less ambiguous than a prose request?
- Tests are written in a polite tone
- A test names exact inputs and expected outputs the code must satisfy
- Tests come with marketing copy
- Tests are always shorter than prose
A test passes after AI's implementation, but the function returns the same value for every input. What likely happened?
- AI hardcoded a value matching that single test case
- The compiler optimized away the function
- Tests are inherently broken
- Hardware fault
Why is 'do not add features the test does not require' a useful instruction?
- It keeps scope tight and avoids unintended changes
- It bans the use of variables
- It forces a specific framework
- It causes compile errors
Which workflow embodies test-first AI use?
- Write code, then ask AI for tests after the fact
- Write a failing test, hand it to AI, run the suite, add a harder test
- Skip tests; ask AI for production code only
- Have AI write tests that pass without any code
What's the trust value of a test the engineer wrote themselves?
- It encodes the engineer's intent, which the AI must meet
- It is automatically slower
- It must be deleted before commit
- It always uses random data
Why is 'test passes for the wrong reason' a real risk?
- A passing assertion can hide flawed logic that just happens to satisfy the check
- All passing tests are by definition correct
- Pass means deploy
- Pass and correctness are identical
Which is a healthy follow-up after AI's implementation passes the first test?
- Add adversarial inputs and edge-case tests
- Stop testing forever
- Remove the original test
- Push to production immediately
Why does test-first pair especially well with AI?
- It gives AI an executable definition of done it cannot misinterpret
- AI refuses to work without tests
- Tests change the model temperature
- Tests reduce token cost to zero