Confidently Wrong — When the AI Writes Plausible Nonsense
AI-generated code that compiles, runs, and produces wrong answers is the most dangerous class of bug. Learn the disguises plausible-but-wrong code wears and the verification habits that catch it.
Lesson map
What this lesson covers
Learning path
The main moves in order
1. The Bug That Smiles Back
2. Plausibility
3. Silent failure
4. Edge case
Concept cluster
Terms to connect while reading
Section 1
The Bug That Smiles Back
Crashing code is a gift. It tells you something is wrong. The dangerous bug is the one where everything runs fine and the answer is wrong. AI is exceptionally good at producing this class of bug because it optimizes for plausibility. Plausibility is not correctness.
The five disguises of plausible nonsense
Compare the options
| Disguise | Example | How it bites |
|---|---|---|
| Off-by-one | `for i in range(len(arr) - 1)` (skips last) | Last item silently omitted |
| Wrong default | `dict.get(key, [])` returns `[]`, code expects `None` | Empty-list arithmetic, not the crash you'd expect |
| Reversed condition | `if user.is_admin == False:` flipped during refactor | Permission check inverted in production |
| Float vs int | `total // len(items)` carried over from the Python 2 habit, where a float average was intended | Quietly truncated averages |
| Time zone | `datetime.now()` instead of `datetime.now(timezone.utc)` | All timestamps off by a server's local offset |
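Two of these disguises are easy to watch in action. The sketch below is illustrative only (the dictionary and variable names are invented for this example); both halves run without an error and both quietly do the wrong thing.

```python
from datetime import datetime, timezone

# Disguise: wrong default. get() returns [] rather than None, so the
# "missing user" branch never fires and the absence is silently treated as 0.
scores_by_user = {"alice": [90, 80]}
bob_scores = scores_by_user.get("bob", [])
if bob_scores is None:   # never true: the default is [], not None
    print("no scores recorded for bob")
else:
    average = sum(bob_scores) / len(bob_scores) if bob_scores else 0
    print("bob's average:", average)   # prints 0, no hint that bob doesn't exist

# Disguise: time zone. Both calls run fine; only one is comparable
# across servers whose local clocks sit in different zones.
naive = datetime.now()              # local wall-clock time, no tzinfo attached
aware = datetime.now(timezone.utc)  # unambiguous UTC timestamp
print(naive.isoformat(), "vs", aware.isoformat())
```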
Why AI produces these specifically
- Loops, conditionals, and aggregations are the highest-density code patterns in training data — many subtly wrong examples exist
- The model copies idioms from one language into another: Python's `range(len(arr) - 1)` mistake mirrors the C habit of looping up to `n - 1` (see the sketch after this list)
- It defaults to the most common pattern, not the most correct one for your context
- Edge cases (empty inputs, single-element arrays) appear less in training than the common case
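Here is the cross-language idiom confusion from the second bullet made concrete. This is a sketch of my own, not code from the lesson: the first loop carries the C habit of stopping at `n - 1` into `range()`, which already excludes the upper bound, so the last element is silently skipped.

```python
items = ["a", "b", "c", "d"]

# C habit: for (i = 0; i <= n - 1; i++). The "- 1" belongs to the <=,
# not to range(), which already stops before len(items).
for i in range(len(items) - 1):
    print("wrong loop sees:", items[i])    # never prints "d"

for i in range(len(items)):
    print("right loop sees:", items[i])    # all four elements

for item in items:
    print("idiomatic loop sees:", item)    # no index arithmetic to get wrong
```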
The three-input verification rule
For any nontrivial AI-generated function, run it on three inputs before trusting it: the happy path, the edge case (empty/null/zero), and a weird case (wrong type, very large, unicode). If you cannot quickly construct three inputs, the function is too vague to test and probably wrong somewhere.
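If you want the rule to be hard to skip, wrap it in a tiny helper so that naming the three inputs is part of calling it. The helper below is my own sketch, not part of the lesson's code; `check_three` and its argument names are invented for illustration.

```python
def check_three(fn, happy, edge, weird):
    """Run fn on a happy-path, an edge-case, and a weird input, reporting each outcome."""
    for label, args in (("happy", happy), ("edge", edge), ("weird", weird)):
        try:
            print(f"{label}: {fn(*args)!r}")
        except Exception as exc:
            # Report the crash instead of raising, so all three results are visible.
            print(f"{label}: raised {type(exc).__name__}: {exc}")

# Example use with the percentile function shown next:
# check_three(percentile,
#             happy=([1, 2, 3, 4, 5], 50),
#             edge=([], 50),
#             weird=([1e308, -1e308], 99))
```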
Three inputs caught a bug the model proudly handed you in 30 seconds.
```python
# AI gave you this function for computing percentiles
def percentile(values, p):
    values = sorted(values)
    k = (len(values) - 1) * p / 100
    return values[int(k)]

# Three-input verification:
print(percentile([1, 2, 3, 4, 5], 50))   # 3 — looks right
print(percentile([1, 2, 3, 4, 5], 100))  # 5 — looks right
print(percentile([], 50))                # IndexError — broken

# The model never tested empty input. Almost no AI-written
# function does on the first pass. This is the most common bug class.
```
Differential testing — your secret weapon
If you don't trust the AI-generated implementation, write a slower, obviously correct version and compare. Run both on a thousand random inputs. If they ever disagree, the AI version is wrong. This catches off-by-ones and edge-case omissions almost every time.
1000 random inputs, two implementations. Disagreement means somebody is wrong — usually the fast one.
```python
import random

def percentile_oracle(values, p):
    """Slow, obviously correct, used only as ground truth."""
    values = sorted(values)
    n = len(values)
    if n == 0:
        return None
    idx = max(0, min(n - 1, int(round(p / 100 * (n - 1)))))
    return values[idx]

for _ in range(1000):
    vals = [random.random() for _ in range(random.randint(1, 50))]
    p = random.uniform(0, 100)
    if percentile(vals, p) != percentile_oracle(vals, p):
        print("DISAGREE:", vals, p)
        break
```
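If you would rather not hand-roll the random-input loop, a property-based testing library can generate and shrink the inputs for you. A minimal sketch using the third-party `hypothesis` package (assumed to be installed separately; `percentile` and `percentile_oracle` are the two functions defined above):

```python
# pip install hypothesis  (third-party library, not part of the lesson's code)
from hypothesis import given, strategies as st

@given(
    st.lists(st.floats(allow_nan=False, allow_infinity=False), min_size=1, max_size=50),
    st.floats(min_value=0, max_value=100),
)
def test_percentile_matches_oracle(vals, p):
    # hypothesis reports any disagreement along with a minimal shrunk counterexample.
    assert percentile(vals, p) == percentile_oracle(vals, p)
```

Run it under pytest, or call `test_percentile_matches_oracle()` directly; a failure is reported with the exact input that triggered it.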
Ask the AI to attack itself
Adversarial self-review is a free verification pass. Add it to every code-generation prompt.
```
# After the AI writes a function, immediately:
"Now act as a hostile reviewer. Find three inputs where this function
behaves wrong, dangerously, or surprisingly. For each, show the input,
the actual output, and the expected output."

# About 60% of the time, the model finds its own bug.
# In the other 40%, the bugs it lists ARE bugs; it just didn't realize it wrote them.
```
“The bug that compiles is worse than the bug that crashes.”
The big idea: AI optimizes for code that looks right, not code that is right. Build a habit of three-input verification, adversarial self-review, and differential testing for anything important. The compiled-and-running bug is the one that ships, and the one that hurts most.
Related lessons
Keep going
Creators · 50 min
The Landscape: Copilot vs. Cursor vs. Windsurf vs. Claude Code
The AI coding tool market fragmented fast. Let's map the 2026 landscape honestly: who is for autocomplete, who is for agents, who wins on cost, and what the tradeoffs actually feel like.
Creators · 55 min
Red-Teaming Your AI-Generated Code
Agents ship working code that's also quietly insecure. Red-teaming means actively attacking your own code. Let's build the habits that catch real-world exploits before attackers do.
Creators · 45 min
Building With v0, Lovable, and Bolt (Fast App Prototyping)
AI app builders turn a prompt into a running app in minutes. Learn the strengths, the ceilings, and the moment you should eject to a real IDE.
