AI-generated code that compiles, runs, and produces wrong answers is the most dangerous class of bug. Learn the disguises plausible-but-wrong code wears and the verification habits that catch it.
Crashing code is a gift. It tells you something is wrong. The dangerous bug is the one where everything runs fine and the answer is wrong. AI is exceptionally good at producing this class of bug because it optimizes for plausibility. Plausibility is not correctness.
| Disguise | Example | How it bites |
|---|---|---|
| Off-by-one | `for i in range(len(arr) - 1)` (skips last) | Last item silently omitted |
| Wrong default | `dict.get(key, [])` returns `[]`, code expects `None` | Empty-list arithmetic, not the crash you'd expect |
| Reversed condition | `if user.is_admin == False:` flipped during refactor | Permission check inverted in production |
| Integer division | `total // count` where true division `/` was intended | Quietly floored averages |
| Time zone | `datetime.now()` instead of `datetime.now(timezone.utc)` | All timestamps off by a server's local offset |
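To make the first row concrete, here is a minimal sketch (hypothetical data) of how an off-by-one loop succeeds silently:

```python
# Off-by-one disguise: runs cleanly, silently drops the last item.
arr = [10, 20, 30, 40]

total = 0
for i in range(len(arr) - 1):   # looks plausible, but stops before arr[-1]
    total += arr[i]

print(total)       # 60 — no crash, just a wrong sum
print(sum(arr))    # 100 — the correct total
```

Nothing crashes; only comparing against `sum(arr)` exposes the missing element.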
For any nontrivial AI-generated function, run it on three inputs before trusting it: the happy path, the edge case (empty/null/zero), and a weird case (wrong type, very large, unicode). If you cannot quickly construct three inputs, the function is too vague to test and probably wrong somewhere.
```python
# AI gave you this function for computing percentiles
def percentile(values, p):
    values = sorted(values)
    k = (len(values) - 1) * p / 100
    return values[int(k)]

# Three-input verification:
print(percentile([1, 2, 3, 4, 5], 50))    # 3 — looks right
print(percentile([1, 2, 3, 4, 5], 100))   # 5 — looks right
print(percentile([], 50))                 # IndexError — broken

# The model never tested empty input. Almost no AI-written
# function does on the first pass. This is the most common bug class.
```

Three inputs caught a bug the model proudly handed you in 30 seconds.

If you don't trust the AI-generated implementation, write a slower, obviously correct version and compare. Run both on a thousand random inputs. If they ever disagree, the AI version is wrong. This catches off-by-ones and edge-case omissions almost every time.
```python
import random

def percentile_oracle(values, p):
    """Slow, obviously correct, used only as ground truth.
    Uses the same floor-index convention as the fast version,
    so any disagreement is a real bug, not a convention mismatch."""
    values = sorted(values)
    n = len(values)
    if n == 0:
        return None
    idx = max(0, min(n - 1, int((n - 1) * p / 100)))
    return values[idx]

for _ in range(1000):
    # randint(0, 50) allows empty lists, the case the fast version misses
    vals = [random.random() for _ in range(random.randint(0, 50))]
    p = random.uniform(0, 100)
    try:
        fast = percentile(vals, p)
    except Exception as exc:
        print("CRASH:", vals, p, exc)
        break
    if fast != percentile_oracle(vals, p):
        print("DISAGREE:", vals, p)
        break
```

1000 random inputs, two implementations. Any disagreement, or crash, means somebody is wrong, usually the fast one.

```text
# After the AI writes a function, immediately:
"Now act as a hostile reviewer. Find three inputs where this function
behaves wrong, dangerously, or surprisingly. For each, show the input,
the actual output, and the expected output."

# About 60% of the time, the model finds its own bug.
# The other 40%, the bugs it lists ARE bugs, but it didn't realize it wrote them.
```

Adversarial self-review is a free verification pass. Add it to every code-generation prompt.

> "The bug that compiles is worse than the bug that crashes."
> — An old Lisp hacker, more relevant than ever
The big idea: AI optimizes for code that looks right, not code that is right. Build a habit of three-input verification, adversarial self-review, and differential testing for anything important. The compiled-and-running bug is the one that ships, and the one that hurts most.
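The three-input habit can also live on as assertions committed next to the code, rather than a one-off check at the prompt. A minimal sketch, assuming a hypothetical fixed `percentile` whose contract is to return `None` for empty input (one possible fix among several):

```python
def percentile(values, p):
    """Hypothetical fixed version: nearest-rank percentile, None for empty input."""
    values = sorted(values)
    n = len(values)
    if n == 0:
        return None
    return values[min(n - 1, int((n - 1) * p / 100))]

# The three-input habit as permanent checks, not throwaway prints:
assert percentile([1, 2, 3, 4, 5], 50) == 3   # happy path
assert percentile([], 50) is None             # edge case the AI version crashed on
assert percentile([5], 100) == 5              # weird case: single element, extreme p
print("three-input check passed")
```

Once these asserts exist, any later "improvement" the AI suggests has to get past them first.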
15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-coding-debug-confidently-wrong-creators