Confidently Wrong — When the AI Writes Plausible Nonsense

Section 1

The Bug That Smiles Back

Compare the options

Disguise	Example	How it bites
Off-by-one	`for i in range(len(arr) - 1)` (skips last)	Last item silently omitted
Wrong default	`dict.get(key, [])` returns `[]`, code expects `None`	Empty-list arithmetic, not the crash you'd expect
Reversed condition	`if user.is_admin == False:` flipped during refactor	Permission check inverted in production
Float vs int	`sum / len(items)` returns int in Python 2 idiom on Python 3 floats	Quietly wrong averages
Time zone	`datetime.now()` instead of `datetime.now(timezone.utc)`	All timestamps off by a server's local offset

Three inputs caught a bug the model proudly handed you in 30 seconds.

python

# AI gave you this function for computing percentiles
def percentile(values, p):
    values = sorted(values)
    k = (len(values) - 1) * p / 100
    return values[int(k)]

# Three-input verification:
print(percentile([1, 2, 3, 4, 5], 50))   # 3 — looks right
print(percentile([1, 2, 3, 4, 5], 100))  # 5 — looks right
print(percentile([], 50))                # IndexError — broken

# The model never tested empty input. Almost no AI-written
# function does on the first pass. This is the most common bug class.

1000 random inputs, two implementations. Disagreement means somebody is wrong — usually the fast one.

python

import random

def percentile_oracle(values, p):
    """Slow, obviously correct, used only as ground truth."""
    values = sorted(values)
    n = len(values)
    if n == 0:
        return None
    idx = max(0, min(n - 1, int(round(p / 100 * (n - 1)))))
    return values[idx]

for _ in range(1000):
    vals = [random.random() for _ in range(random.randint(1, 50))]
    p = random.uniform(0, 100)
    if percentile(vals, p) != percentile_oracle(vals, p):
        print("DISAGREE:", vals, p)
        break

Adversarial self-review is a free verification pass. Add it to every code-generation prompt.

text

# After the AI writes a function, immediately:

"Now act as a hostile reviewer. Find three inputs where this function
  behaves wrong, dangerously, or surprisingly. For each, show the input,
  the actual output, and the expected output."

# About 60% of the time, the model finds its own bug.
# The other 40%, the bugs it lists ARE bugs but it didn't realize it wrote them.

Key terms in this lesson

Confidently Wrong — When the AI Writes Plausible Nonsense

The Bug That Smiles Back

The five disguises of plausible nonsense

Why AI produces these specifically

The three-input verification rule

Differential testing — your secret weapon

Ask the AI to attack itself

Curious about “Confidently Wrong — When the AI Writes Plausible Nonsense”?

Keep going

Confidently Wrong — When the AI Writes Plausible Nonsense

The Bug That Smiles Back

The five disguises of plausible nonsense

Why AI produces these specifically

The three-input verification rule

Differential testing — your secret weapon

Ask the AI to attack itself

Curious about “Confidently Wrong — When the AI Writes Plausible Nonsense”?

Keep going