Performance Bugs in AI-Generated Code
AI writes code that works on small inputs and crawls on large ones. Learn the top patterns of AI-introduced performance issues, the profiling tools that surface them, and the prompts that prevent them.
Lesson map
What this lesson covers, in order:
1. Works on My Machine. Crawls in Production.
2. Complexity
3. N+1 queries
4. Profiling
Section 1
Works on My Machine. Crawls in Production.
AI does not feel performance. It writes code that is correct on three test inputs and devastating on three million. The result: a feature that ships green, then takes the database down on Monday morning. The bugs are stereotyped, and so are the fixes.
The top six performance bugs AI generates
Compare the options
| Pattern | Symptom | Fix |
|---|---|---|
| N+1 queries | Loop calls DB once per item | Single query with `IN`, JOIN, or batched fetch |
| Quadratic loops on lists | `for x in a: if x in b:` with b as list | Convert b to a set first |
| Synchronous in async | `requests.get(...)` inside async function | `httpx.AsyncClient`, `await` |
| Loading whole file/table to filter | `df = pd.read_csv(...).query(...)` | Filter at source (SQL WHERE, csv chunks) |
| No pagination | Endpoint returns all 50k records | Cursor or offset pagination |
| Allocating in a hot loop | `new Date()` per iteration | Hoist out of the loop |
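The quadratic-loop row deserves a concrete look, because the fix is one line. A minimal sketch with made-up data; the shape of the code is what matters:

```python
import time

a = list(range(10_000))
b = list(range(5_000, 15_000))

# AI's default: each `in` scans the whole list, O(len(a) * len(b)) overall.
t0 = time.perf_counter()
slow = [x for x in a if x in b]
print(f"list: {time.perf_counter() - t0:.3f}s")

# The fix: one O(len(b)) conversion up front, then O(1) membership checks.
t0 = time.perf_counter()
b_set = set(b)
fast = [x for x in a if x in b_set]
print(f"set:  {time.perf_counter() - t0:.3f}s")

assert slow == fast
```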
The N+1 trap, in detail
The N+1 is the most common AI-introduced perf bug. Every ORM has the same fix; AI rarely reaches for it unprompted.
# AI gives you this — looks fine, ships green:
def get_user_emails():
    users = User.objects.all()  # 1 query
    return [
        {"id": u.id, "email": u.email, "team": u.team.name}
        # u.team.name triggers a query per user. 10k users = 10,001 queries.
        for u in users
    ]

# The fix: prefetch / select_related
def get_user_emails():
    users = User.objects.select_related("team").all()  # 1 join query
    return [
        {"id": u.id, "email": u.email, "team": u.team.name}
        for u in users
    ]
# Same code shape, ~10,000x fewer queries on large data.

Performance prompts that work
Naming the input scale changes the model's defaults completely. "100k rows" produces different code than "a list".
# Prepend to any prompt where data size matters:
"This function will run on 100k+ rows in production.
Constraints:
- Must complete in under 200ms.
- O(N log N) or better.
- No N+1 queries — use joins/IN clauses.
- Stream the result if it doesn't fit in memory.
- Add a comment with the expected complexity."

Profile-then-fix, with AI
Handed real profiler output, AI performs like a capable junior performance engineer. Without it, AI is a guesser.
# 1. Run a profiler on the slow function (cProfile, py-spy, clinic.js, etc.)
# 2. Paste the profiler output into chat:
"Here is py-spy output for a function that takes 8s on 100k rows.
The top 3 hot spots are <paste>. Suggest the smallest possible change
to each that would speed it up. Show before/after for each."
# AI is excellent at reading flame graphs and profiler output.
# This is one of its highest-value uses for performance.
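If you have no profiler wired up yet, cProfile ships with Python. A minimal harness, with a deliberately quadratic stand-in function so a hot spot actually shows up (`slow_report` and its data are hypothetical):

```python
import cProfile
import pstats

def slow_report(rows):
    # Deliberately quadratic stand-in: list membership check per row.
    seen, out = [], []
    for r in rows:
        if r not in seen:       # O(N) scan per row -> O(N^2) overall
            seen.append(r)
            out.append(r)
    return out

with cProfile.Profile() as pr:
    slow_report(list(range(20_000)))

# Top 10 cumulative hot spots: this is the output to paste into chat.
pstats.Stats(pr).sort_stats("cumulative").print_stats(10)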
Memory bugs are quieter and meaner
- Holding references in a long-running list — looks fine until OOM
- Reading a 5GB file into memory instead of streaming (fix sketched below)
- Caching with no eviction — process grows forever
- Closures that capture too much (entire scope) in JS
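The streaming fix for the file case is mostly a matter of not calling `.read()`. A minimal sketch, with a hypothetical log path and filter:

```python
def count_errors(path: str) -> int:
    count = 0
    with open(path) as f:
        for line in f:            # lazy iteration: one line in memory at a time
            if "ERROR" in line:   # hypothetical filter
                count += 1
    return count

# The OOM version reads everything first:
#   lines = open(path).read().splitlines()   # 5GB file means 5GB+ of RAM
```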
Use the AI to generate benchmarks, not just code
Benchmarking is a habit. Add it to every nontrivial function, just like tests.
# After AI writes the function, immediately:
"Write a microbenchmark that runs this function on:
- 100 items (warm-up)
- 10k items
- 1M items
Report time per call and memory peak. Use timeit + tracemalloc."
# 60 seconds of work, surfaces 80% of perf bugs before they ship.
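The harness that prompt should produce looks roughly like this. A minimal sketch, assuming the function under test is named `process` (swap in the real name):

```python
import timeit
import tracemalloc

def bench(fn, data, repeats=5):
    # Best-of-N wall time for a single call.
    best_s = min(timeit.repeat(lambda: fn(data), number=1, repeat=repeats))
    # Peak allocation during one call.
    tracemalloc.start()
    fn(data)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print(f"n={len(data):>9,}  best={best_s * 1000:8.2f} ms  peak={peak / 1e6:8.2f} MB")

for n in (100, 10_000, 1_000_000):      # warm-up, medium, production scale
    bench(process, list(range(n)))      # `process` is hypothetical
```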
When perf is the requirement, write the test first

If the function MUST run in under 50ms on 10k inputs, write a test that asserts exactly that — `assert duration_ms < 50`. Now performance is part of the spec. Test-driven prompting works for performance just as it does for correctness.
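A minimal pytest-style sketch; `process` and `make_inputs` are hypothetical stand-ins for your function and its test fixture:

```python
import time

def test_process_meets_latency_budget():
    data = make_inputs(10_000)                  # hypothetical fixture
    start = time.perf_counter()
    process(data)                               # hypothetical function under test
    duration_ms = (time.perf_counter() - start) * 1000
    assert duration_ms < 50, f"took {duration_ms:.1f} ms, budget is 50 ms"
```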
“AI writes code for the inputs it can imagine. Production has the inputs it can't.”
The big idea: performance is invisible to AI without explicit signal. State your scale, write benchmarks, profile the hot path, and let the AI optimize against measured reality. Without that signal, the model defaults to whatever pattern it saw most — usually correct, often slow.
