AI writes code that works on small inputs and crawls on large ones. Learn the top patterns of AI-introduced performance issues, the profiling tools that surface them, and the prompts that prevent them.
AI does not feel performance. It writes code that is correct on three test inputs and devastating on three million. The result: a feature that ships green, then takes the database down on Monday morning. The bugs are stereotyped, and so are the fixes.
| Pattern | Symptom | Fix |
|---|---|---|
| N+1 queries | Loop calls DB once per item | Single query with `IN`, JOIN, or batched fetch |
| Quadratic loops on lists | `for x in a: if x in b:` with `b` as a list | Convert `b` to a set first (sketch below) |
| Synchronous in async | `requests.get(...)` inside async function | `httpx.AsyncClient`, `await` |
| Loading whole file/table to filter | `df = pd.read_csv(...).query(...)` | Filter at source (SQL WHERE, csv chunks) |
| No pagination | Endpoint returns all 50k records | Cursor or offset pagination |
| Allocating in a hot loop | `new Date()` per iteration | Hoist out of the loop |
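To make the quadratic-loop row concrete, here is a minimal sketch (toy data, illustrative sizes) of the list-versus-set difference:

```python
# Membership checks: a list is O(len(b)) per lookup, a set is O(1) on average.
a = list(range(50_000))
b = list(range(25_000, 75_000))

slow = [x for x in a if x in b]        # O(len(a) * len(b)) -- rescans b on every iteration

b_set = set(b)                         # build once: O(len(b))
fast = [x for x in a if x in b_set]    # O(len(a)) total

assert slow == fast
```

The set version does the same work in a single pass over each collection; on inputs this size it is orders of magnitude faster, with identical output.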
```python
# AI gives you this — looks fine, ships green:
def get_user_emails():
    users = User.objects.all()  # 1 query
    return [
        {"id": u.id, "email": u.email, "team": u.team.name}
        # u.team.name triggers a query per user. 10k users = 10,001 queries.
        for u in users
    ]

# The fix: prefetch / select_related
def get_user_emails():
    users = User.objects.select_related("team").all()  # 1 join query
    return [
        {"id": u.id, "email": u.email, "team": u.team.name}
        for u in users
    ]

# Same code, 10000x faster on large data.
```

The N+1 is the most common AI-introduced perf bug. Every ORM has the same fix; AI rarely reaches for it unprompted.
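The same fix exists in other ORMs. A minimal SQLAlchemy sketch, assuming `User` and `Team` are mapped models with a `User.team` relationship and `session` is an open `Session`:

```python
from sqlalchemy import select
from sqlalchemy.orm import selectinload

# Eager-load each user's team up front: two queries total, not one per user.
stmt = select(User).options(selectinload(User.team))
users = session.execute(stmt).scalars().all()
emails = [{"id": u.id, "email": u.email, "team": u.team.name} for u in users]
```

`joinedload` would collapse it into a single JOIN instead; either way the per-row query disappears.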
"This function will run on 100k+ rows in production.
Constraints:
- Must complete in under 200ms.
- O(N log N) or better.
- No N+1 queries — use joins/IN clauses.
- Stream the result if it doesn't fit in memory.
- Add a comment with the expected complexity."
```

Naming the input scale changes the model's defaults completely. "100k rows" produces different code than "a list".
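For illustration only, this is the kind of shape those constraints push the model toward, reusing the hypothetical `User` model from above: batched `IN`-clause lookups, a generator instead of a full in-memory list, and the complexity stated in the docstring.

```python
def iter_user_emails(user_ids, batch_size=1_000):
    """Yield (id, email) pairs.

    Expected complexity: O(N) rows in ceil(N / batch_size) queries, O(batch_size) memory.
    """
    for start in range(0, len(user_ids), batch_size):
        batch = user_ids[start:start + batch_size]
        # One IN-clause query per batch -- no per-id round trips.
        yield from User.objects.filter(id__in=batch).values_list("id", "email")
```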
```
# 1. Run a profiler on the slow function (cProfile, py-spy, clinic.js, etc.)
# 2. Paste the profiler output into chat:
"Here is py-spy output for a function that takes 8s on 100k rows.
The top 3 hot spots are <paste>. Suggest the smallest possible change
to each that would speed it up. Show before/after for each."
# AI is excellent at reading flame graphs and profiler output.
# This is one of its highest-value uses for performance.
```

AI is a junior performance engineer when handed real profile data. Without it, AI is a guesser.
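Getting that output takes a few lines. A minimal cProfile harness; `process_rows` and `load_sample` are placeholders for the real function and a realistic input:

```python
import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()
process_rows(load_sample(100_000))   # placeholder call: the slow function on realistic input
profiler.disable()

# Top 10 call sites by cumulative time -- this is the text to paste into chat.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
```

For an already-running process, `py-spy dump --pid <PID>` (stack snapshot) or `py-spy record -o profile.svg --pid <PID>` (flame graph) gives the same kind of evidence without touching the code.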
"Write a microbenchmark that runs this function on:
- 100 items (warm-up)
- 10k items
- 1M items
Report time per call and memory peak. Use timeit + tracemalloc."
# 60 seconds of work, surfaces 80% of perf bugs before they ship.
```

Benchmarking is a habit. Add it to every nontrivial function, just like tests.

If the function MUST run in under 50ms on 10k inputs, write a test that asserts exactly that: `assert duration_ms < 50`. Now performance is part of the spec. Test-driven prompting works for performance just like correctness.
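A sketch of that microbenchmark using only the standard library; `process` stands in for the function under test, and `make_items` should be replaced with a production-shaped (ideally anonymized real) input generator:

```python
import timeit
import tracemalloc

def make_items(n):
    return list(range(n))            # placeholder: swap in anonymized production-shaped data

def bench(fn, n, repeats=5):
    items = make_items(n)
    tracemalloc.start()
    best_s = min(timeit.repeat(lambda: fn(items), number=1, repeat=repeats))
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print(f"n={n:>9,}  best={best_s * 1000:9.2f} ms  peak={peak / 1e6:7.1f} MB")
    return best_s

if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):   # warm-up, typical, worst-case
        bench(process, n)

# The spec-as-test version: fail CI if the 10k case regresses past 50 ms.
def test_process_under_50ms_on_10k():
    duration_ms = bench(process, 10_000) * 1000
    assert duration_ms < 50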
AI writes code for the inputs it can imagine. Production has the inputs it can't.
— An SRE
The big idea: performance is invisible to AI without explicit signal. State your scale, write benchmarks, profile the hot path, and let the AI optimize against measured reality. Without that signal, the model defaults to whatever pattern it saw most — usually correct, often slow.
15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-coding-debug-performance-bugs-creators
Why is checking membership with `if x in list` inside a loop inefficient for large datasets?
What happens when you use the synchronous `requests` library inside an async Python function?
What is the inefficient pattern when processing large CSV files?
What problem occurs when an API endpoint returns all 50,000 database records at once?
Why is creating a new Date() or timestamp object inside a loop considered a performance bug?
What does asking an AI to 'add a docstring stating time and space complexity' accomplish?
Which memory leak pattern is described as 'holding references in a long-running list'?
What is wrong with caching every AI-generated response without eviction?
In JavaScript, what problem occurs when a closure captures its entire surrounding scope?
Why should benchmarks be generated from real anonymized production samples when possible?
What is 'test-driven prompting' for performance?
Why does AI write code that performs well on small inputs but poorly on large ones?
What does 'profile-then-fix' mean in the context of AI-assisted development?
What is the recommended fix for the quadratic loop pattern `for x in a: if x in b` when b is a list?
Which HTTP client should be used in an async Python function for proper non-blocking behavior?