Performance Bugs in AI-Generated Code

AI writes code that works on small inputs and crawls on large ones. Learn the top patterns of AI-introduced performance issues, the profiling tools that surface them, and the prompts that prevent them.

12 min · Reviewed 2026

Works on My Machine. Crawls in Production.

AI does not feel performance. It writes code that is correct on three test inputs and devastating on three million. The result: a feature that ships green, then takes the database down on Monday morning. The bugs are stereotyped, and so are the fixes.

The top six performance bugs AI generates

Pattern	Symptom	Fix
N+1 queries	Loop calls DB once per item	Single query with `IN`, JOIN, or batched fetch
Quadratic loops on lists	`for x in a: if x in b:` with b as list	Convert b to a set first
Synchronous in async	`requests.get(...)` inside async function	`httpx.AsyncClient`, `await`
Loading whole file/table to filter	`df = pd.read_csv(...).query(...)`	Filter at source (SQL WHERE, csv chunks)
No pagination	Endpoint returns all 50k records	Cursor or offset pagination
Allocating in a hot loop	`new Date()` per iteration	Hoist out of the loop

The N+1 trap, in detail

# AI gives you this — looks fine, ships green:
def get_user_emails():
    users = User.objects.all()  # 1 query
    return [
        {"id": u.id, "email": u.email, "team": u.team.name}
        # u.team.name triggers a query per user. 10k users = 10,001 queries.
        for u in users
    ]

# The fix: prefetch / select_related
def get_user_emails():
    users = User.objects.select_related("team").all()  # 1 join query
    return [
        {"id": u.id, "email": u.email, "team": u.team.name}
        for u in users
    ]

# Same code, 10000x faster on large data.The N+1 is the most common AI-introduced perf bug. Every ORM has the same fix; AI rarely reaches for it unprompted.

Performance prompts that work

# Prepend to any prompt where data size matters:

"This function will run on 100k+ rows in production.
Constraints:
  - Must complete in under 200ms.
  - O(N log N) or better.
  - No N+1 queries — use joins/IN clauses.
  - Stream the result if it doesn't fit in memory.
  - Add a comment with the expected complexity."Naming the input scale changes the model's defaults completely. "100k rows" produces different code than "a list".

Profile-then-fix, with AI

# 1. Run a profiler on the slow function (cProfile, py-spy, clinic.js, etc.)
# 2. Paste the profiler output into chat:

"Here is py-spy output for a function that takes 8s on 100k rows.
The top 3 hot spots are <paste>. Suggest the smallest possible change
to each that would speed it up. Show before/after for each."

# AI is excellent at reading flame graphs and profiler output.
# This is one of its highest-value uses for performance.AI is a junior performance engineer when handed real profile data. Without it, AI is a guesser.

Memory bugs are quieter and meaner

Holding references in a long-running list — looks fine until OOM
Reading a 5GB file into memory instead of streaming
Caching with no eviction — process grows forever
Closures that capture too much (entire scope) in JS

Use the AI to generate benchmarks, not just code

# After AI writes the function, immediately:

"Write a microbenchmark that runs this function on:
  - 100 items (warm-up)
  - 10k items
  - 1M items
  Report time per call and memory peak. Use timeit + tracemalloc."

# 60 seconds of work, surfaces 80% of perf bugs before they ship.Benchmarking is a habit. Add it to every nontrivial function, just like tests.

When perf is the requirement, write the test first

If the function MUST run in under 50ms on 10k inputs, write a test that asserts that — `assert duration_ms < 50`. Now performance is part of the spec. Test-driven prompting works for performance just like correctness.

AI writes code for the inputs it can imagine. Production has the inputs it can't.
— An SRE

The big idea: performance is invisible to AI without explicit signal. State your scale, write benchmarks, profile the hot path, and let the AI optimize against measured reality. Without that signal, the model defaults to whatever pattern it saw most — usually correct, often slow.

End-of-lesson check

15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-coding-debug-performance-bugs-creators

Why is checking membership with `if x in list` inside a loop inefficient for large datasets?
1. Lists cannot store more than 10,000 elements efficiently
2. Lists use more memory than sets for the same number of elements
3. Membership checking on a list is O(n) and repeats for every iteration, making the overall operation O(n²)
4. The `in` operator does not work with lists, only with dictionaries
What happens when you use the synchronous `requests` library inside an async Python function?
1. The code automatically parallelizes all HTTP requests
2. The async function blocks the entire event loop during each request, destroying performance
3. The requests are queued and executed after the function completes
4. The function automatically converts to synchronous execution
What is the inefficient pattern when processing large CSV files?
1. Using generator expressions to process line by line
2. Reading the entire file into memory before filtering
3. Using pandas read_csv with chunking enabled
4. Filtering at the source with SQL WHERE clauses
What problem occurs when an API endpoint returns all 50,000 database records at once?
1. The database automatically deletes old records
2. The API automatically caches all records permanently
3. The database creates too many indexes slowing down queries
4. The network payload becomes massive, causing timeouts and memory exhaustion
Why is creating a new Date() or timestamp object inside a loop considered a performance bug?
1. Date objects cannot be created inside loops due to language restrictions
2. The allocation overhead repeats for every iteration, wasting CPU cycles
3. Date objects automatically persist after the loop ends causing memory leaks
4. Loops with Date creation cannot be optimized by any compiler
What does asking an AI to 'add a docstring stating time and space complexity' accomplish?
1. It prevents the AI from writing any code with loops
2. It automatically optimizes the code to match the stated complexity
3. It forces the AI to make its assumptions explicit, exposing hidden O(n²) behavior
4. It makes the code run faster by adding documentation
Which memory leak pattern is described as 'holding references in a long-running list'?
1. Data is stored in a database rather than memory
2. The program runs too quickly to notice memory issues
3. Memory is automatically freed after each function call
4. Objects that should be deleted remain referenced, preventing garbage collection
What is wrong with caching every AI-generated response without eviction?
1. Caching prevents the program from running at all
2. Caching is always faster than not caching
3. The cache grows unbounded, eventually consuming all available memory
4. Responses cannot be cached because they contain random data
In JavaScript, what problem occurs when a closure captures its entire surrounding scope?
1. The closure automatically executes immediately
2. Closures cannot capture scopes with more than three variables
3. The closure retains references to all variables in scope, potentially retaining large objects in memory
4. The code will not compile in strict mode
Why should benchmarks be generated from real anonymized production samples when possible?
1. AI tends to test on small, typical inputs while production has unusual edge cases
2. Production samples make code run faster
3. Production samples are always smaller than test data
4. Benchmarks cannot use artificial data under GDPR
What is 'test-driven prompting' for performance?
1. Using AI to test the performance of other AI systems
2. Writing tests that assert specific performance thresholds before writing the code
3. Only testing code after it has been deployed to production
4. Testing that the code produces correct output regardless of speed
Why does AI write code that performs well on small inputs but poorly on large ones?
1. AI cannot feel or measure performance, only correctness
2. AI specifically targets small input performance for mobile devices
3. AI automatically optimizes for the largest possible input
4. Large inputs cause syntax errors in AI-generated code
What does 'profile-then-fix' mean in the context of AI-assisted development?
1. Delete profiling code before deploying to production
2. Run a profiler to identify bottlenecks, then use AI to optimize only those specific areas
3. Write the entire optimization before profiling
4. Profile the AI model instead of the code it generates
What is the recommended fix for the quadratic loop pattern `for x in a: if x in b` when b is a list?
1. Put b inside the condition with parentheses
2. Convert b to a set before the loop
3. Add a break statement after the if condition
4. Use a dictionary instead of a list
Which HTTP client should be used in an async Python function for proper non-blocking behavior?
1. `httpx.AsyncClient` with await
2. Any HTTP library works the same in async functions
3. The `urllib` library in synchronous mode
4. The standard `requests` library with threading

← Back to interactive lesson

Tendril · Creators · AI-Assisted Coding