Async lets your program make 100 API calls at once instead of one at a time. Essential for LLM apps. You'll write the two patterns that solve 90% of cases.
Every call to Claude or GPT takes 1–10 seconds. If your app makes 50 calls sequentially, it takes minutes. With asyncio, you fire all 50 at once and wait only as long as the slowest one. This single pattern is the difference between a toy script and a production app.
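You can see the payoff before touching a real network. In this minimal sketch, `asyncio.sleep` stands in for a 2-second API round trip, and `call_api` is just an illustrative name:

```python
import asyncio
import time

async def call_api(i: int) -> str:
    await asyncio.sleep(2)  # stand-in for a 2-second API call
    return f"response {i}"

async def main() -> None:
    start = time.perf_counter()
    for i in range(3):
        await call_api(i)  # one at a time: ~6s total
    print(f"sequential: {time.perf_counter() - start:.1f}s")

    start = time.perf_counter()
    await asyncio.gather(*(call_api(i) for i in range(3)))  # all at once: ~2s total
    print(f"concurrent: {time.perf_counter() - start:.1f}s")

asyncio.run(main())
```

The same structure works with real HTTP calls: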
```python
import asyncio
import httpx

async def fetch(client: httpx.AsyncClient, url: str) -> dict:
    response = await client.get(url, timeout=10)
    response.raise_for_status()
    return response.json()

async def main() -> None:
    urls = [
        "https://api.github.com/users/anthropic",
        "https://api.github.com/users/openai",
        "https://api.github.com/users/vercel",
    ]
    async with httpx.AsyncClient() as client:
        results = await asyncio.gather(*(fetch(client, u) for u in urls))
        for data in results:
            print(data["login"], "-", data.get("bio") or "")  # bio can be null

asyncio.run(main())
```

`asyncio.gather` runs all the coroutines concurrently. Three API calls, one total wait.

```python
import asyncio
from anthropic import AsyncAnthropic

client = AsyncAnthropic()
semaphore = asyncio.Semaphore(5)  # max 5 concurrent requests

async def summarize(text: str) -> str:
    async with semaphore:
        response = await client.messages.create(
            model="claude-opus-4-1",  # substitute any current Claude model id
            max_tokens=200,
            messages=[{"role": "user", "content": f"One sentence summary:\n{text}"}],
        )
        return response.content[0].text
async def main() -> None:
    articles = ["...text 1...", "...text 2...", "...text 20..."]
    summaries = await asyncio.gather(*(summarize(a) for a in articles))
    for s in summaries:
        print(s)

asyncio.run(main())
```

The `Semaphore` caps concurrent requests so you don't hit the provider's rate limit.

```python
# Inside main() from the first example, swapping in a fault-tolerant gather:
results = await asyncio.gather(
    *(fetch(client, u) for u in urls),
    return_exceptions=True,  # don't fail fast: keep exceptions as results
)
for url, result in zip(urls, results):
    if isinstance(result, Exception):
        print(f"Failed {url}: {result}")
    else:
print(f"OK {url}")return_exceptions=True collects failures instead of aborting the whole batch — crucial for LLM calls.| Sync | Async |
|---|---|
| 10 API calls = 10 × 2s = 20s | 10 API calls ≈ 2s (all at once) |
| Easy to reason about | Requires thinking in coroutines |
| Good for: scripts, simple tools | Good for: servers, LLM apps, scrapers |
Big idea: most AI code spends 99% of its time waiting. Async is how you stop waiting in series and start waiting in parallel.
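The two patterns compose naturally. Here's a minimal sketch of a batch job using both the semaphore cap and `return_exceptions=True`; the model id and article texts are placeholders:

```python
import asyncio
from anthropic import AsyncAnthropic

client = AsyncAnthropic()
semaphore = asyncio.Semaphore(5)  # pattern 1: cap concurrency

async def summarize(text: str) -> str:
    async with semaphore:
        response = await client.messages.create(
            model="claude-opus-4-1",  # placeholder: any current Claude model id
            max_tokens=200,
            messages=[{"role": "user", "content": f"One sentence summary:\n{text}"}],
        )
        return response.content[0].text

async def main() -> None:
    articles = ["...text 1...", "...text 2..."]  # placeholder inputs
    # pattern 2: keep failures as values instead of aborting the batch
    results = await asyncio.gather(
        *(summarize(a) for a in articles),
        return_exceptions=True,
    )
    for result in results:
        if isinstance(result, Exception):
            print(f"Failed: {result}")
        else:
            print(result)

asyncio.run(main())
```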
15 questions · take it online for instant feedback at tendril.neural-forge.io/learn/quiz/end-prog-python-async-creators