Async lets your program keep 100 API calls in flight at once instead of making them one at a time, which is essential for LLM apps. You'll write the two patterns that cover most everyday cases.
A call to Claude or GPT typically takes 1–10 seconds. If your app makes 50 calls sequentially, that adds up to minutes. With asyncio, you start all 50 at once and wait roughly as long as the slowest one, rate limits permitting. This one pattern is often the difference between a toy script and a production app.
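To see the timing difference without touching a real API, here is a minimal sketch that uses `asyncio.sleep` as a stand-in for a slow network call. The names `fake_call`, `run_sequential`, `run_concurrent`, and `compare` are hypothetical helpers invented for this illustration, and the delays are arbitrary.

```python
import asyncio
import time


async def fake_call(delay: float) -> float:
    """Stand-in for a slow API call: it only waits `delay` seconds."""
    await asyncio.sleep(delay)
    return delay


async def run_sequential(delays: list[float]) -> float:
    """Await each call before starting the next; return elapsed seconds."""
    start = time.perf_counter()
    for d in delays:
        await fake_call(d)
    return time.perf_counter() - start


async def run_concurrent(delays: list[float]) -> float:
    """Start every call at once with gather; return elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(fake_call(d) for d in delays))
    return time.perf_counter() - start


async def compare(delays: list[float]) -> tuple[float, float]:
    sequential = await run_sequential(delays)
    concurrent = await run_concurrent(delays)
    return sequential, concurrent


if __name__ == "__main__":
    seq, con = asyncio.run(compare([0.1, 0.2, 0.3]))
    print(f"sequential: {seq:.2f}s, concurrent: {con:.2f}s")
```

The sequential time should land near the sum of the delays, while the concurrent time should land near the largest single delay.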
```python
import asyncio

import httpx


async def fetch(client: httpx.AsyncClient, url: str) -> dict:
    """GET one URL and return the decoded JSON body."""
    response = await client.get(url, timeout=10)
    response.raise_for_status()
    return response.json()


async def main() -> None:
    urls = [
        "https://api.github.com/users/anthropic",
        "https://api.github.com/users/openai",
        "https://api.github.com/users/vercel",
    ]
    async with httpx.AsyncClient() as client:
        # Start all requests at once and wait for every result.
        results = await asyncio.gather(*(fetch(client, u) for u in urls))
    for data in results:
        # "bio" can be null in the response, so fall back to an empty string.
        print(data["login"], "-", data.get("bio") or "")


asyncio.run(main())
```

asyncio.gather runs all coroutines concurrently. Three API calls, one total wait.

```python
import asyncio

from anthropic import AsyncAnthropic

client = AsyncAnthropic()
semaphore = asyncio.Semaphore(5)  # max 5 concurrent requests


async def summarize(text: str) -> str:
    # Only 5 coroutines can be inside this block at any moment.
    async with semaphore:
        response = await client.messages.create(
            model="claude-opus-4-7",
            max_tokens=200,
            messages=[{"role": "user", "content": f"One sentence summary:\n{text}"}],
        )
        return response.content[0].text


async def main() -> None:
    articles = ["text 1", "text 2", "text 20"]
    summaries = await asyncio.gather(*(summarize(a) for a in articles))
    for s in summaries:
        print(s)


asyncio.run(main())
```

Semaphore caps concurrent requests so you don't hit the provider's rate limit.

```python
# Replaces the gather call inside main() from the first example,
# so `fetch`, `client`, and `urls` are already defined.
results = await asyncio.gather(
    *(fetch(client, u) for u in urls),
    return_exceptions=True,  # don't fail-fast
)
for url, result in zip(urls, results):
    if isinstance(result, Exception):
        print(f"Failed {url}: {result}")
    else:
        print(f"OK {url}")
```

return_exceptions=True returns failures as values alongside the successes, instead of raising the first exception and losing the rest of the batch's results. That is crucial for LLM calls.

| Sync | Async |
|---|---|
| 10 API calls = 10 × 2s = 20s | 10 API calls ≈ 2s (all at once) |
| Easy to reason about | Requires thinking in coroutines |
| Good for: scripts, simple tools | Good for: servers, LLM apps, scrapers |
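The semaphore and return_exceptions patterns from this lesson can be combined in a single batch runner. The sketch below is one possible way to do it, using simulated calls rather than a real provider: `flaky_call` and `run_batch` are hypothetical names, every fourth call fails on purpose, and the `state` dict exists only to record how many calls were running at once.

```python
import asyncio


async def flaky_call(i: int, sem: asyncio.Semaphore, state: dict) -> str:
    """Simulated API call; fails when i is a multiple of 4 (an assumption)."""
    async with sem:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        try:
            await asyncio.sleep(0.01)  # pretend to wait on the network
            if i % 4 == 0:
                raise RuntimeError(f"call {i} failed")
            return f"result {i}"
        finally:
            state["active"] -= 1


async def run_batch(n: int, limit: int) -> tuple[list, list, int]:
    """Run n simulated calls with at most `limit` in flight at once."""
    sem = asyncio.Semaphore(limit)
    state = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(flaky_call(i, sem, state) for i in range(n)),
        return_exceptions=True,  # failures come back as values
    )
    ok = [r for r in results if not isinstance(r, Exception)]
    failed = [r for r in results if isinstance(r, Exception)]
    return ok, failed, state["peak"]


if __name__ == "__main__":
    ok, failed, peak = asyncio.run(run_batch(10, 3))
    print(f"{len(ok)} ok, {len(failed)} failed, peak concurrency {peak}")
```

With 10 calls and a limit of 3, calls 0, 4, and 8 fail while the other 7 succeed, and no more than 3 calls are ever in flight together.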
Big idea: most AI code spends nearly all of its time waiting on the network. Async is how you stop waiting in series and start waiting in parallel.
8 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-prog-python-async-creators
What is the main idea of "Python async/await — Waiting Without Blocking"?
Which concept is most central to "Python async/await — Waiting Without Blocking"?
Which use of AI fits this topic best?
What should a careful learner remember about "The sync/async trap"?
You want to use AI after this lesson. What is the safest next step?
How should AI output about async be treated?
Name one way to verify an AI answer about async.
Which action would help you apply "Python async/await — Waiting Without Blocking" responsibly?