How LLMs Work
Tokens, probabilities, training, and why next-word prediction feels smart.
Every time you chat with Claude or ChatGPT, something weird is happening under the hood: the AI is literally predicting the next word, one at a time, as fast as possible. That’s it. That’s the whole trick.
Step 1 — Words become numbers
The first thing an LLM does is chop your message into tokens. A token is usually a word or a piece of a word. “Unbelievable” might become un + believ + able. Each token becomes a number the computer can work with.
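If you want to see real tokens, OpenAI’s open-source tiktoken library exposes one actual tokenizer. The exact splits and IDs differ between models, so treat this as one example, not the rule:

```python
# One concrete tokenizer: tiktoken (used by several OpenAI models).
# Other models split text differently, so this is an example, not a universal answer.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

token_ids = enc.encode("Unbelievable")             # text -> list of integers
pieces = [enc.decode([tid]) for tid in token_ids]  # each integer back to its text piece

print(token_ids)  # the numbers the model actually works with
print(pieces)     # the sub-word chunks those numbers stand for
```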
Step 2 — Predict the next token
The AI asks: given all those tokens, what’s the most likely next one? It computes a probability for every possible token in its vocabulary — usually around 100,000 options — and picks one.
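Here’s a minimal sketch of that step with an invented five-token vocabulary standing in for the real ~100,000. The raw scores (called logits) are made up; the softmax step that converts scores into probabilities is the real mechanism:

```python
import numpy as np

# A made-up 5-token vocabulary and made-up model scores (logits).
vocab = ["Paris", "London", "pizza", "the", "is"]
logits = np.array([4.0, 1.5, -2.0, 0.5, 0.1])

# Softmax: turn raw scores into probabilities that sum to 1.
probs = np.exp(logits) / np.exp(logits).sum()

for token, p in zip(vocab, probs):
    print(f"{token!r}: {p:.3f}")   # 'Paris' dominates with ~0.88

# Sample one token according to those probabilities.
next_token = np.random.choice(vocab, p=probs)
print("chosen:", next_token)
```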
Step 3 — Add it. Repeat.
The new token gets added to the list, and the AI predicts the next one. Over and over. That’s how you get a paragraph, a poem, a whole essay.
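The loop itself is tiny. In this toy sketch, fake_model is a hypothetical stand-in that follows a canned continuation; a real model would compute fresh probabilities over its whole vocabulary at every step, but the append-and-repeat structure is the same:

```python
# fake_model is a hypothetical stand-in: it follows a canned word-to-word
# continuation instead of computing real probabilities.
CANNED = {"The": "capital", "capital": "of", "of": "France", "France": "is", "is": "Paris"}

def fake_model(tokens):
    return CANNED.get(tokens[-1], ".")  # a real model scores every vocab token here

tokens = ["The"]
for _ in range(6):
    tokens.append(fake_model(tokens))   # predict one token, append, repeat

print(" ".join(tokens))  # The capital of France is Paris .
```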
Why does it feel smart?
Because predicting the next word well requires understanding grammar, facts, logic, and context. You can’t guess the next word in “The capital of France is ___” without knowing geography. The “intelligence” is a side effect of being extremely good at this one task.
Temperature — the creativity dial
If the AI always picks the single most likely next token, you get boring, robotic output. Temperature is a knob that reshapes the probabilities before the pick: it flattens or sharpens them, so the AI sometimes grabs the 2nd or 3rd most likely option instead. Higher temperature = more creative (and more weird). Lower temperature = safer but repetitive.
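Under the hood, temperature just divides the raw scores before they’re turned into probabilities. A sketch, reusing the made-up logits from above:

```python
import numpy as np

logits = np.array([4.0, 1.5, -2.0, 0.5, 0.1])  # same made-up scores as before

def probs_at_temperature(logits, temperature):
    scaled = logits / temperature          # low T sharpens, high T flattens
    e = np.exp(scaled - scaled.max())      # subtract max for numerical stability
    return e / e.sum()

for t in (0.2, 1.0, 2.0):
    print(t, np.round(probs_at_temperature(logits, t), 3))
# At 0.2, nearly all probability sits on the top token (robotic).
# At 2.0, the distribution flattens, so unlikely tokens get real chances (weird).
```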