How LLMs Work
Tokens, probabilities, training, and why next-word prediction feels smart.
Every time you chat with Claude or ChatGPT, something weird is happening under the hood: the AI is literally predicting the next word, one at a time, as fast as possible. That’s it. That’s the whole trick.
Step 1 — Words become numbers
The first thing an LLM does is chop your message into tokens. A token is usually a word or a piece of a word. “Unbelievable” might become un + believ + able. Each token becomes a number the computer can work with.
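If you want to see real tokens, OpenAI’s open-source tiktoken library exposes one actual tokenizer. The exact splits and IDs differ between models, so treat this as one example, not the rule:

```python
# One concrete tokenizer: tiktoken (used by several OpenAI models).
# Other models split text differently, so this is an example, not a universal answer.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

token_ids = enc.encode("Unbelievable")             # text -> list of integers
pieces = [enc.decode([tid]) for tid in token_ids]  # each integer back to its text piece

print(token_ids)  # the numbers the model actually works with
print(pieces)     # the sub-word chunks those numbers stand for
```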
Step 2 — Predict the next token
The AI asks: given all those tokens, what’s the most likely next one? It computes a probability for every possible token in its vocabulary — usually around 100,000 options — and picks one.
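Here’s a minimal sketch of that step with an invented five-token vocabulary standing in for the real ~100,000. The raw scores (called logits) are made up; the softmax step that converts scores into probabilities is the real mechanism:

```python
import numpy as np

# A made-up 5-token vocabulary and made-up model scores (logits).
vocab = ["Paris", "London", "pizza", "the", "is"]
logits = np.array([4.0, 1.5, -2.0, 0.5, 0.1])

# Softmax: turn raw scores into probabilities that sum to 1.
probs = np.exp(logits) / np.exp(logits).sum()

for token, p in zip(vocab, probs):
    print(f"{token!r}: {p:.3f}")   # 'Paris' dominates with ~0.88

# Sample one token according to those probabilities.
next_token = np.random.choice(vocab, p=probs)
print("chosen:", next_token)
```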
Step 3 — Add it. Repeat.
The new token gets added to the list, and the AI predicts the next one. Over and over. That’s how you get a paragraph, a poem, a whole essay.
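The loop itself is tiny. In this toy sketch, fake_model is a hypothetical stand-in that follows a canned continuation; a real model would compute fresh probabilities over its whole vocabulary at every step, but the append-and-repeat structure is the same:

```python
# fake_model is a hypothetical stand-in: it follows a canned word-to-word
# continuation instead of computing real probabilities.
CANNED = {"The": "capital", "capital": "of", "of": "France", "France": "is", "is": "Paris"}

def fake_model(tokens):
    return CANNED.get(tokens[-1], ".")  # a real model scores every vocab token here

tokens = ["The"]
for _ in range(6):
    tokens.append(fake_model(tokens))   # predict one token, append, repeat

print(" ".join(tokens))  # The capital of France is Paris .
```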
Why does it feel smart?
Because predicting the next word well requires understanding grammar, facts, logic, and context. You can’t guess the next word in “The capital of France is ___” without knowing geography. The “intelligence” is a side effect of being extremely good at this one task.
Temperature — the creativity dial
If the AI always picks the single most likely next token, you get boring, robotic output. Temperature is a knob that reshapes the probabilities before the pick: it flattens or sharpens them, so the AI sometimes grabs the 2nd or 3rd most likely option instead. Higher temperature = more creative (and more weird). Lower temperature = safer but repetitive.
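Under the hood, temperature just divides the raw scores before they’re turned into probabilities. A sketch, reusing the made-up logits from above:

```python
import numpy as np

logits = np.array([4.0, 1.5, -2.0, 0.5, 0.1])  # same made-up scores as before

def probs_at_temperature(logits, temperature):
    scaled = logits / temperature          # low T sharpens, high T flattens
    e = np.exp(scaled - scaled.max())      # subtract max for numerical stability
    return e / e.sum()

for t in (0.2, 1.0, 2.0):
    print(t, np.round(probs_at_temperature(logits, t), 3))
# At 0.2, nearly all probability sits on the top token (robotic).
# At 2.0, the distribution flattens, so unlikely tokens get real chances (weird).
```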