Chain-of-Thought Mechanics
Asking a model to 'think step by step' makes it better at hard problems. Here is why, and when it fails.
Lesson map
What this lesson covers
Learning path
The main moves in order
1. Think Before You Speak
2. Chain of thought
3. Reasoning
4. Scratchpad
Section 1
Think Before You Speak
Chain-of-thought (CoT) prompting is the discovery that LLMs are often wildly better at multi-step problems when asked to reason aloud rather than answer in one shot. Wei et al. (2022) documented this effect across math, logic, and commonsense reasoning.
How CoT works mechanically
1. Each token the model writes becomes part of its future context
2. Intermediate steps give the model a 'scratchpad' for calculation
3. Complex inferences can be decomposed into simple per-token steps
4. The final answer is conditioned on the scratchpad, reducing shortcut errors
CoT looks verbose but consistently improves correctness on multi-step problems
Without CoT:
Q: If I have 5 apples and buy 3 more,
then eat 2, how many do I have?
A: 6
With CoT:
Q: (same)
A: Let me think step by step.
Start with 5 apples. Buy 3 more → 5 + 3 = 8.
Eat 2 → 8 - 2 = 6. The answer is 6.
Reasoning models take this further
Since 2024, models like OpenAI's o1, o3, Claude with extended thinking, and DeepSeek-R1 have been explicitly trained to reason for a long time before answering. Test-time compute is now a core axis of capability, distinct from parameter scaling.
Compare the options
| Standard CoT (prompted) | Trained reasoning model |
|---|---|
| Works at inference only | Baked into training |
| Quality depends on the prompt | Robust to prompt wording |
| Limited by context length | Long internal deliberation |
| Can be coaxed into shorter, faster answers | Explicitly uses more compute per query |
When CoT goes wrong
- Overconfidence: verbose reasoning does not prevent wrong answers — it can make them sound more certain
- Hallucinated steps: the model invents intermediate calculations
- Unfaithful reasoning: the stated steps do not reflect the actual decision
- Wasted tokens: simple questions do not benefit and cost more
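One of these failure modes, hallucinated steps, can be caught mechanically by re-checking every calculation the model writes down. Below is a minimal sketch that assumes steps are stated as simple "a op b = c" equations; note it cannot catch unfaithful reasoning, where the stated steps are correct but did not actually drive the answer.

```python
import re


def check_arithmetic_steps(transcript: str) -> list[tuple[str, bool]]:
    """Re-verify every 'a op b = c' step stated in a CoT transcript.

    Returns each stated step paired with whether it actually holds.
    Catches hallucinated intermediate calculations only; a transcript
    can pass every check and still be unfaithful to the model's
    actual decision process.
    """
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
    }
    results = []
    for a, op, b, c in re.findall(r"(\d+)\s*([+\-*])\s*(\d+)\s*=\s*(\d+)", transcript):
        ok = ops[op](int(a), int(b)) == int(c)
        results.append((f"{a} {op} {b} = {c}", ok))
    return results


# A transcript with one hallucinated step (8 - 2 is not 7):
transcript = "Start with 5 apples. Buy 3 more: 5 + 3 = 8. Eat 2: 8 - 2 = 7."
print(check_arithmetic_steps(transcript))
# [('5 + 3 = 8', True), ('8 - 2 = 7', False)]
```

This kind of step-level verification is the idea behind process-based checking: grade the reasoning, not just the final answer.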
“Chain-of-thought prompting significantly improves the ability of large language models to perform complex reasoning.” — Wei et al., 2022
The big idea: making thinking visible usually makes thinking better. But visible is not the same as faithful — and that distinction matters.
Related lessons
Keep going
Creators · 50 min
The Full Machine Learning Pipeline
From raw bytes to deployed model, every ML system follows the same ten-stage pipeline. Master it and you can read any architecture paper.
Creators · 55 min
Transformers Under the Hood
Attention, positional encoding, residual streams. A walk through the architecture that powers every frontier language model today.
Creators · 55 min
The Three Ingredients: Data, Compute, Algorithms (Capstone)
Every AI breakthrough of the past decade rests on three interacting ingredients. Synthesize everything you have learned into one working model.
