Chain-of-Thought Mechanics
Asking a model to 'think step by step' makes it better at hard problems. Here is why, and when it fails.
Creators · AI Foundations · ~23 min read
Think Before You Speak
Chain-of-thought (CoT) prompting builds on the finding that large language models (LLMs) are often much better at multi-step problems when asked to reason aloud instead of answering in one shot. Wei et al. (2022) documented this effect on arithmetic, commonsense, and symbolic reasoning tasks, with the gains appearing mainly in sufficiently large models.
How CoT works mechanically
1. Each token the model writes becomes part of the context for every later token
2. Intermediate steps give the model a 'scratchpad' for calculation
3. A complex inference can be broken into simple steps, each small enough for the fixed amount of computation the model performs per token
4. The final answer is conditioned on the scratchpad, reducing shortcut errors
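To make the scratchpad idea concrete, here is a toy analogy in Python. It is a sketch under loud assumptions, not a description of how a transformer works internally: the hypothetical "model" may apply at most `OPS_PER_STEP` arithmetic operations per step, standing in for the bounded computation spent on each token, and the problem is the small one used in this lesson's example (start with 5, add 3, subtract 2). All function names are illustrative.

```python
from typing import List, Optional, Tuple

# Toy analogy only (not a real LLM): this hypothetical "model" can apply at
# most OPS_PER_STEP arithmetic operations per generated step, standing in for
# the bounded computation a transformer spends on each token.
OPS_PER_STEP = 1

Op = Tuple[str, int]  # ("+", 3) means "add 3", ("-", 2) means "subtract 2"


def apply_op(value: int, op: Op) -> int:
    sign, amount = op
    return value + amount if sign == "+" else value - amount


def answer_directly(start: int, ops: List[Op]) -> Optional[int]:
    """One-shot answer: every operation has to fit into a single step."""
    if len(ops) > OPS_PER_STEP:
        return None  # over budget; a real model would have to guess here
    value = start
    for op in ops:
        value = apply_op(value, op)
    return value


def answer_with_scratchpad(start: int, ops: List[Op]) -> Tuple[int, List[str]]:
    """CoT-style answer: one operation per step, with each result written
    into the context so the next step only has to read the last line."""
    context = [f"Start with {start}."]
    for op in ops:
        # Read the running total back from the scratchpad (the context).
        value = int(context[-1].split()[-1].rstrip("."))
        new_value = apply_op(value, op)
        context.append(f"{value} {op[0]} {op[1]} = {new_value}.")
    # The final answer is conditioned on the last scratchpad line.
    final = int(context[-1].split()[-1].rstrip("."))
    return final, context


if __name__ == "__main__":
    ops = [("+", 3), ("-", 2)]  # start with 5, buy 3, eat 2
    print(answer_directly(5, ops))  # None: two operations, budget of one
    answer, trace = answer_with_scratchpad(5, ops)
    print(trace)   # ['Start with 5.', '5 + 3 = 8.', '8 - 2 = 6.']
    print(answer)  # 6
```

The one-shot path fails as soon as a problem needs more operations than one step allows, while the scratchpad path succeeds because every intermediate result is written down and read back, mirroring the numbered list above.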
CoT looks verbose, but it usually improves correctness on multi-step problems.
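Without CoT:
Q: If I have 5 apples and buy 3 more, then eat 2, how many do I have?
A: 6
With CoT:
Q: (same)
A: Let me think step by step. Start with 5 apples. Buy 3 more → 5 + 3 = 8. Eat 2 → 8 - 2 = 6. The answer is 6.
A problem this simple is answered correctly either way. The difference is that the CoT version writes out each intermediate result, so the model never has to jump over a step, and any mistake appears in the text where it can be checked.
Reasoning models take this further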
Since 2024, models such as OpenAI's o1 and o3, Claude with extended thinking, and DeepSeek-R1 have been explicitly trained to produce long chains of reasoning before they answer. Test-time compute, meaning the computation a model spends while answering a query, is now a core axis of capability, distinct from scaling up parameter count.
Compare the options
| Standard CoT (prompted) | Trained reasoning model |
|---|---|
| Works at inference only | Baked into training |
| Quality depends on the prompt | Robust to prompt wording |
| Limited by context length | Long internal deliberation |
| Can be prompted to answer faster | Explicitly uses more compute per query |
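On the prompted side of this comparison, CoT is just text wrapped around the question, which is why its quality depends on the wording. The minimal sketch below assumes a plain-text completion interface and does not call any real model API. `build_cot_prompt` and `extract_final_answer` are hypothetical helper names, and the closing phrase "The answer is N." is a convention borrowed from the apples example above, not a standard format.

```python
import re
from typing import Optional

# Hypothetical convention: ask the model to finish with "The answer is N."
# so the final answer can be located inside the free-form reasoning.
ANSWER_RE = re.compile(r"The answer is\s+(-?\d+(?:\.\d+)?)")


def build_cot_prompt(question: str) -> str:
    """Wrap a question so the model reasons aloud before answering."""
    return (
        f"Q: {question}\n"
        "Reason step by step, then end with 'The answer is N.'\n"
        "A: Let me think step by step."
    )


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the number after the LAST 'The answer is', or None if absent."""
    matches = ANSWER_RE.findall(completion)
    return matches[-1] if matches else None


trace = ("Start with 5 apples. Buy 3 more → 5 + 3 = 8. "
         "Eat 2 → 8 - 2 = 6. The answer is 6.")
print(extract_final_answer(trace))  # 6
```

Taking the last match rather than the first is a deliberate choice: a long reasoning trace may float a tentative answer partway through and revise it before concluding.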
When CoT goes wrong
- Overconfidence: verbose reasoning does not prevent wrong answers — it can make them sound more certain
- Hallucinated steps: the model invents intermediate calculations
- Unfaithful reasoning: the stated steps do not reflect the actual decision
- Wasted tokens: simple questions gain little from CoT but still pay for the extra tokens and latency
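Because CoT makes intermediate steps visible, some hallucinated steps can be caught mechanically. The sketch below assumes the reasoning writes its arithmetic in the form `a + b = c` or `a - b = c`, as the apples example does, and recomputes every such step. `check_arithmetic_steps` is a hypothetical helper; it only catches steps whose arithmetic is wrong and says nothing about whether the stated reasoning is faithful.

```python
import re
from typing import List, Tuple

# Matches simple integer steps such as "5 + 3 = 8" or "8 - 2 = 6".
STEP_RE = re.compile(r"(-?\d+)\s*([+-])\s*(\d+)\s*=\s*(-?\d+)")


def check_arithmetic_steps(reasoning: str) -> List[Tuple[str, int]]:
    """Return (step_text, correct_value) for each step whose stated result is wrong."""
    errors = []
    for match in STEP_RE.finditer(reasoning):
        a, op, b, stated = match.groups()
        correct = int(a) + int(b) if op == "+" else int(a) - int(b)
        if correct != int(stated):
            errors.append((match.group(0), correct))
    return errors


good = "Start with 5 apples. Buy 3 more → 5 + 3 = 8. Eat 2 → 8 - 2 = 6. The answer is 6."
bad = "Start with 5 apples. Buy 3 more → 5 + 3 = 9. Eat 2 → 9 - 2 = 7. The answer is 7."
print(check_arithmetic_steps(good))  # []
print(check_arithmetic_steps(bad))   # [('5 + 3 = 9', 8)]
```

In the corrupted trace only the first step is flagged: "9 - 2 = 7" is internally consistent, so a single invented number propagates through otherwise valid arithmetic to a confident wrong answer.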
“Chain-of-thought prompting significantly improves the ability of large language models to perform complex reasoning.”
The big idea: making thinking visible usually makes thinking better. But visible is not the same as faithful — and that distinction matters.
