Chain-of-Thought Mechanics
Asking a model to 'think step by step' makes it better at hard problems. Here is why, and when it fails.
Lesson map
What this lesson covers
Learning path
The main moves in order
1. Think Before You Speak
2. Chain of thought
3. Reasoning
4. Scratchpad
Section 1
Think Before You Speak
Chain-of-thought (CoT) prompting is the discovery that LLMs are often wildly better at multi-step problems when asked to reason aloud rather than answer in one shot. Wei et al. (2022) documented this effect across math, logic, and commonsense reasoning.
How CoT works mechanically
1. Each token the model writes becomes part of its future context
2. Intermediate steps give the model a 'scratchpad' for calculation
3. Complex inferences can be decomposed into simple per-token steps
4. The final answer is conditioned on the scratchpad, reducing shortcut errors
CoT looks verbose but consistently improves correctness on multi-step problems
Without CoT:
Q: If I have 5 apples and buy 3 more,
then eat 2, how many do I have?
A: 6
With CoT:
Q: (same)
A: Let me think step by step.
Start with 5 apples. Buy 3 more → 5 + 3 = 8.
Eat 2 → 8 - 2 = 6. The answer is 6.
Reasoning models take this further
Since 2024, models like OpenAI's o1, o3, Claude with extended thinking, and DeepSeek-R1 have been explicitly trained to reason for a long time before answering. Test-time compute is now a core axis of capability, distinct from parameter scaling.
Compare the options
| Standard CoT (prompted) | Trained reasoning model |
|---|---|
| Works at inference only | Baked into training |
| Quality depends on the prompt | Robust to prompt wording |
| Limited by context length | Long internal deliberation |
| Can be coaxed into shorter, faster answers | Explicitly uses more compute per query |
When CoT goes wrong
- Overconfidence: verbose reasoning does not prevent wrong answers — it can make them sound more certain
- Hallucinated steps: the model invents intermediate calculations
- Unfaithful reasoning: the stated steps do not reflect the actual decision
- Wasted tokens: simple questions do not benefit and cost more
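One of these failure modes, hallucinated steps, can be caught mechanically by re-checking every calculation the model writes down. Below is a minimal sketch that assumes steps are stated as simple "a op b = c" equations; note it cannot catch unfaithful reasoning, where the stated steps are correct but did not actually drive the answer.

```python
import re


def check_arithmetic_steps(transcript: str) -> list[tuple[str, bool]]:
    """Re-verify every 'a op b = c' step stated in a CoT transcript.

    Returns each stated step paired with whether it actually holds.
    Catches hallucinated intermediate calculations only; a transcript
    can pass every check and still be unfaithful to the model's
    actual decision process.
    """
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
    }
    results = []
    for a, op, b, c in re.findall(r"(\d+)\s*([+\-*])\s*(\d+)\s*=\s*(\d+)", transcript):
        ok = ops[op](int(a), int(b)) == int(c)
        results.append((f"{a} {op} {b} = {c}", ok))
    return results


# A transcript with one hallucinated step (8 - 2 is not 7):
transcript = "Start with 5 apples. Buy 3 more: 5 + 3 = 8. Eat 2: 8 - 2 = 7."
print(check_arithmetic_steps(transcript))
# [('5 + 3 = 8', True), ('8 - 2 = 7', False)]
```

This kind of step-level verification is the idea behind process-based checking: grade the reasoning, not just the final answer.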
“Chain-of-thought prompting significantly improves the ability of large language models to perform complex reasoning.” — Wei et al., 2022
The big idea: making thinking visible usually makes thinking better. But visible is not the same as faithful — and that distinction matters.
Related lessons
Keep going
Creators · 50 min
The Full Machine Learning Pipeline
From raw bytes to deployed model, every ML system follows the same ten-stage pipeline. Master it and you can read any architecture paper.
Creators · 55 min
Transformers Under the Hood
Attention, positional encoding, residual streams. A walk through the architecture that powers every frontier language model today.
Creators · 55 min
The Three Ingredients: Data, Compute, Algorithms (Capstone)
Every AI breakthrough of the past decade rests on three interacting ingredients. Synthesize everything you have learned into one working model.
