Chain-of-Thought Mechanics

Asking a model to 'think step by step' makes it better at hard problems. Here is why, and when it fails.

38 min · Reviewed 2026

Think Before You Speak

Chain-of-thought (CoT) prompting means asking an LLM to reason aloud before it answers, which often makes it markedly better at multi-step problems than answering in one shot. Wei et al. (2022) documented this effect across arithmetic, commonsense, and symbolic reasoning benchmarks.

How CoT works mechanically

Each token the model writes becomes part of its future context
Intermediate steps give the model a 'scratchpad' for calculation
Complex inferences can be decomposed into simple per-token steps
The final answer is conditioned on the scratchpad, reducing shortcut errors
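The four points above can be sketched as a toy generation loop. This is a minimal illustration, not how any particular model is implemented: `generate` and `toy_next_token` are hypothetical names, and the "model" is a hard-coded stand-in that replays a scratchpad and then reads its final answer off the tokens it has already written.

```python
from typing import Callable, List


def generate(next_token: Callable[[List[str]], str],
             prompt: List[str],
             max_new_tokens: int = 32,
             stop: str = "<eos>") -> List[str]:
    """Autoregressive loop: each emitted token is appended to the context
    that the next prediction is conditioned on."""
    context = list(prompt)
    for _ in range(max_new_tokens):
        token = next_token(context)
        if token == stop:
            break
        context.append(token)  # the output becomes future input
    return context[len(prompt):]


def toy_next_token(context: List[str]) -> str:
    """Hypothetical stand-in for a model (an assumption for illustration).
    It writes a fixed scratchpad, then computes the final token from the
    scratchpad tokens already present in its context."""
    script = ["5", "+", "3", "=", "8", ";", "8", "-", "2", "="]
    written = context[context.index("A:") + 1:]
    if len(written) < len(script):
        return script[len(written)]
    if len(written) == len(script):
        # Final answer is conditioned on the scratchpad: "8 - 2"
        return str(int(written[6]) - int(written[8]))
    return "<eos>"


tokens = generate(toy_next_token, ["Q:", "apples?", "A:"])
print(" ".join(tokens))
```

The point of the sketch is the `context.append(token)` line: the last token is cheap to produce only because the intermediate results are already sitting in the context.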

Without CoT:
Q: If I have 5 apples and buy 3 more, then eat 2, how many do I have?
A: 6

With CoT:
Q: (same)
A: Let me think step by step. Start with 5 apples. Buy 3 more → 5 + 3 = 8. Eat 2 → 8 - 2 = 6. The answer is 6.

CoT looks verbose but consistently improves correctness on multi-step problems.
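In code, the two styles in the apple example differ only in how the prompt ends and in how the answer is read back. The following is a rough sketch under stated assumptions: the helper names (`direct_prompt`, `cot_prompt`, `extract_answer`) are hypothetical, no real model API is called, and the "The answer is N" closing phrase is assumed from the example above rather than guaranteed by any model.

```python
import re
from typing import Optional

QUESTION = "If I have 5 apples and buy 3 more, then eat 2, how many do I have?"


def direct_prompt(question: str) -> str:
    """One-shot style: the model must emit the answer immediately."""
    return f"Q: {question}\nA:"


def cot_prompt(question: str) -> str:
    """CoT style: the cue invites intermediate steps before the answer."""
    return f"Q: {question}\nA: Let me think step by step."


def extract_answer(completion: str) -> Optional[int]:
    """Return the integer after the last 'The answer is', or None if the
    completion does not use that (assumed) closing phrase."""
    matches = re.findall(r"The answer is\s*(-?\d+)", completion)
    return int(matches[-1]) if matches else None


# A completion copied from the example above, standing in for model output.
sample = ("Start with 5 apples. Buy 3 more → 5 + 3 = 8. "
          "Eat 2 → 8 - 2 = 6. The answer is 6.")
print(cot_prompt(QUESTION))
print(extract_answer(sample))
```

Taking the last match rather than the first is a deliberate choice here: a scratchpad may mention tentative answers before the final one.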

Reasoning models take this further

Since 2024, models such as OpenAI's o1 and o3, Claude with extended thinking, and DeepSeek-R1 have been explicitly trained to reason at length before answering. Test-time compute, the amount of computation spent per query at inference, is now a core axis of capability, distinct from parameter scaling.

Standard CoT (prompted)	Trained reasoning model
Works at inference only	Baked into training
Quality depends on the prompt	Robust to prompt wording
Limited by context length	Long internal deliberation
Can be coaxed to answer faster	Explicitly uses more compute per query

When CoT goes wrong

Overconfidence: verbose reasoning does not prevent wrong answers — it can make them sound more certain
Hallucinated steps: the model invents intermediate calculations
Unfaithful reasoning: the stated steps do not reflect the actual decision
Wasted tokens: simple questions do not benefit and cost more
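One of these failure modes, hallucinated steps, can be partially checked by machine when the steps are plain arithmetic. The sketch below re-computes every `a op b = c` statement found in a scratchpad. It is an illustrative assumption about how one might audit a trace, with a hypothetical helper name (`check_steps`), and it handles only integer `+`, `-`, and `*`.

```python
import re
from typing import List, Tuple

# Matches simple integer statements such as "5 + 3 = 8".
STEP = re.compile(r"(-?\d+)\s*([+\-*])\s*(-?\d+)\s*=\s*(-?\d+)")
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def check_steps(scratchpad: str) -> List[Tuple[str, bool]]:
    """Return each arithmetic statement found, paired with whether the
    claimed result matches the recomputed one."""
    results = []
    for m in STEP.finditer(scratchpad):
        a, op, b, claimed = m.groups()
        ok = OPS[op](int(a), int(b)) == int(claimed)
        results.append((m.group(0), ok))
    return results


good = "Start with 5 apples. Buy 3 more → 5 + 3 = 8. Eat 2 → 8 - 2 = 6."
bad = "Start with 5 apples. Buy 3 more → 5 + 3 = 9. Eat 2 → 9 - 2 = 7."
print(check_steps(good))
print(check_steps(bad))
```

A check like this catches invented calculations but says nothing about faithfulness: every step can be arithmetically correct while the stated reasoning still fails to reflect how the answer was actually reached.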

Chain-of-thought prompting significantly improves the ability of large language models to perform complex reasoning.
— Wei et al., Chain-of-Thought Prompting (2022)

The big idea: making thinking visible usually makes thinking better. But visible is not the same as faithful — and that distinction matters.

End-of-lesson check

8 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-creators-chain-of-thought

What is the main idea of "Chain-of-Thought Mechanics"?
1. Asking a model to 'think step by step' makes it better at hard problems. Here is why, and when it fails.
2. Use AI as the final authority for the whole decision
3. Avoid checking the answer once it sounds polished
4. Focus only on speed instead of judgment
Which concept is most central to "Chain-of-Thought Mechanics"?
1. reasoning
2. chain of thought
3. scratchpad
4. CoT
Which use of AI fits this topic best?
1. Let the AI decide what matters without your review
2. Use the answer before checking whether it fits the situation
3. Each token the model writes becomes part of its future context
4. Treat the AI output as automatically correct
What should a careful learner remember about "Faithfulness is an open question"?
1. Use AI to draft or organize ideas about chain of thought, then verify before acting.
2. Skip the context so the tool can guess faster
3. Treat the output as private even after sharing it online
4. Use the answer without checking the source
You want to use AI after this lesson. What is the safest next step?
1. Act immediately because the AI answer is written clearly
2. Use AI for drafting and comparison, but verify before publishing or relying on it.
3. Hide uncertainty so the final answer looks cleaner
4. Use private or sensitive details before checking permission
How should AI output about chain of thought be treated?
1. As proof that no other source is needed
2. As a replacement for context, consent, or expert review
3. As a draft or helper output that still needs human judgment and verification
4. As something that becomes correct when it sounds confident
Name one way to verify an AI answer about chain of thought.
Which action would help you apply "Chain-of-Thought Mechanics" responsibly?
1. Use the tool to avoid thinking through the tradeoff
2. Keep going even if the output conflicts with a trusted source
3. Treat the AI output as automatically correct
4. Intermediate steps give the model a 'scratchpad' for calculation
