OpenAI's o3, Claude with extended thinking, and DeepSeek-R1 actually pause and reason before answering. Slower, smarter, pricier.
7 min · Reviewed 2026
The big idea
Reasoning models spend extra compute 'thinking' (generating internal reasoning tokens you may or may not see) before producing a final answer. They crush math, hard logic, and code — but cost more and take longer. Use them for problems where being right matters more than being fast.
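That flow — spend extra tokens thinking, then answer — can be sketched as a toy simulation. No real model is called here; `solve_with_reasoning`, its step list, and the `show_thinking` flag are all invented for illustration:

```python
# Toy illustration of the reasoning-model flow: generate internal
# reasoning tokens first, then emit a short final answer.
# Everything here is invented for illustration; no real model is called.

def solve_with_reasoning(problem: str, show_thinking: bool = False) -> dict:
    # Step 1: spend extra compute producing intermediate reasoning steps.
    thinking_steps = [
        f"Restate the problem: {problem}",
        "Break it into sub-problems.",
        "Check each step before committing to an answer.",
    ]
    # Step 2: produce the final answer informed by that reasoning.
    final_answer = f"Answer to: {problem}"

    # Providers may or may not expose the reasoning tokens to the user.
    visible = thinking_steps if show_thinking else []
    return {"thinking": visible, "answer": final_answer}

result = solve_with_reasoning("2 + 2", show_thinking=False)
print(result["answer"])          # the final answer is always returned
print(len(result["thinking"]))   # reasoning hidden: 0 visible steps
```

The point of the sketch: the reasoning tokens exist either way, the user just may not see them — which is also why the response is slower and costs more than a direct answer.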
Some examples
A tough math contest problem → o3 or Claude extended thinking.
A Codeforces-level coding challenge → DeepSeek-R1 or o3.
A research synthesis comparing 20 papers → reasoning model with citations.
A simple email reply → don't waste a reasoning model on it.
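The examples above boil down to a routing rule: reach for a reasoning model only when being right matters more than being fast. A minimal sketch, where the task labels and model strings are illustrative, not real API identifiers:

```python
# Route a task to a reasoning model only when correctness outweighs
# speed and cost. Task labels and model names are illustrative.

REASONING_TASKS = {"contest_math", "hard_coding", "research_synthesis"}

def pick_model(task_type: str) -> str:
    if task_type in REASONING_TASKS:
        # slower and pricier, but more likely to be right
        return "reasoning model (e.g. o3, DeepSeek-R1)"
    # fast and cheap is fine for routine tasks
    return "regular model"

print(pick_model("contest_math"))  # reasoning model (e.g. o3, DeepSeek-R1)
print(pick_model("email_reply"))   # regular model
```

In practice the "router" is you, deciding per task — but the same either/or logic applies.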
Try it!
Find a hard problem (a logic puzzle or contest math question). Run it through Claude or ChatGPT in both normal mode and reasoning mode, then compare the answers.
End-of-lesson check
15 questions · take it online for instant feedback at tendril.neural-forge.io/learn/quiz/end-builders-models-reasoning-models-r8a8-teen
What do reasoning models do before generating their final answer?
They ask the user clarifying questions first
They spend extra compute time generating internal reasoning tokens
They search the internet for the correct answer
They simply respond faster than regular models
A student needs help writing a quick reply to a friend's text message. Which approach would be most appropriate?
Use a regular (non-reasoning) AI model
Wait for a reasoning model to finish its extended thinking
Use a reasoning model like o3 for accuracy
Use DeepSeek-R1 for faster processing
Which of the following problems would benefit MOST from using a reasoning model?
Translating a sentence from English to Spanish
Writing a friendly birthday greeting
Summarizing a paragraph into three bullet points
A complex math competition problem requiring multi-step logic
What is 'extended thinking' a feature of?
Claude only
ChatGPT only
DeepSeek-R1 only
Both Claude and other reasoning models
A developer is working on a Codeforces-level coding challenge. Which model would be particularly suitable for this task?
DeepSeek-R1 or o3
A speech recognition model
A text-to-image generator
A simple chatbot for customer service
What is the main trade-off when using a reasoning model?
Faster responses but lower accuracy
Cheaper cost but less capable reasoning
Better for images but worse for text
Higher accuracy but higher cost and longer response time
Which company developed the o3 reasoning model?
OpenAI
Meta
Anthropic
Google
What type of task would be a waste of a reasoning model's capabilities?
Complex code debugging
A simple email reply to a colleague
Research synthesis comparing 20 academic papers
A tough logic puzzle
What makes reasoning models particularly good at math and logic problems?
They are trained only on math data
They can spend extra compute on step-by-step reasoning before answering
They randomly guess until they get the right answer
They have access to a calculator built-in
Why might a reasoning model take longer to respond than a regular model?
The internet connection is slower
The servers are overloaded
The model is generating additional reasoning tokens before answering
The model is waiting for user input
What is DeepSeek-R1 an example of?
A regular chatbot model
An image generation model
A reasoning model
A speech-to-text model
When would it make sense to use a reasoning model for research synthesis?
When synthesizing findings from 20 papers with citations
When translating a paper to another language
When looking up a single fact
When comparing exactly 2 papers
What should guide your decision to use a reasoning model?
Use them only for creative writing tasks
Only use them when the answer matters more than speed
Never use them because they are too expensive
Always use them because they are the newest technology
What happens to the internal reasoning tokens that reasoning models generate?
They are deleted immediately after use
They are never shown to the user
They may or may not be visible to the user
They are always shown to the user
What is a key reason NOT to use reasoning models for every task?