In 2024, a new class of models traded fast answers for slow, deliberate thinking, and benchmarks jumped.
In September 2024, OpenAI previewed o1, a model that spent extra compute before answering, generating long internal chains of reasoning. On hard math, coding, and science benchmarks, o1 leapt past GPT-4o, sometimes by double-digit points on tests where progress had been crawling.
The core idea was not prompt-level chain of thought. It was training the model, often through reinforcement learning, to use its own thinking tokens effectively, and then letting it spend as many of those tokens as needed at inference time.
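To make that concrete, here is a minimal sketch of what "spending thinking tokens at inference time" looks like. It is illustrative only: the names (generate_next_token, END_THINKING), the delimiter, and the budget knob are assumptions for this sketch, not any lab's actual API, and the model is stubbed so the example runs.

```python
# A minimal sketch of inference-time compute: the model first emits
# "thinking" tokens into a hidden scratchpad, up to a budget, then
# emits the visible answer. Everything here is hypothetical.

END_THINKING = "</think>"  # assumed delimiter the model is trained to emit


def generate_next_token(context: str) -> str:
    """Stand-in for one decoding step of a trained reasoning model.
    This toy stub 'thinks' once, then answers."""
    return "408" if context.endswith(END_THINKING) else END_THINKING


def answer_with_thinking(prompt: str, thinking_budget: int = 4096) -> str:
    context = prompt
    # Spend up to `thinking_budget` tokens reasoning before answering.
    for _ in range(thinking_budget):
        token = generate_next_token(context)
        if token == END_THINKING:
            break  # the model decided it has thought enough
        context += token
    # The visible answer is conditioned on the hidden chain of thought.
    return generate_next_token(context + END_THINKING)


print(answer_with_thinking("What is 17 * 24?"))  # -> 408
```

The design point worth noticing is that the budget is a runtime knob: the same weights can answer quickly or think for thousands of tokens, which is what moving compute from training time to inference time means in practice.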
Competitors followed quickly. Google's Gemini 2.0 Flash Thinking, DeepSeek's R1 in early 2025, and Anthropic's extended thinking mode all adopted variants of the paradigm. Some published training recipes openly; others kept them secret.
We've developed a new series of AI models designed to spend more time thinking before they respond.
— OpenAI, o1 announcement, 2024
The big idea: reasoning models reopened the scaling frontier by moving compute from training time into inference time. A model that can think longer is a different kind of model.
15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-history-reasoning-models-builders
What makes OpenAI o1 different from earlier GPT models in how it produces answers?
What is 'inference-time compute'?
Why can a smaller model that thinks longer sometimes outperform a larger model that answers quickly?
What type of training method helps reasoning models learn to use their thinking tokens effectively?
On which type of benchmark did o1 show the most dramatic improvement compared to previous models?
What does the abbreviation RLVR stand for?
Why are multi-step math problems particularly challenging for AI models that don't use deliberate reasoning?
Which of these companies released reasoning model products in response to OpenAI o1?
What does it mean that reasoning models 'reopened the scaling frontier'?
How does o1's chain of thought differ from the 'chain of thought prompting' that users sometimes use with other models?
What is a key advantage of reasoning models for agentic tasks that involve multiple steps?
What makes competitive programming particularly demanding for AI models?
Why did the improvements on hard math benchmarks matter particularly for demonstrating reasoning capabilities?
What did the lesson mean when it said reasoning models traded 'fast answers for slow, deliberate thinking'?
What fundamental shift did reasoning models introduce to AI scaling?