In 2024, a new class of models traded fast answers for slow, deliberate thinking, and benchmarks jumped.
In September 2024, OpenAI previewed o1, a model that spent extra compute before answering, generating long internal chains of reasoning. On hard math, coding, and science benchmarks, o1 leapt past GPT-4o, sometimes by double-digit points on tests where progress had been crawling.
The core idea went beyond prompt-level chain of thought. The model itself was trained, often through reinforcement learning, to use its own thinking tokens effectively, and was then allowed to spend more of those tokens on harder problems at inference time.
Competitors followed quickly. Google's Gemini 2.0 Flash Thinking, DeepSeek's R1 in early 2025, and Anthropic's extended thinking mode all adopted variants of the paradigm. Some published training recipes openly; others kept them secret.
We've developed a new series of AI models designed to spend more time thinking before they respond.
— OpenAI, o1 announcement, 2024
The big idea: reasoning models reopened the scaling frontier by adding inference-time compute as a second axis of scaling alongside training-time compute. A model that can think longer is a different kind of model.
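To make the inference-time axis concrete, here is a minimal sketch of a much simpler stand-in technique: sampling several independent answers and taking a majority vote. This is not how o1 or any model named above works internally. The function name `majority_vote_accuracy`, the assumed 60% per-sample accuracy, and the assumption that samples are independent with only two possible outcomes (right or wrong) are all illustrative choices, not figures from any published system. The point is only that the same model can become more accurate when it is allowed to spend more compute per question.

```python
from math import comb


def majority_vote_accuracy(p_correct: float, n_samples: int) -> float:
    """Probability that a strict majority of n independent samples is correct.

    Assumes each sample is right with probability p_correct, independently,
    and that a wrong sample always disagrees with a right one (binary outcome).
    """
    if n_samples < 1 or n_samples % 2 == 0:
        raise ValueError("use a positive odd number of samples to avoid ties")
    needed = n_samples // 2 + 1  # smallest count that forms a strict majority
    return sum(
        comb(n_samples, k) * p_correct**k * (1 - p_correct) ** (n_samples - k)
        for k in range(needed, n_samples + 1)
    )


if __name__ == "__main__":
    # Hypothetical per-sample accuracy of 0.6; more samples = more inference compute.
    for n in (1, 3, 5):
        print(n, f"{majority_vote_accuracy(0.6, n):.3f}")
    # 1 0.600
    # 3 0.648
    # 5 0.683
```

Under these assumptions, accuracy climbs with the number of samples even though the underlying model never changes. Real samples from one model are correlated, so gains in practice are smaller than this idealized calculation suggests.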
8 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-history-reasoning-models-builders
What is the main idea of "Reasoning Models: OpenAI o1 and After"?
Which concept is most central to "Reasoning Models: OpenAI o1 and After"?
Which use of AI fits this topic best?
What should a careful learner remember about "Two axes of scaling"?
You want to use AI after this lesson. What is the safest next step?
How should AI output about reasoning models be treated?
Name one way to verify an AI answer about reasoning models.
Which action would help you apply "Reasoning Models: OpenAI o1 and After" responsibly?