R1 was the open-weights reasoning shock of early 2025. A year later it is still the default for anyone who needs o-series reasoning without paying o-series prices.
DeepSeek R1 showed that an open-weights team could ship o1-class reasoning on a shoestring. The weights are downloadable, the quality is genuine, and the pricing on DeepSeek's own API is roughly 1/20th that of OpenAI's o-series.
| Option | DeepSeek R1 | OpenAI high-effort reasoning | GPT-5.5 |
|---|---|---|---|
| Cost per M output | Very low | High | High |
| Latency | Slow (thinks) | Slow to moderate | Moderate |
| Open weights | Yes | No | No |
| Quality | Near-frontier on selected reasoning tasks | Frontier | Frontier |
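The cost gap in the table can be made concrete with back-of-envelope arithmetic. The dollar figures below are illustrative placeholders, not real quotes; only the roughly-20x ratio comes from the lesson.

```python
# Back-of-envelope monthly cost comparison for reasoning output tokens.
# PRICES ARE ILLUSTRATIVE PLACEHOLDERS; only the ~20x ratio is from the text.
o_series_per_m = 20.00            # hypothetical $ per 1M output tokens, frontier
r1_per_m = o_series_per_m / 20    # lesson: roughly 1/20th the price

monthly_tokens_m = 50             # hypothetical workload: 50M output tokens/month
frontier_cost = o_series_per_m * monthly_tokens_m
r1_cost = r1_per_m * monthly_tokens_m
print(f"frontier: ${frontier_cost:,.2f}/mo   R1: ${r1_cost:,.2f}/mo")
```

At any absolute price level, the ratio is what matters: a workload that costs four figures a month on a frontier API lands in the double digits on R1.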
```python
from openai import OpenAI

# DeepSeek's API is OpenAI-compatible; point the client at their endpoint.
client = OpenAI(api_key="<DEEPSEEK_API_KEY>", base_url="https://api.deepseek.com")

hard_problem = "Prove that the sum of two odd integers is even."

resp = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": hard_problem}],
)
# The response message carries reasoning_content (the thinking) and content (the answer).
```

The API returns the chain of thought and the final answer as separate fields. When should you still pay for high-effort GPT reasoning? Frontier competition math, novel scientific reasoning, and any benchmark where the last three points of accuracy matter. For everyday hard problems, R1 is enough.
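Because the two channels come back as separate fields, you can log or discard the thinking independently of the answer. A minimal sketch of that split, using a stand-in object shaped like the response returned above so it runs on its own:

```python
from types import SimpleNamespace

# Stand-in for a deepseek-reasoner response: same shape as the real
# chat-completions response object, stubbed so this example is self-contained.
resp = SimpleNamespace(choices=[SimpleNamespace(message=SimpleNamespace(
    reasoning_content="First, write each odd integer as 2k + 1...",
    content="The sum of two odd integers is always even.",
))])

msg = resp.choices[0].message
thinking = msg.reasoning_content  # chain of thought: log it, audit it, or drop it
answer = msg.content              # final answer: what you show the user
print("reasoning chars:", len(thinking))
print("final answer:", answer)
```

Keeping the channels separate is the point: the thinking is useful for debugging a wrong answer, but you rarely want to ship it to end users.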
15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-modelx-deepseek-r1-reasoning-builders
1. What makes DeepSeek R1 unusual compared to most commercial AI assistants?
2. What does it mean that DeepSeek R1 has 'open weights'?
3. What is 'chain-of-thought' reasoning in AI models?
4. How does DeepSeek R1's cost compare to OpenAI's o-series models?
5. What is 'distillation' in the context of AI models?
6. Why does DeepSeek R1 often have slower response times than simpler AI models?
7. What is the main advantage of R1-Distill-Llama-70B over the full R1 model?
8. What hardware can run R1-Distill-Llama-70B?
9. In what way is DeepSeek R1's quality described relative to frontier models?
10. When should someone still pay for high-effort GPT models instead of using R1?
11. What is the main tradeoff when choosing between DeepSeek R1 and a frontier model like OpenAI's o-series?
12. Why might a startup choose DeepSeek R1 over OpenAI's reasoning models?
13. What does the lesson imply about the future of open-weights reasoning models?
14. What is required to achieve the 'realistic self-host target' mentioned in the lesson?
15. What makes R1 different from a model like GPT-5.5 in terms of accessibility?