A practical picker for current OpenAI models: when to pay for the frontier model, when to use a smaller model, and when Codex-specific models make sense.
The OpenAI model lineup changes quickly, but the decision pattern is stable: pick the smallest model that can reliably do the job, then upgrade only when evals show it is worth the latency and cost.
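In code, that pattern often shows up as escalate-on-failure: try the small model first and retry with the flagship only when a cheap validity check fails. Before the picker table below, here is a minimal sketch, assuming the official `openai` Node SDK and a hypothetical `looksValid` check that you would replace with a real task-specific grader:

```ts
import OpenAI from "openai";

const client = new OpenAI();

// Hypothetical validity check -- swap in your own task-specific eval.
function looksValid(output: string): boolean {
  return output.trim().length > 0;
}

async function completeWithEscalation(prompt: string): Promise<string> {
  // Start with the small model; escalate to the flagship only on failure.
  for (const model of ["gpt-5.4-mini", "gpt-5.5"]) {
    const response = await client.responses.create({ model, input: prompt });
    if (looksValid(response.output_text)) return response.output_text;
  }
  throw new Error("Both models failed the validity check");
}
```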
| Need | Start here | Why |
|---|---|---|
| Complex coding, planning, professional reasoning | gpt-5.5 | Flagship quality and strong default for hard work |
| Cost-sensitive coding or subagents | gpt-5.4-mini | Strong small model with lower latency and cost |
| Classification, extraction, ranking, simple routing | gpt-5.4-nano | Cheapest GPT-5.4-class option for high volume |
| Agentic coding inside Codex-like harnesses | gpt-5.3-codex | Optimized for coding-agent loops |
| Very hard, slow analysis | pro variant or background mode | More compute, but design for waiting (see the sketch after this table) |
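For that last row, "design for waiting" usually means kicking off the request asynchronously and polling, rather than holding a connection open. A minimal sketch, assuming the Responses API's background mode; verify the exact fields and status values against the current OpenAI docs:

```ts
import OpenAI from "openai";

const client = new OpenAI();

async function runLongAnalysis(prompt: string): Promise<string> {
  // Background mode returns immediately with a queued response object.
  let response = await client.responses.create({
    model: "gpt-5.5", // or the pro variant, per the table above
    input: prompt,
    background: true,
  });

  // Poll until the job leaves the queued/in_progress states.
  while (response.status === "queued" || response.status === "in_progress") {
    await new Promise((resolve) => setTimeout(resolve, 2000));
    response = await client.responses.retrieve(response.id);
  }

  if (response.status !== "completed") {
    throw new Error(`Background run ended with status: ${response.status}`);
  }
  return response.output_text;
}
```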
```ts
// Single source of truth for model IDs. Verify the exact current IDs
// against the OpenAI model docs before deploying.
const MODEL_BY_TASK = {
  hardCoding: "gpt-5.5",
  routineSubagent: "gpt-5.4-mini",
  extraction: "gpt-5.4-nano",
  codexHarness: "gpt-5.3-codex",
} as const;
```

Centralize model choices so migrations are one config change instead of a repo-wide hunt.

The big idea: model selection is an economics problem wrapped in a quality problem. Measure both.
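To measure both sides of that trade, run the same eval set through each candidate and record quality, latency, and cost together. A minimal sketch, assuming a hypothetical `scoreOutput` grader and illustrative per-token prices (the numbers below are placeholders, not real figures; check the current pricing page):

```ts
import OpenAI from "openai";

const client = new OpenAI();

// Illustrative prices in USD per million tokens -- NOT real figures.
const PRICE_PER_MTOK: Record<string, { input: number; output: number }> = {
  "gpt-5.5": { input: 10, output: 30 },
  "gpt-5.4-mini": { input: 1, output: 4 },
};

// Hypothetical grader: 1 for a correct answer, 0 otherwise.
function scoreOutput(output: string, expected: string): number {
  return output.trim() === expected.trim() ? 1 : 0;
}

async function evalModel(
  model: string,
  cases: { input: string; expected: string }[],
) {
  let correct = 0, costUsd = 0, totalMs = 0;
  for (const c of cases) {
    const start = Date.now();
    const r = await client.responses.create({ model, input: c.input });
    totalMs += Date.now() - start;
    correct += scoreOutput(r.output_text, c.expected);
    const price = PRICE_PER_MTOK[model];
    costUsd +=
      (r.usage!.input_tokens / 1e6) * price.input +
      (r.usage!.output_tokens / 1e6) * price.output;
  }
  return {
    model,
    accuracy: correct / cases.length,
    avgMs: totalMs / cases.length,
    costUsd,
  };
}
```

Run this over the same cases for each entry in `MODEL_BY_TASK`, and upgrade only when the accuracy gain justifies the latency and cost deltas.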
15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-openai-model-picker-creators
1. A developer needs to build a classification system that processes thousands of customer support emails per hour. Which model should they start with according to the model picker strategy?
2. What does the model picker approach recommend when your current model successfully handles a task but you want to optimize for lower operational costs?
3. A team is building an AI coding assistant that will autonomously refactor large codebases inside an agentic loop. Which model best matches this use case?
4. When establishing a baseline for model performance evaluation, what should you run first according to the picker methodology?
5. A developer is building a system that requires very hard, slow analysis where accuracy is more important than response time. Which model variant should they consider?
6. What does the lesson recommend about storing model identifiers in your application code?
7. What key metrics should be measured during model evaluation according to the picker methodology?
8. A startup is building a prototype and needs strong AI capabilities but has limited budget. According to the model picker approach, what should they do first?
9. What is the recommended source for determining the exact current model IDs before deploying to production?
10. When would it make sense to upgrade from gpt-5.4-mini to gpt-5.5 for a coding task?
11. What type of tasks is gpt-5.4-mini specifically designed for according to the model picker guide?
12. A developer wants to build an eval set for model selection. What does the lesson recommend as the source for these evaluation tasks?
13. What happens if you write production code that assumes model names will never change?
14. What is the relationship between model size and the recommended selection strategy?
15. What does running mini and nano models help you find in the evaluation process?