ChatGPT Plus costs you $20/month. The math behind that price, and why prices keep dropping, explains a lot about the AI industry.
7 min · Reviewed 2026
The big idea
Running a frontier AI model at scale burns through NVIDIA H100 GPU time worth thousands of dollars per minute. Even so, the cost per query has dropped roughly 99% from 2022 to 2026, thanks to better hardware, smaller specialized models, and engineering tricks like quantization. That price drop is why AI features keep getting added to free tiers.
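To make that 99% figure concrete, here's a minimal sketch of the arithmetic. Both per-query prices are illustrative assumptions, not published figures:

```python
# Illustrative per-query costs; both numbers are assumptions, not published figures.
cost_2022 = 0.04    # dollars per query in 2022 (assumed)
cost_2026 = 0.0004  # dollars per query in 2026 (assumed, ~99% lower)

drop = 1 - cost_2026 / cost_2022
print(f"Cost per query fell by {drop:.0%}")  # prints: Cost per query fell by 99%
```

The exact dollar amounts don't matter; what matters is the ratio, and a 100x cheaper query is the difference between a paid feature and a free one.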
Some examples
An H100 GPU rents for $2-$4/hour on cloud services; training GPT-4 reportedly cost $100M+ in compute (see the sketch after this list).
GPT-4 cost $30 per million input tokens at launch (March 2023). GPT-4o-mini in 2026 costs $0.15 per million — a 200x drop.
DeepSeek's V3 model in late 2024 trained for under $6M, partly by using cleverer methods on cheaper hardware.
Inference (running the model after training) is far cheaper than training per call, but it happens billions of times per day across all users, so small per-query costs add up fast.
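A quick sketch that turns the figures above into concrete numbers; the 100-GPU fleet size is an assumption for illustration:

```python
# Figures from the examples above; fleet size is an assumption for illustration.
gpu_hourly_rate = 3.0  # $/hour, midpoint of the $2-$4 H100 rental range
gpus = 100             # assumed fleet size
daily_rental = gpu_hourly_rate * gpus * 24
print(f"Renting 100 H100s for a day: ${daily_rental:,.0f}")  # -> $7,200

gpt4_price = 30.0  # $/million input tokens, GPT-4 at launch (March 2023)
mini_price = 0.15  # $/million input tokens, GPT-4o-mini in 2026
print(f"Token price drop: {gpt4_price / mini_price:.0f}x")  # -> 200x
```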
Try it!
Visit openai.com/api/pricing or anthropic.com/pricing. Compare the same model from a year ago to today. The drop is real and steady. That's the trend that makes 'AI features in everything' possible.
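To see why those falling prices make free tiers viable, here's a back-of-envelope sketch; every scale number in it (tokens per query, usage rate, user count) is an assumption for illustration:

```python
# Back-of-envelope cost of a free-tier AI feature; every scale number is assumed.
price_per_million_tokens = 0.15  # $/million tokens, GPT-4o-mini class (from the lesson)
tokens_per_query = 1_000         # assumed average tokens per request
queries_per_user_per_day = 20    # assumed usage
users = 1_000_000                # assumed free-tier user base

daily_tokens = users * queries_per_user_per_day * tokens_per_query
daily_cost = daily_tokens / 1_000_000 * price_per_million_tokens
print(f"Daily cost to serve 1M free users: ${daily_cost:,.0f}")  # -> $3,000
```

At 2022-era prices the same usage would cost hundreds of times more, which is why these features used to sit behind paywalls.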
End-of-lesson check
15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-builders-foundations-ai-cost-of-running-models-r9a10-teen
What does the term 'inference' refer to in AI systems?
The process of designing the architecture of a neural network
The method of measuring how accurately a model performs on test data
The process of collecting and organizing training data before building a model
The process of running a trained model to generate outputs for users
A company adds AI writing assistance to its free-tier product. Based on the trends in the lesson, what is the most likely reason it can afford to do this?
Government regulations now require AI features to be free
Companies have found a way to make users pay for the electricity instead
Free users don't actually trigger AI responses — they get pre-written answers
The cost of running AI queries has dropped by approximately 99% since 2022
What is 'quantization' in the context of AI model optimization?
A method for converting text data into numerical format the model can process
A system for measuring how much compute a model uses
A technique that reduces the precision of numbers used in the model to make it smaller and faster
A process that increases the amount of training data to improve accuracy
Which combination of factors contributed to the 99% drop in AI query costs from 2022 to 2026?
Fewer users and reduced demand for AI services
Government subsidies and tax breaks for AI companies
Lower electricity prices and reduced data center rents
Better hardware, smaller specialized models, and engineering optimizations like quantization
The lesson mentions DeepSeek V3 trained for under $6 million in late 2024. What was a key reason they achieved this relatively low cost?
They received government funding to offset the costs
They used more GPUs than typical for that budget
They used cleverer methods on cheaper hardware
They only trained a very small model with minimal capabilities
If an H100 GPU rents for $3 per hour on a cloud service, approximately how much would it cost to run 100 such GPUs continuously for one full day?
About $300
About $72
About $7,200
About $3,000
What does the term 'scaling' typically refer to in AI development?
Limiting the number of users who can access the AI
Making the text font larger in the user interface
Reducing the number of features in a product
Increasing the size of models, data, and compute to improve capabilities
In the AI industry, what does 'compute' refer to?
The amount of storage space used to save model files
The physical computer a user types on to interact with AI
The mathematical calculations performed by GPUs to run AI models
The network bandwidth required to send prompts to AI servers
Why does the lesson advise builders to track per-call AI costs early in a project?
To impress investors with detailed expense reports
Because costs drop quickly — in 6 months the same capability costs about one-quarter of current prices
Because AI companies will charge more if you don't track usage
To determine which employees should have access to the AI
What is a GPU and why is it important for AI?
A Global Positioning Unit; it helps AI models understand location context
A Ground Processing Unit; it processes data sent from satellites to AI systems
A Graphics Processing Unit; it can perform many simple calculations simultaneously, making it ideal for AI work
A General Processing Unit; it controls all computer functions including AI
The lesson states that DeepSeek V3 achieved lower training costs than typical frontier models. What does this suggest about the future of AI development?
AI development will only work with massive budgets like the $100M+ for GPT-4
All future AI models will be free to train
Innovative training methods can reduce costs without sacrificing capability
GPU costs will no longer matter for AI development
Why might a company choose to use a smaller specialized model instead of a large frontier model?
Smaller models can run without any computer hardware
Smaller models are always more accurate than larger ones
Smaller models require no training data
Smaller models cost less to run and can be optimized for specific tasks
The lesson notes that inference happens 'billions of times per day across all users.' Why is this fact significant for AI companies?
It means companies should charge users per inference call
It proves that AI is only used by a small number of people
It means inference is no longer a concern for AI companies
Even small per-query costs add up to enormous total expenses when multiplied by billions of calls
Based on the pricing trends described in the lesson, what pattern do you observe in AI model costs over time?
Costs have remained roughly stable since 2022
Costs are increasing as models get more capable
Costs are steadily decreasing due to hardware improvements and optimization techniques
Costs only change when companies decide to raise prices
How do improvements in hardware (like newer GPUs) reduce AI operating costs?
By allowing more calculations to be performed per dollar spent on electricity and hardware
By making AI models require less training data
By reducing the need for any software optimization