Uncertainty Quantification in LLMs
A model that says 'I am 95 percent sure' and is wrong 40 percent of the time is miscalibrated. Measuring that gap is uncertainty quantification.
Creators · AI Foundations · ~27 min read
How Sure Is the Model, Really?
LLMs produce a probability distribution over possible next tokens at every step. That distribution encodes how confident the model is about what comes next. But a confident-sounding answer in English is not the same as the model's internal probability, and uncertainty quantification is the work of measuring the gap between the two.
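To make the idea of a next-token distribution concrete, here is a minimal sketch in plain Python. The logit values are made up for illustration, and `softmax` and `entropy_bits` are hypothetical helper names, not part of any particular LLM API. It shows how raw scores become probabilities, and how a peaked distribution differs from a flat one.

```python
import math

def softmax(logits):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_bits(probs):
    """Shannon entropy in bits: higher means a more spread-out distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Illustrative (made-up) logits over a 4-token vocabulary.
peaked = softmax([8.0, 1.0, 0.5, 0.0])  # one token dominates
flat = softmax([1.0, 1.0, 1.0, 1.0])    # all tokens equally likely

print("peaked top probability:", round(peaked[0], 4))
print("peaked entropy (bits):", round(entropy_bits(peaked), 4))
print("flat entropy (bits):", round(entropy_bits(flat), 4))
```

The flat case is the maximum possible entropy for four options (2 bits). The peaked case sits close to zero, which is what a confident next-token prediction looks like numerically.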
Three kinds of uncertainty
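- Aleatoric: noise inherent in the data (different annotators would label the same example differently); collecting more data does not remove it
- Epistemic: uncertainty from the model not having seen enough relevant data; more data can reduce it
- Model: uncertainty from the choice of architecture or training procedure (often treated as part of epistemic uncertainty)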
Signals you can actually read
| Signal | What it captures | How to read |
|---|---|---|
| Token log-probabilities | Sequence probability | Low average logprob = uncertain answer |
| Entropy of next-token distribution | How spread out predictions are | High entropy at choice points = branching |
| Semantic consistency across samples | Meaning-level uncertainty | Same answer from 5 samples = confident |
| Verbalized confidence | Self-reported probability | Often miscalibrated, but easy |
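Two of the signals in the table can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the per-token log-probabilities and the sampled answers are made up, the function names are hypothetical, and exact string matching after light normalization stands in for real semantic comparison (which would normally need an embedding or entailment model).

```python
import math
from collections import Counter

def mean_logprob(token_logprobs):
    """Average per-token log-probability; closer to 0 means more confident."""
    return sum(token_logprobs) / len(token_logprobs)

def sequence_probability(token_logprobs):
    """Probability of the whole sequence: exp of the summed log-probabilities."""
    return math.exp(sum(token_logprobs))

def normalize(answer):
    """Crude stand-in for semantic equivalence: case, spacing, trailing period."""
    return " ".join(answer.lower().strip().rstrip(".").split())

def agreement(samples):
    """Return the most common normalized answer and the share of samples giving it."""
    counts = Counter(normalize(s) for s in samples)
    top_answer, top_count = counts.most_common(1)[0]
    return top_answer, top_count / len(samples)

# Made-up log-probabilities for a three-token answer.
logprobs = [-0.1, -0.2, -0.3]
print("mean logprob:", round(mean_logprob(logprobs), 4))
print("sequence probability:", round(sequence_probability(logprobs), 4))

# Made-up answers from five samples of the same prompt.
samples = ["Paris", "paris.", " Paris ", "Lyon", "Paris"]
answer, share = agreement(samples)
print("most common answer:", answer, "| agreement:", share)
```

Averaging the log-probabilities rather than summing them keeps long answers from looking less confident just because they contain more tokens.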
Why it matters in practice
1. Let low-confidence answers trigger a tool call or human review
2. Abstain from answering when uncertainty is too high
3. Surface uncertainty in the UI so users can weigh it
4. Track calibration over time as a quality metric
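The uses listed above can be sketched as a routing rule plus a calibration metric. The thresholds (0.3 and 0.7), the function names, and the sample data are assumptions chosen for illustration; real cutoffs would need tuning on your own evaluation set. The metric shown is expected calibration error (ECE), one common way to summarize the gap between stated confidence and observed accuracy.

```python
def route(confidence, abstain_below=0.3, review_below=0.7):
    """Pick an action from a confidence score in [0, 1] (thresholds are illustrative)."""
    if confidence < abstain_below:
        return "abstain"
    if confidence < review_below:
        return "review"  # e.g. trigger a tool call or human review
    return "answer"

def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |mean confidence - accuracy| over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(avg_conf - accuracy)
    return ece

print(route(0.2), route(0.5), route(0.9))

# A model that always reports 0.95 but is wrong 4 times out of 10.
confidences = [0.95] * 10
correct = [True] * 6 + [False] * 4
print("ECE:", round(expected_calibration_error(confidences, correct), 4))
```

Logging this number per release or per week is one simple way to track calibration over time. A value near zero means stated confidence matches observed accuracy.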
“A responsible model should not just give you an answer. It should tell you how much to trust it.”
The big idea: confidence without calibration is noise. Quantifying uncertainty turns an LLM from a slot machine into a sensor.