Loading lesson…
A calibrated model's 70 percent means it is right 70 percent of the time. Most LLMs are not calibrated. Here is what that costs you.
A calibrated classifier is one whose probability estimates match real frequencies. If it says 70 percent across a batch of predictions, 70 percent of those should be correct. That is an uncommon property in modern LLMs.
To check calibration, bin predictions by confidence (0-10 percent, 10-20 percent, ...). For each bin, plot average confidence vs empirical accuracy. A perfectly calibrated model hugs the diagonal. Overconfidence bends the line below; underconfidence bends it above.
Perfect calibration:
Accuracy
1.0 | *
0.8 | *
0.6 | *
0.4 | *
0.0 *--------- Confidence
0.0 0.6 1.0
Real models often look like:
Accuracy
1.0 | *
0.8 | * * (overconfident)
0.6 | *
0.4 | *
0.0 *---------
Confidence too high relative to truth.Reliability diagram: where the model thinks it is vs where it actually isECE sums the gap between confidence and accuracy across bins, weighted by bin size. Lower is better. A well-calibrated model has ECE below 0.05. A raw out-of-the-box LLM often sits at 0.15 or higher.
Modern neural networks are not calibrated — a phenomenon that has worsened as accuracy improved.
— Guo et al., On Calibration of Modern Neural Networks (2017)
The big idea: a confident answer is not a correct answer. Calibration is the bridge between the two.
15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-creators-calibration
What is the core idea behind "Calibration"?
Which term best describes a foundational idea in "Calibration"?
A learner studying Calibration would need to understand which concept?
Which of these is directly relevant to Calibration?
Which of the following is a key point about Calibration?
Which of these does NOT belong in a discussion of Calibration?
Which statement is accurate regarding Calibration?
Which of these does NOT belong in a discussion of Calibration?
What is the key insight about "Overconfidence is the default" in the context of Calibration?
What is the recommended tip about "Ground your practice in fundamentals" in the context of Calibration?
Which statement accurately describes an aspect of Calibration?
What does working with Calibration typically involve?
Which of the following is true about Calibration?
Which best describes the scope of "Calibration"?
Which section heading best belongs in a lesson about Calibration?