Asking 'can the model do it?' and 'will doing it cause harm?' are different questions. Both matter.
Capability evaluations measure what a model can do at its best. Safety evaluations measure what it will do in adversarial or risky conditions. They use different tools, different mindsets, and different success criteria.
| Capability eval | Safety eval |
|---|---|
| Measures peak skill | Measures behavior under pressure |
| Goal: higher score is better | Goal: no harm, even under attack |
| Public benchmarks | Often private, adversarial, red-teamed |
| Scored on single-shot or best-of-N attempts | Scored on rare, worst-case outcomes |
| Examples: MMLU, GPQA, SWE-bench | Examples: ToxiGen, cyberattack uplift, CBRN probes |
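To make the "different success criteria" rows concrete, here is a minimal Python sketch. It assumes each attempt has already been labeled (correct or not, harmful or not). The function names `pass_at_k`, `attack_success_rate`, and `passes_safety_gate` and the zero-tolerance default are illustrative assumptions, not any lab's actual evaluation harness. `pass_at_k` uses the commonly used unbiased pass@k estimator for best-of-N capability scoring.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Capability-style score: estimated chance that at least one of k attempts
    succeeds, given n sampled attempts of which c were correct. Higher is better."""
    if n - c < k:
        # Too few failures to fill k draws, so a success is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def attack_success_rate(harmful_flags: list[bool]) -> float:
    """Safety-style score: fraction of adversarial prompts that elicited harm.
    Lower is better."""
    return sum(harmful_flags) / len(harmful_flags)


def passes_safety_gate(harmful_flags: list[bool], max_rate: float = 0.0) -> bool:
    """Worst-case framing (hypothetical gate): with the default max_rate of 0,
    a single harmful output in the red-team set fails the model."""
    return attack_success_rate(harmful_flags) <= max_rate


if __name__ == "__main__":
    print(pass_at_k(10, 3, 1))  # one try: about 0.3
    print(pass_at_k(10, 3, 5))  # best of five tries: much higher
    red_team = [False] * 99 + [True]  # 1 harmful output in 100 attacks
    print(attack_success_rate(red_team))  # 0.01
    print(passes_safety_gate(red_team))   # False
```

Note the asymmetry: extra attempts raise a capability score, while in a safety eval every extra attacker attempt is another chance to surface the rare failure.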
A model can score 95 percent on MMLU and still produce harmful outputs in 2 percent of real conversations. Average performance is a poor summary when rare failures can be catastrophic.
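The 2 percent figure can be turned into a back-of-the-envelope calculation. This sketch assumes a hypothetical deployment volume of one million conversations and treats conversations as independent; both are illustrative assumptions, and independence likely understates risk because real adversarial users adapt their attacks.

```python
def expected_harmful(conversations: int, harmful_rate: float) -> int:
    """Expected number of harmful conversations at a given deployment volume."""
    return round(conversations * harmful_rate)


def prob_at_least_one(harmful_rate: float, attempts: int) -> float:
    """Chance that at least one of `attempts` conversations is harmful.
    Assumes independent conversations (an assumption, see lead-in)."""
    return 1.0 - (1.0 - harmful_rate) ** attempts


if __name__ == "__main__":
    rate = 0.02  # the 2 percent figure from the text
    print(expected_harmful(1_000_000, rate))         # 20000
    print(round(prob_at_least_one(rate, 100), 3))    # 0.867
```

Even a small per-conversation rate makes at least one harmful output very likely over enough interactions, which is why safety evals weight the tail rather than the mean.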
You need a model that is smart enough to be useful and wise enough to be safe. Neither alone is sufficient.
— A senior safety researcher at a frontier lab
The big idea: capability eval asks 'how smart?' Safety eval asks 'how trustworthy?' Both must climb together, or we have a problem.
Quiz: 15 questions. Take it online for instant feedback at tendril.neural-forge.io/learn/quiz/end-creators-capability-vs-safety-eval
1. What is the core idea behind "Capability Evaluation vs. Safety Evaluation"?
2. Which term best describes a foundational idea in "Capability Evaluation vs. Safety Evaluation"?
3. A learner studying Capability Evaluation vs. Safety Evaluation would need to understand which concept?
4. Which of these is directly relevant to Capability Evaluation vs. Safety Evaluation?
5. Which of the following is a key point about Capability Evaluation vs. Safety Evaluation?
6. Which of these does NOT belong in a discussion of Capability Evaluation vs. Safety Evaluation?
7. Which statement is accurate regarding Capability Evaluation vs. Safety Evaluation?
8. Which of these does NOT belong in a discussion of Capability Evaluation vs. Safety Evaluation?
9. What is the key insight about "Responsible Scaling Policies" in the context of Capability Evaluation vs. Safety Evaluation?
10. What is the key insight about "Worst case matters" in the context of Capability Evaluation vs. Safety Evaluation?
11. What is the recommended tip about "Ground your practice in fundamentals" in the context of Capability Evaluation vs. Safety Evaluation?
12. Which statement accurately describes an aspect of Capability Evaluation vs. Safety Evaluation?
13. What does working with Capability Evaluation vs. Safety Evaluation typically involve?
14. Which of the following is true about Capability Evaluation vs. Safety Evaluation?
15. Which best describes the scope of "Capability Evaluation vs. Safety Evaluation"?