Emergence vs. Scaling

Some capabilities grow smoothly with scale. Others seem to appear out of nowhere. Telling them apart is a whole research program.

40 min · Reviewed 2026

The Big Question

Is AI capability a smooth climb or a staircase? The answer is probably 'both, depending on how you measure.' Understanding the argument is central to forecasting what the next generation of models will and will not do.

The emergence camp

Wei et al. (2022) catalogued capabilities that appeared to 'emerge' at particular scales — arithmetic, instruction following, in-context learning. Below a threshold, performance was near random; above it, performance jumped sharply.

The mirage counter-argument

Schaeffer, Miranda, and Koyejo (2023) argued that many emergent abilities are a function of the metric, not the model. Switch from strict exact-match to partial-credit scoring, and the cliff becomes a gentle hill. Emergence might be about how we look, not what is there.
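The metric effect can be illustrated with a toy calculation. The sketch below is not the paper's code: the model sizes, the logistic accuracy curve, the answer length, and the assumption that token errors are independent are all hypothetical choices made for illustration. It shows how a per-token accuracy that rises smoothly can still produce an exact-match score that sits near zero and then climbs steeply, because exact match requires every token to be right at once.

```python
import math

# Hypothetical model sizes in parameters (1e7 .. 1e11); not taken from any paper.
SIZES = [10**k for k in range(7, 12)]

# Hypothetical answer length in tokens.
ANSWER_LEN = 10


def per_token_accuracy(n_params: float) -> float:
    """Assumed smooth curve: a logistic in log10(parameters). Illustrative only."""
    return 1.0 / (1.0 + math.exp(-(math.log10(n_params) - 8.0)))


def exact_match(p_token: float, answer_len: int) -> float:
    """Chance that all answer_len tokens are correct, assuming independent errors."""
    return p_token ** answer_len


# One row per model size: (size, partial-credit score, strict score).
rows = [
    (n, per_token_accuracy(n), exact_match(per_token_accuracy(n), ANSWER_LEN))
    for n in SIZES
]

for n, partial, strict in rows:
    print(f"{n:>15,d}  partial={partial:.3f}  strict={strict:.3f}")
```

Under these assumptions the partial-credit column moves steadily upward across every size, while the strict column stays close to zero for the smaller sizes and only rises at the larger ones. The same underlying model behavior reads as a hill under one metric and a cliff under the other.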

View	Claim	Implication
Strong emergence	Capabilities really do appear at thresholds	Forecasting is hard; surprises are inevitable
Mirage view	Smoothness is hidden by harsh metrics	Forecasting is possible with better metrics
Middle ground	Some emergence is real, some is measurement	Depends on task — check both framings

Implications for evals

Report both strict and partial-credit scores when possible
Sample densely around suspected transition points (compute, parameters)
Use continuous metrics (log-likelihood) alongside discrete (accuracy)
Probe for capability before release, not after scale-up
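The first three recommendations above can be sketched in a few lines. This is a minimal illustration, not a reference to any particular eval harness: the function names, the token-list input format, and the grid parameters (n_coarse, n_dense, half_width_decades) are assumptions introduced here. It scores one prediction with a strict metric, a partial-credit metric, and a continuous metric, and builds a log-spaced grid of model sizes with extra points around a suspected transition.

```python
import math
from typing import List, Sequence


def exact_match_score(pred: Sequence[str], gold: Sequence[str]) -> float:
    """Strict metric: 1.0 only if every token matches, else 0.0."""
    return 1.0 if list(pred) == list(gold) else 0.0


def token_accuracy(pred: Sequence[str], gold: Sequence[str]) -> float:
    """Partial credit: matching positions divided by the longer sequence length."""
    if not gold:
        return 1.0 if not pred else 0.0
    hits = sum(p == g for p, g in zip(pred, gold))
    return hits / max(len(pred), len(gold))


def mean_log_likelihood(token_probs: Sequence[float]) -> float:
    """Continuous metric: average log-probability assigned to the gold tokens.

    Assumes a non-empty sequence with every probability greater than zero.
    """
    return sum(math.log(p) for p in token_probs) / len(token_probs)


def log_space(lo: float, hi: float, n: int) -> List[float]:
    """n points evenly spaced in log10 between lo and hi (n >= 2)."""
    a, b = math.log10(lo), math.log10(hi)
    return [10 ** (a + (b - a) * i / (n - 1)) for i in range(n)]


def dense_grid(low: float, high: float, center: float,
               n_coarse: int = 5, n_dense: int = 9,
               half_width_decades: float = 0.5) -> List[float]:
    """Coarse log-spaced sizes plus extra points around a suspected transition."""
    coarse = log_space(low, high, n_coarse)
    lo = max(low, center / 10 ** half_width_decades)
    hi = min(high, center * 10 ** half_width_decades)
    dense = log_space(lo, hi, n_dense)
    return sorted(set(coarse + dense))


# Hypothetical example: a three-token arithmetic answer with one wrong token.
pred, gold = ["1", "2", "4"], ["1", "2", "3"]
print("strict :", exact_match_score(pred, gold))
print("partial:", token_accuracy(pred, gold))
print("loglik :", mean_log_likelihood([0.9, 0.8, 0.1]))

# Hypothetical sweep from 1e7 to 1e11 parameters, suspected transition near 1e9.
print([f"{x:.2e}" for x in dense_grid(1e7, 1e11, 1e9)])
```

Reporting the three scores side by side makes it easier to see whether a jump in the strict score is matched by a jump in the others, and the denser grid gives more data points in the region where the curves are expected to diverge.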

Our findings suggest that existing claims of emergent abilities are creations of the researcher's choice of metrics.
— Schaeffer et al., Are Emergent Abilities of Large Language Models a Mirage? (2023)

The big idea: whether AI capabilities emerge suddenly or grow smoothly depends partly on how you look. Either way, the surprises are real enough to plan for.

End-of-lesson check

8 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-creators-emergence-vs-scaling

What is the main idea of "Emergence vs. Scaling"?
1. Some capabilities grow smoothly with scale.
2. Use AI as the final authority for the whole decision
3. Avoid checking the answer once it sounds polished
4. Focus only on speed instead of judgment

Which concept is most central to "Emergence vs. Scaling"?
1. scaling
2. emergence
3. phase transition
4. metric effects

Which use of AI fits this topic best?
1. Let the AI decide what matters without your review
2. Use the answer before checking whether it fits the situation
3. Report both strict and partial-credit scores when possible
4. Treat the AI output as automatically correct

What should a careful learner remember about "Why this is not settled"?
1. Use "Why this is not settled" as a reminder to verify the AI output before anyone relies on it.
2. Skip the context so the tool can guess faster
3. Treat the output as private even after sharing it online
4. Use the answer without checking the source

You want to use AI after this lesson. What is the safest next step?
1. Act immediately because the AI answer is written clearly
2. Use AI for drafting and comparison, but verify before publishing or relying on it.
3. Hide uncertainty so the final answer looks cleaner
4. Use private or sensitive details before checking permission

How should AI output about emergence be treated?
1. As proof that no other source is needed
2. As a replacement for context, consent, or expert review
3. As a draft or helper output that still needs human judgment and verification
4. As something that becomes correct when it sounds confident

Name one way to verify an AI answer about emergence.

Which action would help you apply "Emergence vs. Scaling" responsibly?
1. Use the tool to avoid thinking through the tradeoff
2. Keep going even if the output conflicts with a trusted source
3. Treat the AI output as automatically correct
4. Sample densely around suspected transition points (compute, parameters)
