In-Context Learning

Show a model three examples, and it learns the task on the spot — without any weight updates. This is one of the strangest properties of transformers.

38 min · Reviewed 2026

Learning Without Training

In-context learning (ICL) is a model's ability to acquire a new task from examples shown in its prompt, without any gradient updates. The 2020 GPT-3 paper (Brown et al.) was the breakthrough: show the model a few examples of a task in the prompt, and it continues the pattern for new inputs.

Few-shot in-context learning example:

English: The cat sat on the mat.
Pirate:  The scallywag perched on yonder rug.

English: I love pizza.
Pirate:  Arr, me heart beats for pizza pie!

English: Where is the library?
Pirate:

(the model continues the pattern it inferred)

No fine-tuning. The model learns the 'English-to-pirate' mapping from the examples in the prompt.
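A minimal sketch of how a few-shot prompt like the pirate example could be assembled in code. The function name build_few_shot_prompt and its parameters are hypothetical, chosen for illustration. No model is called here: the sketch only builds the text you would send.

```python
def build_few_shot_prompt(examples, query, source_label="English", target_label="Pirate"):
    """Format (source, target) pairs plus an unanswered query as one prompt string."""
    blocks = [
        f"{source_label}: {src}\n{target_label}: {tgt}"
        for src, tgt in examples
    ]
    # The final block leaves the target empty so the model fills it in.
    blocks.append(f"{source_label}: {query}\n{target_label}:")
    return "\n\n".join(blocks)


# The two demonstrations and the query from the lesson's example.
examples = [
    ("The cat sat on the mat.", "The scallywag perched on yonder rug."),
    ("I love pizza.", "Arr, me heart beats for pizza pie!"),
]
prompt = build_few_shot_prompt(examples, "Where is the library?")
print(prompt)
```

Swapping the labels or the example pairs changes the task without touching the model, which is the point of in-context learning.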

Why it matters

No training required for new tasks
Adapts at runtime based on what you show
Works for tasks that did not exist when the model was trained
Lets non-ML users customize behavior with text alone

What is really happening inside

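Research has shown that transformers can implement something like gradient descent in their attention layers, using the in-context examples as 'training data' for a short forward-pass 'optimization.' These results come mostly from simplified settings, such as small models trained on linear regression, and how closely they describe large language models is still an open question. On this view, the model is doing a kind of learning; it just lives in the forward pass, not in the weights.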
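To make the gradient-descent analogy concrete, here is a toy sketch in plain Python. It assumes the simplest case studied in this line of research: a linear regression task, weights starting at zero, and attention without softmax. Under those assumptions, one gradient step on the in-context examples and a linear attention readout give the same prediction. All function names and the data are hypothetical, and this does not describe how any production model works.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def one_step_gd_prediction(xs, ys, x_query, lr):
    """One gradient step on 0.5 * sum((w.x_i - y_i)^2), starting from w = 0."""
    dim = len(x_query)
    # The gradient at w = 0 is -sum(y_i * x_i), so the update is w = lr * sum(y_i * x_i).
    w = [lr * sum(y * x[d] for x, y in zip(xs, ys)) for d in range(dim)]
    return dot(w, x_query)


def linear_attention_prediction(xs, ys, x_query, lr):
    """Softmax-free attention: keys are x_i, values are y_i, the query is x_query."""
    scores = [dot(x, x_query) for x in xs]
    return lr * sum(s * y for s, y in zip(scores, ys))


# Toy in-context examples (made up for illustration).
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ys = [2.0, -1.0, 1.0]
x_query = [3.0, 2.0]
lr = 0.1

gd = one_step_gd_prediction(xs, ys, x_query, lr)
attn = linear_attention_prediction(xs, ys, x_query, lr)
print(abs(gd - attn) < 1e-9)
```

The two functions compute the same sum in a different order: lr * sum(y_i * (x_i . x_query)). A single small step is only a crude estimate of the underlying mapping. The sketch shows that the two computations coincide, not that either one is accurate.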

When ICL fails

Tasks far from pretraining distribution — few-shot cannot invent new capabilities
Examples in the prompt are noisy or inconsistent
Context window runs out before you can include enough examples
The model pattern-matches to a memorized template instead of the actual mapping
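Two of the failure modes above, inconsistent examples and a context window that runs out, can be checked before a prompt is sent. The sketch below is one possible way to do that. It assumes a crude whitespace word count in place of a real tokenizer, and the helper names, the budget value, and the third example pair are all hypothetical.

```python
def rough_token_count(text):
    # Assumption: whitespace splitting as a crude stand-in for a real tokenizer.
    return len(text.split())


def select_examples(examples, query, budget):
    """Keep examples in order until adding another would exceed the budget."""
    used = rough_token_count(query)
    kept = []
    for src, tgt in examples:
        cost = rough_token_count(src) + rough_token_count(tgt)
        if used + cost > budget:
            break
        kept.append((src, tgt))
        used += cost
    return kept


def find_conflicts(examples):
    """Return inputs that appear with more than one distinct output."""
    seen = {}
    for src, tgt in examples:
        seen.setdefault(src, set()).add(tgt)
    return sorted(src for src, tgts in seen.items() if len(tgts) > 1)


examples = [
    ("I love pizza.", "Arr, me heart beats for pizza pie!"),
    ("The cat sat on the mat.", "The scallywag perched on yonder rug."),
    # Hypothetical conflicting pair: same input, different output.
    ("I love pizza.", "Pizza be fine, I suppose."),
]
kept = select_examples(examples, "Where is the library?", budget=30)
conflicts = find_conflicts(examples)
print(len(kept), conflicts)
```

A real tokenizer will count differently from whitespace splitting, so the budget here is only a placeholder. The other two failure modes, tasks far from the pretraining distribution and template matching, cannot be caught by a check like this and need evaluation on held-out inputs.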

Our largest model, GPT-3, with 175 billion parameters, achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks.
— Brown et al., Language Models are Few-Shot Learners (2020)

The big idea: modern LLMs learn inside a single forward pass. Understanding ICL reshapes how you think about what 'training' even means.

End-of-lesson check

15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-creators-in-context-learning

What is the core idea behind "In-Context Learning"?
1. Show a model three examples, and it learns the task on the spot — without any weight updates. This is one of the strangest properties of transformers.
2. A capable but misaligned model is dangerous by design
3. Score responses by a harms rubric
4. Apollo Research: focuses on deception and scheming
Which term best describes a foundational idea in "In-Context Learning"?
1. few-shot
2. in-context learning
3. induction head
4. meta-learning
A learner studying In-Context Learning would need to understand which concept?
1. in-context learning
2. induction head
3. few-shot
4. meta-learning
Which of these is directly relevant to In-Context Learning?
1. in-context learning
2. few-shot
3. meta-learning
4. induction head
Which of the following is a key point about In-Context Learning?
1. No training required for new tasks
2. Adapts at runtime based on what you show
3. Works for tasks that did not exist when the model was trained
4. Lets non-ML users customize behavior with text alone
Which of these does NOT belong in a discussion of In-Context Learning?
1. A capable but misaligned model is dangerous by design
2. No training required for new tasks
3. Adapts at runtime based on what you show
4. Works for tasks that did not exist when the model was trained
Which statement is accurate regarding In-Context Learning?
1. Examples in the prompt are noisy or inconsistent
2. Context window runs out before you can include enough examples
3. Tasks far from pretraining distribution — few-shot cannot invent new capabilities
4. The model pattern-matches to a memorized template instead of the actual mapping
Which of these does NOT belong in a discussion of In-Context Learning?
1. A capable but misaligned model is dangerous by design
2. Context window runs out before you can include enough examples
3. Examples in the prompt are noisy or inconsistent
4. Tasks far from pretraining distribution — few-shot cannot invent new capabilities
What is the key insight about "Induction heads" in the context of In-Context Learning?
1. Early transformer studies found circuits called 'induction heads' that spot repeated patterns in the context and copy th…
2. A capable but misaligned model is dangerous by design
3. Score responses by a harms rubric
4. Apollo Research: focuses on deception and scheming
What is the key insight about "Prompt engineering is ICL engineering" in the context of In-Context Learning?
1. A capable but misaligned model is dangerous by design
2. When you reshuffle few-shot examples and get better answers, you are tuning the model's implicit in-context optimization.
3. Score responses by a harms rubric
4. Apollo Research: focuses on deception and scheming
What is the recommended tip about "Ground your practice in fundamentals" in the context of In-Context Learning?
1. A capable but misaligned model is dangerous by design
2. Score responses by a harms rubric
3. Every AI capability has an underlying mechanism. Understanding that mechanism tells you where it'll fail — which is more…
4. Apollo Research: focuses on deception and scheming
Which statement accurately describes an aspect of In-Context Learning?
1. A capable but misaligned model is dangerous by design
2. Score responses by a harms rubric
3. Apollo Research: focuses on deception and scheming
4. In-context learning (ICL) is a model's ability to acquire a new task from examples shown in its prompt, without any gradient updates.
What does working with In-Context Learning typically involve?
1. Research has shown that transformers can implement something like gradient descent in their attention layers, using the in-context examples …
2. A capable but misaligned model is dangerous by design
3. Score responses by a harms rubric
4. Apollo Research: focuses on deception and scheming
Which of the following is true about In-Context Learning?
1. A capable but misaligned model is dangerous by design
2. The big idea: modern LLMs learn inside a single forward pass. Understanding ICL reshapes how you think about what 'training' even means.
3. Score responses by a harms rubric
4. Apollo Research: focuses on deception and scheming
Which best describes the scope of "In-Context Learning"?
1. It is unrelated to foundations workflows
2. It applies only to the opposite beginner tier
3. It focuses on how a model shown a few examples learns the task on the spot, without any weight updates
4. It was deprecated in 2024 and no longer relevant
