Lesson 269 of 2116
In-Context Learning
Show a model three examples, and it learns the task on the spot — without any weight updates. This is one of the strangest properties of transformers.
Lesson map
What this lesson covers
Learning path
The main moves in order
1. Learning without training
2. In-context learning
3. Few-shot prompting
4. Meta-learning
Section 1
Learning Without Training
In-context learning (ICL) is a model's ability to pick up a new task from examples shown in its prompt, with no gradient updates. The 2020 GPT-3 paper was the breakthrough demonstration: show the model a few examples of a task, and it continues the pattern for new inputs.
No fine-tuning. The model learns the 'English-to-pirate' mapping from the examples in the prompt.
Few-shot in-context learning example:
English: The cat sat on the mat.
Pirate: The scallywag perched on yonder rug.
English: I love pizza.
Pirate: Arr, me heart beats for pizza pie!
English: Where is the library?
Pirate:
(the model continues the pattern it inferred)

Why it matters
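The prompt above can be assembled programmatically. A minimal sketch in plain Python; the helper name `build_few_shot_prompt` and its parameters are illustrative, not from any particular library:

```python
# Sketch: format (input, output) example pairs into a few-shot prompt,
# ending with the new input so the model completes the pattern.

def build_few_shot_prompt(examples, query, input_label="English", output_label="Pirate"):
    lines = []
    for source, target in examples:
        lines.append(f"{input_label}: {source}")
        lines.append(f"{output_label}: {target}")
    lines.append(f"{input_label}: {query}")
    lines.append(f"{output_label}:")  # the model continues from here
    return "\n".join(lines)

examples = [
    ("The cat sat on the mat.", "The scallywag perched on yonder rug."),
    ("I love pizza.", "Arr, me heart beats for pizza pie!"),
]
prompt = build_few_shot_prompt(examples, "Where is the library?")
print(prompt)
```

The string this produces would be sent to any LLM as-is; the trailing `Pirate:` is what invites the model to continue the inferred mapping.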
- No training required for new tasks
- Adapts at runtime based on what you show
- Works for tasks that did not exist when the model was trained
- Lets non-ML users customize behavior with text alone
What is really happening inside
Research suggests that transformers can implement something like gradient descent in their attention layers, treating the in-context examples as 'training data' for a short optimization carried out during the forward pass. The model really is doing a kind of learning; it just lives in the forward pass, not in the weights.
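One concrete construction from this line of research: an unnormalized linear-attention head can reproduce exactly one gradient-descent step on the in-context examples, viewed as a linear-regression dataset. A minimal NumPy sketch under those assumptions (linear attention, squared loss, weights initialized at zero; the variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 16                      # input dim, number of in-context examples
X = rng.normal(size=(n, d))       # in-context inputs  x_1 .. x_n
w_true = rng.normal(size=d)
y = X @ w_true                    # in-context targets y_i = w_true . x_i
x_q = rng.normal(size=d)          # query input
eta = 0.1                         # learning rate

# One gradient-descent step on squared loss from w0 = 0:
#   grad = -sum_i y_i x_i, so w1 = eta * sum_i y_i x_i
w1 = eta * (y @ X)
pred_gd = w1 @ x_q                # prediction after one GD step

# Unnormalized linear attention with keys = x_i, values = y_i, query = x_q:
#   output = eta * sum_i (x_q . x_i) * y_i
pred_attn = eta * np.sum((X @ x_q) * y)

# The two computations are term-by-term identical.
assert np.isclose(pred_gd, pred_attn)
```

The point of the identity is that the attention head never updates `w1` explicitly: the weighted sum over keys and values computes the same number the gradient step would have produced.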
When ICL fails
- Tasks far from pretraining distribution — few-shot cannot invent new capabilities
- Examples in the prompt are noisy or inconsistent
- Context window runs out before you can include enough examples
- The model pattern-matches to a memorized template instead of the actual mapping
“Our largest model, GPT-3, with 175 billion parameters, achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks.”
(Brown et al., 2020, "Language Models are Few-Shot Learners", the GPT-3 paper)
The big idea: modern LLMs learn inside a single forward pass. Understanding ICL reshapes how you think about what 'training' even means.
