Lesson 269 of 2116
In-Context Learning
Show a model three examples, and it learns the task on the spot — without any weight updates. This is one of the strangest properties of transformers.
Lesson map
What this lesson covers
Learning path
The main moves in order
1. Learning without training
2. In-context learning
3. Few-shot prompting
4. Meta-learning
Section 1
Learning Without Training
In-context learning (ICL) is a model's ability to pick up a new task from examples shown in its prompt, with no gradient updates. The 2020 GPT-3 paper was the breakthrough demonstration: show the model a few examples of a task, and it continues the pattern for new inputs.
No fine-tuning. The model learns the 'English-to-pirate' mapping from the examples in the prompt.
Few-shot in-context learning example:
English: The cat sat on the mat.
Pirate: The scallywag perched on yonder rug.
English: I love pizza.
Pirate: Arr, me heart beats for pizza pie!
English: Where is the library?
Pirate:
(the model continues the pattern it inferred)

Why it matters
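The prompt above can be assembled programmatically. A minimal sketch in plain Python; the helper name `build_few_shot_prompt` and its parameters are illustrative, not from any particular library:

```python
# Sketch: format (input, output) example pairs into a few-shot prompt,
# ending with the new input so the model completes the pattern.

def build_few_shot_prompt(examples, query, input_label="English", output_label="Pirate"):
    lines = []
    for source, target in examples:
        lines.append(f"{input_label}: {source}")
        lines.append(f"{output_label}: {target}")
    lines.append(f"{input_label}: {query}")
    lines.append(f"{output_label}:")  # the model continues from here
    return "\n".join(lines)

examples = [
    ("The cat sat on the mat.", "The scallywag perched on yonder rug."),
    ("I love pizza.", "Arr, me heart beats for pizza pie!"),
]
prompt = build_few_shot_prompt(examples, "Where is the library?")
print(prompt)
```

The string this produces would be sent to any LLM as-is; the trailing `Pirate:` is what invites the model to continue the inferred mapping.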
- No training required for new tasks
- Adapts at runtime based on what you show
- Works for tasks that did not exist when the model was trained
- Lets non-ML users customize behavior with text alone
What is really happening inside
Research suggests that transformers can implement something like gradient descent in their attention layers, treating the in-context examples as 'training data' for a short optimization carried out during the forward pass. The model really is doing a kind of learning; it just lives in the forward pass, not in the weights.
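One concrete construction from this line of research: an unnormalized linear-attention head can reproduce exactly one gradient-descent step on the in-context examples, viewed as a linear-regression dataset. A minimal NumPy sketch under those assumptions (linear attention, squared loss, weights initialized at zero; the variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 16                      # input dim, number of in-context examples
X = rng.normal(size=(n, d))       # in-context inputs  x_1 .. x_n
w_true = rng.normal(size=d)
y = X @ w_true                    # in-context targets y_i = w_true . x_i
x_q = rng.normal(size=d)          # query input
eta = 0.1                         # learning rate

# One gradient-descent step on squared loss from w0 = 0:
#   grad = -sum_i y_i x_i, so w1 = eta * sum_i y_i x_i
w1 = eta * (y @ X)
pred_gd = w1 @ x_q                # prediction after one GD step

# Unnormalized linear attention with keys = x_i, values = y_i, query = x_q:
#   output = eta * sum_i (x_q . x_i) * y_i
pred_attn = eta * np.sum((X @ x_q) * y)

# The two computations are term-by-term identical.
assert np.isclose(pred_gd, pred_attn)
```

The point of the identity is that the attention head never updates `w1` explicitly: the weighted sum over keys and values computes the same number the gradient step would have produced.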
When ICL fails
- Tasks far from pretraining distribution — few-shot cannot invent new capabilities
- Examples in the prompt are noisy or inconsistent
- Context window runs out before you can include enough examples
- The model pattern-matches to a memorized template instead of the actual mapping
“Our largest model, GPT-3, with 175 billion parameters, achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks.”
(Brown et al., 2020, "Language Models are Few-Shot Learners", the GPT-3 paper)
The big idea: modern LLMs learn inside a single forward pass. Understanding ICL reshapes how you think about what 'training' even means.
