Show a model three examples, and it learns the task on the spot — without any weight updates. This is one of the strangest properties of transformers.
In-context learning (ICL) is a model's ability to pick up a new task from examples shown in its prompt, without any gradient updates. The GPT-3 paper (Brown et al., 2020) was the breakthrough demonstration: show the model a few examples of a task, and it continues the pattern for new inputs.
Few-shot in-context learning example:
English: The cat sat on the mat.
Pirate: The scallywag perched on yonder rug.
English: I love pizza.
Pirate: Arr, me heart beats for pizza pie!
English: Where is the library?
Pirate:
(the model continues the pattern it inferred)

No fine-tuning. The model learns the 'English-to-pirate' mapping from the examples in the prompt.

Research has shown that transformers can implement something like gradient descent in their attention layers, using the in-context examples as 'training data' for a short forward-pass 'optimization.' The model is doing a kind of learning; it just lives in the forward pass, not in the weights.
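The prompt format above is mechanical enough to generate programmatically. Here is a minimal sketch of a few-shot prompt builder; the function name and labels are illustrative, and no model is actually called:

```python
def build_few_shot_prompt(examples, query, src="English", tgt="Pirate"):
    """Format (input, output) example pairs plus a new query,
    leaving the final output slot blank for the model to complete."""
    lines = []
    for inp, out in examples:
        lines.append(f"{src}: {inp}")
        lines.append(f"{tgt}: {out}")
    lines.append(f"{src}: {query}")
    lines.append(f"{tgt}:")  # the model continues from here
    return "\n".join(lines)

examples = [
    ("The cat sat on the mat.", "The scallywag perched on yonder rug."),
    ("I love pizza.", "Arr, me heart beats for pizza pie!"),
]
prompt = build_few_shot_prompt(examples, "Where is the library?")
print(prompt)
```

Sending `prompt` to any completion-style LLM reproduces the experiment above: the model infers the mapping from the two demonstrations and translates the query.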
Our largest model, GPT-3, with 175 billion parameters, achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks.
— Brown et al., Language Models are Few-Shot Learners (2020)
The big idea: modern LLMs learn inside a single forward pass. Understanding ICL reshapes how you think about what 'training' even means.
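The "learning in a forward pass" claim can be made concrete with a toy construction in the spirit of the gradient-descent-in-attention research mentioned above. For linear regression starting from zero weights, one gradient step on the in-context examples yields exactly the same prediction as an (unnormalized) linear-attention readout where the query input attends to the example inputs as keys and their targets as values. This is a simplified sketch, not the full transformer construction:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 32
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))   # in-context inputs ("demonstrations")
y = X @ w_true                # in-context targets
x_q = rng.normal(size=d)      # query input to predict for

eta = 0.1
# One gradient-descent step on squared loss, starting from w0 = 0:
# grad = -sum_i y_i x_i, so w1 = eta * sum_i y_i x_i
w1 = eta * (X.T @ y)
pred_gd = w1 @ x_q

# The same prediction, written as unnormalized linear attention:
# query x_q, keys X, values y; scores x_i . x_q weight the values.
pred_attn = eta * np.sum((X @ x_q) * y)

assert np.allclose(pred_gd, pred_attn)
print(pred_gd, pred_attn)
```

The two numbers match exactly because both expressions equal eta * sum_i y_i (x_i . x_q): the attention layer's weighted sum over in-context pairs is one step of gradient descent in disguise.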
15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-creators-in-context-learning
1. What is the core idea behind "In-Context Learning"?
2. Which term best describes a foundational idea in "In-Context Learning"?
3. A learner studying In-Context Learning would need to understand which concept?
4. Which of these is directly relevant to In-Context Learning?
5. Which of the following is a key point about In-Context Learning?
6. Which of these does NOT belong in a discussion of In-Context Learning?
7. Which statement is accurate regarding In-Context Learning?
8. Which of these does NOT belong in a discussion of In-Context Learning?
9. What is the key insight about "Induction heads" in the context of In-Context Learning?
10. What is the key insight about "Prompt engineering is ICL engineering" in the context of In-Context Learning?
11. What is the recommended tip about "Ground your practice in fundamentals" in the context of In-Context Learning?
12. Which statement accurately describes an aspect of In-Context Learning?
13. What does working with In-Context Learning typically involve?
14. Which of the following is true about In-Context Learning?
15. Which best describes the scope of "In-Context Learning"?