Show a model three examples, and it learns the task on the spot — without any weight updates. This is one of the strangest properties of transformers.
In-context learning (ICL) is a model's ability to acquire a new task from examples shown in its prompt, without any gradient updates. GPT-3's 2020 paper was the breakthrough: show the model three examples of a task, and it would continue the pattern for new inputs.
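Mechanically, a few-shot prompt is just solved examples concatenated in a fixed format, followed by one input whose answer is left blank. Here is a minimal sketch of that assembly step. The helper name `build_few_shot_prompt`, the label names, and the country/capital task are all illustrative assumptions, not part of any particular library or of the GPT-3 paper.

```python
def build_few_shot_prompt(examples, query, input_label="Input", output_label="Output"):
    """Concatenate solved examples, then leave the final output blank for the model."""
    lines = []
    for source, target in examples:
        lines.append(f"{input_label}: {source}")
        lines.append(f"{output_label}: {target}")
    # The last block has an input but no output: the model is expected to fill it in.
    lines.append(f"{input_label}: {query}")
    lines.append(f"{output_label}:")
    return "\n".join(lines)


# Hypothetical toy task: three solved examples, one open query.
examples = [
    ("France", "Paris"),
    ("Japan", "Tokyo"),
    ("Kenya", "Nairobi"),
]
prompt = build_few_shot_prompt(
    examples, "Peru", input_label="Country", output_label="Capital"
)
print(prompt)
```

The resulting string would be sent to a model as ordinary input text. Nothing about the model changes between requests, so swapping in different examples is enough to switch tasks.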
Few-shot in-context learning example:
English: The cat sat on the mat.
Pirate: The scallywag perched on yonder rug.
English: I love pizza.
Pirate: Arr, me heart beats for pizza pie!
English: Where is the library?
Pirate: (the model continues the pattern it inferred)
No fine-tuning. The model learns the 'English-to-pirate' mapping from the examples in the prompt.
Research has shown that transformers can implement something like gradient descent in their attention layers, using the in-context examples as 'training data' for a short forward-pass 'optimization.' The model is doing a kind of learning; it just lives in the forward pass, not in the weights.
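To make the gradient-descent analogy concrete, here is a toy sketch in a deliberately simplified setting: linear regression, with attention stripped of its softmax. It checks numerically that one gradient step from zero weights gives the same prediction as a single attention-style weighted sum over the in-context examples. The function names, the data, and the learning rate are assumptions chosen for illustration. This is not a description of what a production LLM computes internally.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def one_gd_step_prediction(xs, ys, x_query, lr):
    """Start from w = 0, take one gradient step on 0.5 * sum((w.x - y)^2), then predict."""
    dim = len(x_query)
    w = [0.0] * dim
    # Accumulate the gradient of the squared-error loss with respect to w.
    grad = [0.0] * dim
    for x, y in zip(xs, ys):
        residual = dot(w, x) - y
        for j in range(dim):
            grad[j] += residual * x[j]
    w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return dot(w, x_query)


def linear_attention_prediction(xs, ys, x_query, lr):
    """Softmax-free attention: keys = x_i, values = lr * y_i, query = x_query."""
    return sum(dot(x_query, x) * (lr * y) for x, y in zip(xs, ys))


# Toy in-context examples from the hidden rule y = 2*x1 - x2 (made up for illustration).
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ys = [2.0, -1.0, 1.0]
x_query = [3.0, 2.0]

gd = one_gd_step_prediction(xs, ys, x_query, lr=0.1)
attn = linear_attention_prediction(xs, ys, x_query, lr=0.1)
print(gd, attn)  # the two values agree up to floating-point rounding
```

The two functions agree because, starting from zero weights, the updated weight vector is just the learning rate times the sum of each `y_i * x_i`. Its dot product with the query is therefore a sum of the `y_i` weighted by query-key similarity, which is the form attention computes. No stored parameter is modified: the 'update' exists only inside the computation of one prediction.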
Our largest model, GPT-3, with 175 billion parameters, achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks.
— Brown et al., Language Models are Few-Shot Learners (2020)
The big idea: modern LLMs learn inside a single forward pass. Understanding ICL reshapes how you think about what 'training' even means.
8 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-creators-in-context-learning
What is the main idea of "In-Context Learning"?
Which concept is most central to "In-Context Learning"?
Which use of AI fits this topic best?
What should a careful learner remember about "Induction heads"?
You want to use AI after this lesson. What is the safest next step?
How should AI output about in-context learning be treated?
Name one way to verify an AI answer about in-context learning.
Which action would help you apply "In-Context Learning" responsibly?