Transfer Learning
Models trained on one task can often do many others. Understanding why is one of the deepest lessons in modern ML.
Creators · AI Foundations · ~21 min read
Knowledge That Moves
Transfer learning is the phenomenon where a model trained on task A gets a head start on task B. It is the engine behind the pretrain-then-fine-tune paradigm that made LLMs possible.
Why it works
- Large-scale pretraining builds general-purpose features
- Early layers learn low-level patterns shared across tasks
- Later layers learn more abstract, task-specific features
- Fine-tuning re-shapes the final behavior without destroying the base
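One way to picture the last point is a toy sketch in which a "pretrained" layer is frozen and only a small head is trained. Everything here is hypothetical: the hand-written `base_weights` stand in for features learned during pretraining, the four-example dataset is made up, and the helper names are illustrative. It is plain Python, not any framework's API.

```python
import math

def frozen_features(x, base_weights):
    """Stand-in for a pretrained layer: fixed weights, tanh activation."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in base_weights]

def predict(x, base_weights, head):
    """Probability of class 1 from the frozen features plus a trainable head."""
    feats = frozen_features(x, base_weights)
    z = sum(w * f for w, f in zip(head["w"], feats)) + head["b"]
    return 1.0 / (1.0 + math.exp(-z))

def mean_loss(data, base_weights, head):
    """Average binary cross-entropy over the dataset."""
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = predict(x, base_weights, head)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)

def fine_tune_head(data, base_weights, head, lr=0.5, epochs=200):
    """Gradient descent on the head only; base_weights are never modified."""
    for _ in range(epochs):
        for x, y in data:
            feats = frozen_features(x, base_weights)
            err = predict(x, base_weights, head) - y  # d(loss)/d(logit)
            head["w"] = [w - lr * err * f for w, f in zip(head["w"], feats)]
            head["b"] -= lr * err
    return head

# Hypothetical "pretrained" weights (assumed, not learned here).
base_weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, -1.0]]
base_snapshot = [row[:] for row in base_weights]

# Tiny made-up target task: four labeled examples.
data = [([1.0, 1.0], 1), ([0.8, 1.2], 1), ([-1.0, -1.0], 0), ([-1.2, -0.7], 0)]

head = {"w": [0.0] * 4, "b": 0.0}
loss_before = mean_loss(data, base_weights, head)
head = fine_tune_head(data, base_weights, head)
loss_after = mean_loss(data, base_weights, head)

print(f"loss before: {loss_before:.3f}, after: {loss_after:.3f}")
print("base unchanged:", base_weights == base_snapshot)
```

Training only the head is the most conservative form of fine-tuning. In practice some or all of the base layers are often updated too, usually with a small learning rate.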
The pretrain / fine-tune pipeline
1. Pretrain on a huge, diverse corpus (next-token prediction)
2. Fine-tune on a smaller, curated dataset for the target task
3. Optionally apply RLHF or DPO for behavior shaping
4. Deploy with task-specific prompts
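The next-token prediction objective in the pretraining step can be illustrated with a toy sketch. A bigram count model stands in for a neural network here, and the nine-word corpus, the smoothing constant `alpha`, and the function names are all assumptions made for illustration. The quantity computed, the average negative log-likelihood of each token given what came before it, is the same kind of loss a language model minimizes during pretraining.

```python
import math
from collections import Counter, defaultdict

def fit_bigram(tokens):
    """Count how often each token follows each previous token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_loss(counts, tokens, vocab_size, alpha=1.0):
    """Average negative log-likelihood of each next token (add-alpha smoothing)."""
    total = 0.0
    pairs = list(zip(tokens, tokens[1:]))
    for prev, nxt in pairs:
        row = counts.get(prev, Counter())
        p = (row[nxt] + alpha) / (sum(row.values()) + alpha * vocab_size)
        total -= math.log(p)
    return total / len(pairs)

# Made-up corpus, far smaller than anything used in real pretraining.
tokens = "the cat sat on the mat the cat ate".split()
vocab_size = len(set(tokens))

uniform_loss = next_token_loss({}, tokens, vocab_size)  # untrained: every token equally likely
fitted_loss = next_token_loss(fit_bigram(tokens), tokens, vocab_size)

print(f"untrained: {uniform_loss:.3f}  fitted: {fitted_loss:.3f}")
```

The fitted model scores a lower loss on this corpus than the untrained one because it has picked up which tokens tend to follow which. Pretraining does the same thing at a vastly larger scale and with a far more expressive model.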
Compare the options
| Before transfer learning | After (modern LLMs) |
|---|---|
| Train from scratch per task | Pretrain once, adapt per task |
| Need large labeled datasets for each task | Often need only hundreds to thousands of labeled examples |
| Weeks per task | Often hours to days per task |
| Poor on rare tasks | Often usable even on novel prompts |
Zero-shot and few-shot as transfer
The most striking form of transfer: modern LLMs can do tasks they were never explicitly trained on, just by being asked. Zero-shot (just instructions) and few-shot (instructions + examples) are transfer without any weight updates at all.
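The difference between the two is only in the text sent to the model, which a small sketch can make concrete. The `build_prompt` helper and its `Input:` / `Output:` layout are hypothetical conventions chosen for illustration, not a format required by any particular model or API.

```python
def build_prompt(instruction, query, examples=()):
    """Assemble a plain-text prompt.

    With no examples the prompt is zero-shot; with one or more
    (input, output) pairs it is few-shot. No model weights are involved.
    """
    parts = [instruction.strip(), ""]
    for text, label in examples:
        parts.append(f"Input: {text}")
        parts.append(f"Output: {label}")
        parts.append("")
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)

instruction = "Classify the sentiment as positive or negative."

zero_shot = build_prompt(instruction, "I loved it")
few_shot = build_prompt(
    instruction,
    "I loved it",
    examples=[("Great movie", "positive"), ("Terrible plot", "negative")],
)

print(zero_shot)
print("---")
print(few_shot)
```

Either string would be passed to the same model unchanged. The few-shot version simply gives it worked examples to imitate.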
“Pretraining plus fine-tuning is the single most successful pattern in modern machine learning.”
The big idea: modern AI is a miracle of reuse. A single giant model, trained once, powers a thousand applications through transfer.
