Transfer Learning
Models trained on one task can often do many others. Understanding why is one of the deepest lessons in modern ML.
Lesson map
The main moves, in order:
1. Knowledge That Moves
2. Transfer learning
3. Pretraining
4. Fine-tuning
Section 1
Knowledge That Moves
Transfer learning is the phenomenon where a model trained on task A gets a head start on task B. It is the engine of the pretrain-then-finetune paradigm that made LLMs possible.
Why it works
- Large-scale pretraining builds general-purpose features
- Early layers learn low-level patterns shared across tasks
- Later layers learn task-specific routing
- Fine-tuning re-shapes the final behavior without destroying the base
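The bullets above can be sketched in miniature. In this toy example (all names are hypothetical, not from any real library), a frozen `backbone` function stands in for pretrained general-purpose features, and only a small linear head is trained on the new task — the base is reused, not destroyed:

```python
def backbone(x):
    """Stand-in for pretrained, frozen features (never updated below)."""
    return [x, x * x, abs(x)]

def train_head(examples, lr=0.01, epochs=3000):
    """Fit a linear head on the frozen features with plain SGD."""
    w = [0.0, 0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y in examples:
            feats = backbone(x)
            pred = sum(wi * f for wi, f in zip(w, feats)) + b
            err = pred - y
            # Gradient step on the head only; backbone stays frozen.
            for i, f in enumerate(feats):
                w[i] -= lr * err * f
            b -= lr * err
    return w, b

# "Fine-tune" on a small labeled set for a new task, y = 2 * x^2.
data = [(x / 10, 2 * (x / 10) ** 2) for x in range(-10, 11)]
w, b = train_head(data)
```

Because the backbone already exposes a useful feature (here `x * x`), the head needs only a tiny dataset and a few seconds of training — the same asymmetry that makes fine-tuning cheap relative to pretraining.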
The pretrain / fine-tune pipeline
1. Pretrain on a huge, diverse corpus (next-token prediction)
2. Fine-tune on a smaller, curated dataset for the target task
3. Optionally apply RLHF or DPO for behavior shaping
4. Deploy with task-specific prompts
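Step 1's objective — next-token prediction — can be illustrated with a deliberately tiny stand-in. Here a bigram counter plays the role of the model (an assumption for illustration only; real LLMs learn the same "what comes next?" mapping with a neural network, not counts):

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate".split()

# "Pretrain": record which token follows which in the corpus.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(token):
    """Return the most likely next token under the bigram model."""
    counts = follows[token]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # "cat" follows "the" twice, "mat" once -> cat
```

The point of the sketch: pretraining needs no labels at all — the next token in raw text is its own supervision, which is what lets step 1 use a huge, diverse corpus.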
Compare the options
| Before transfer learning | After (modern LLMs) |
|---|---|
| Train from scratch per task | Pretrain once, adapt per task |
| Need tons of labeled data | Need hundreds of labeled examples |
| Weeks per task | Hours per task |
| Poor on rare tasks | Good even on novel prompts |
Zero-shot and few-shot as transfer
The most striking form of transfer: modern LLMs can do tasks they were never explicitly trained on, just by being asked. Zero-shot (just instructions) and few-shot (instructions + examples) are transfer without any weight updates at all.
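Concretely, the only difference between the two is whether worked examples are prepended to the prompt — the model's weights never change. A minimal sketch (the instruction text and helper names are illustrative):

```python
instruction = "Classify the sentiment of the review as positive or negative."

def zero_shot(review):
    """Instructions only — no examples, no weight updates."""
    return f"{instruction}\n\nReview: {review}\nSentiment:"

def few_shot(review, examples):
    """Instructions plus worked examples, still no weight updates."""
    shots = "\n".join(f"Review: {r}\nSentiment: {s}" for r, s in examples)
    return f"{instruction}\n\n{shots}\n\nReview: {review}\nSentiment:"

demo = [("Loved it!", "positive"), ("Waste of money.", "negative")]
prompt = few_shot("Surprisingly good.", demo)
```

All the "adaptation" lives in the prompt string, which is why few-shot behavior counts as transfer: the pretrained weights carry the task over unchanged.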
“Pretraining plus fine-tuning is the single most successful pattern in modern machine learning.”
The big idea: modern AI is a miracle of reuse. A single giant model, trained once, powers a thousand applications through transfer.
Related lessons
- The Full Machine Learning Pipeline (50 min) — From raw bytes to deployed model, every ML system follows the same ten-stage pipeline. Master it and you can read any architecture paper.
- Transformers Under the Hood (55 min) — Attention, positional encoding, residual streams. A walk through the architecture that powers every frontier language model today.
- The Three Ingredients: Data, Compute, Algorithms (Capstone) (55 min) — Every AI breakthrough of the past decade rests on three interacting ingredients. Synthesize everything you have learned into one working model.
