Lesson 1316 of 1570
How an AI Model Actually Gets 'Trained' (No Math)
'Training data,' 'fine-tuning,' 'RLHF' — the words sound mysterious. The actual process is three clear stages.
Lesson map
What this lesson covers
Learning path
The main moves in order
- 1. The big idea
- 2. Pretraining vs Fine-tuning — Why It Matters
Concept cluster
Terms to connect while reading
Section 1
The big idea
Modern AI models go through three stages: (1) Pretraining — reading trillions of words of internet text; (2) Fine-tuning — studying carefully curated examples of 'good' responses; (3) RLHF (Reinforcement Learning from Human Feedback) — humans rate pairs of responses and the model learns which kind people prefer. Each stage costs more per data point than the last, but uses far less data.
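The three stages above can be sketched in code. This is a deliberately toy illustration — real training adjusts billions of neural-network weights, not word counters, and all the function names and numbers here are made up — but it shows the shape of the pipeline: huge raw data first, then a small curated set weighted heavily, then preference pairs nudging behavior.

```python
# Toy sketch of the three training stages. Illustration only: real
# systems update neural-network weights, not word counts, and the
# weighting numbers (10, 5) are arbitrary choices for this demo.
from collections import Counter

def pretrain(corpus):
    """Stage 1: absorb raw text. Counting word frequencies stands in
    for 'learning the statistical patterns of language'."""
    model = Counter()
    for document in corpus:
        model.update(document.lower().split())
    return model

def fine_tune(model, curated_examples):
    """Stage 2: a much smaller set of hand-written good responses,
    each counted more heavily than raw web text."""
    for example in curated_examples:
        for word in example.lower().split():
            model[word] += 10  # curated data counts extra (toy choice)
    return model

def rlhf(model, comparisons):
    """Stage 3: human preference pairs (preferred, rejected).
    Nudge toward preferred phrasing, away from rejected."""
    for preferred, rejected in comparisons:
        for word in preferred.lower().split():
            model[word] += 5
        for word in rejected.lower().split():
            model[word] -= 5
    return model

corpus = ["the cat sat on the mat", "the web has lots of text"]
model = pretrain(corpus)
model = fine_tune(model, ["please answer politely"])
model = rlhf(model, [("sure, happy to help", "no, go away")])
print(model["the"])  # prints 3: the pretraining counts alone
```

Notice the proportions: the raw corpus is by far the biggest input, while fine-tuning and RLHF touch only a handful of words but move them much further per example — the same cost-per-data-point pattern described above.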
Some examples
- GPT-4's pretraining reportedly used roughly 13 trillion tokens — more than every book ever published, plus much of the public web.
- Fine-tuning uses maybe 100,000 hand-written 'this is how to respond well' examples.
- RLHF uses ~1 million human comparisons of 'response A is better than response B.'
- Constitutional AI (Anthropic's approach) replaces some human ratings with the model rating itself against a written 'constitution' of values.
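The Constitutional AI bullet above can also be sketched. In this toy version (every name and rule is invented for illustration; the real system asks the model itself to judge its drafts in plain language), a response is checked against written principles, and the check replaces a human comparison:

```python
# Toy sketch of the Constitutional AI idea: instead of a human rating
# every pair, the model checks its own drafts against written
# principles. Keyword rules stand in for the model's self-judgment.
CONSTITUTION = [
    "avoid insults",
    "be helpful",
]

def violates(response, principle):
    """Hypothetical checker: trivially simple stand-in rules."""
    if principle == "avoid insults":
        return "idiot" in response.lower()
    if principle == "be helpful":
        return response.strip() == ""
    return False

def self_critique(response):
    """Return the list of principles a draft breaks."""
    return [p for p in CONSTITUTION if violates(response, p)]

def choose(draft_a, draft_b):
    """Prefer the draft that breaks fewer principles. This automated
    preference signal replaces some human comparisons in RLHF."""
    if len(self_critique(draft_a)) <= len(self_critique(draft_b)):
        return draft_a
    return draft_b

print(choose("Here's how to fix it.", "You idiot, read the manual."))
```

The key design point: once the preferences come from a written document instead of human raters, you can generate far more of them, and you can read exactly what values the model was trained toward.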
Try it!
Read Anthropic's Constitutional AI paper summary on their website (no math, just plain English about how they trained Claude). It takes 15 minutes and you'll understand more about modern AI than 95% of people.
Section 2
Pretraining vs Fine-tuning — Why It Matters
These two stages built every modern AI. Knowing the difference helps you understand why models behave so differently from one another.
What to actually do
- Pretraining is months of compute, billions of dollars, and most of the text on the web
- Fine-tuning is the part that makes the model polite, helpful, and willing to refuse certain requests
- RLHF is fine-tuning where humans rank responses to teach preferences
The big idea: two stages make an AI. First it reads everything, then it learns how to behave. Both stages matter.
Key terms in this lesson
End-of-lesson quiz
Check what stuck
15 questions · Score saves to your progress.
Tutor
Curious about “How an AI Model Actually Gets 'Trained' (No Math)”?
Ask anything about this lesson. I’ll answer using just what you’re reading — short, friendly, grounded.
Related lessons
Keep going
Builders · 40 min
AI and Why Companies 'Fine-Tune' Their Own AI
Companies retrain AI on their own data — that's fine-tuning, and it's different from prompting.
Builders · 7 min
AI and the training data question: where did all this knowledge come from?
Understand what AI was trained on and why that shapes everything it says.
Creators · 35 min
Transfer Learning
Models trained on one task can often do many others. Understanding why is one of the deepest lessons in modern ML.
