'Training data,' 'fine-tuning,' 'RLHF' — the words sound mysterious. The actual process is three clear stages.
Modern AI models go through three stages: (1) Pretraining: the model reads trillions of words of internet text; (2) Fine-tuning: it studies carefully curated examples of 'good' responses; (3) RLHF (Reinforcement Learning from Human Feedback): humans rate pairs of responses and the model learns which kind people prefer. Each stage costs more per data point than the last but uses far less data.
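If you like seeing ideas as code, here's a toy sketch of those three stages using a tiny word-counting model. Everything in it (the corpus, the curated examples, the update weights) is made up for illustration; real labs train giant neural networks on vastly more data, but the shape of the pipeline is the same.

```python
# Toy sketch of the three training stages, using a bigram word model.
# All data and numbers here are invented for illustration only.
from collections import defaultdict

# Stage 1: Pretraining -- count which word follows which in a pile of text.
# Huge amounts of data, but each data point carries a cheap, weak signal.
corpus = "the cat sat on the mat the dog sat on the rug".split()
counts = defaultdict(lambda: defaultdict(float))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1.0

# Stage 2: Fine-tuning -- nudge the model with a few curated 'good' examples.
# Far less data than pretraining, so each example is weighted more heavily.
curated = [("the", "cat"), ("cat", "sat")]
for prev, nxt in curated:
    counts[prev][nxt] += 5.0

# Stage 3: RLHF-style preference learning -- a human compares two possible
# continuations; the preferred one is boosted relative to the rejected one.
preferences = [("the", "cat", "dog")]  # after 'the', rater prefers 'cat' over 'dog'
for prev, better, worse in preferences:
    counts[prev][better] += 2.0
    counts[prev][worse] -= 1.0

def next_word(prev):
    """Pick the highest-scoring continuation the toy model has seen."""
    options = counts[prev]
    return max(options, key=options.get) if options else None

print(next_word("the"))  # -> 'cat'
```

The point of the toy: each later stage touches fewer examples but pushes the model harder per example, which is exactly the cost-versus-data trade-off the lesson describes.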
Read Anthropic's Constitutional AI paper summary on their website (no math, just plain English about how they trained Claude). It takes 15 minutes and you'll understand more about modern AI than 95% of people.
Three stages built every modern AI. Knowing the differences helps you understand why models behave so differently.
The big idea: Three stages make AI: read everything, learn from curated examples, then learn what people prefer. All three stages matter.
15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-builders-foundations-ai-how-models-are-trained-r9a10-teen
What is the first major stage in training a modern AI language model?
During the pretraining stage, about how many tokens did GPT-4 process?
Why do models sometimes give responses that seem overly agreeable or flattering?
What is fine-tuning in the AI training process?
In RLHF, what do human raters actually do?
What makes Constitutional AI different from standard RLHF?
Which training stage is primarily responsible when an AI produces harmful or biased outputs?
The lesson calls the three training stages 'the recipe for every chatbot on Earth.' Why?
What is 'training data' in the context of AI development?
If you wanted to teach an AI to follow specific rules (like 'don't give medical advice'), which training stage would you focus on?
What did Anthropic's Constitutional AI paper demonstrate?
Why do AI companies use so much internet text for pretraining?
The lesson mentions that RLHF uses about 1 million human comparisons. What is being compared?
What did the lesson say is unique about how Claude (Anthropic's AI) was trained?
Why might an AI be more honest if it were trained with less RLHF?