Lesson 276 of 2116
Running Your Own Small Experiment
The best way to truly understand an AI claim is to try it yourself. Here is how to run a small experiment that actually teaches you something.
Lesson map
What this lesson covers
Learning path
The main moves in order
1. Be a Scientist, Not Just a Reader
2. Experiment
3. Hypothesis
4. Replication
Section 1
Be a Scientist, Not Just a Reader
You do not need a GPU cluster to do AI research. The best small experiments are tiny, specific, and fast — a clear question answered in a Jupyter notebook over an afternoon. Real understanding comes from running them.
A 7-step experiment recipe
1. Pick one specific hypothesis (e.g., "Claude beats GPT on Finnish translation")
2. Write down what you expect to see before you start
3. Build the smallest possible test set (30-50 items)
4. Pick two or three models to compare
5. Run the test, capture raw outputs, do not over-engineer
6. Grade with a rubric (LLM-as-judge or human)
7. Write up the result in one page, including what surprised you
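Step 6 can be sketched as a tiny rubric-grading helper. This is a hypothetical sketch, not a library API: the rubric text and the `judge` callable are assumptions, and in a real run `judge` would wrap a model API call that returns the judge model's verdict as text.

```python
# Minimal LLM-as-judge grading sketch. RUBRIC and `judge` are
# illustrative assumptions; swap in a real model call for `judge`.
RUBRIC = (
    "Score the answer PASS or FAIL.\n"
    "PASS: the final numeric answer matches the reference exactly.\n"
    "FAIL: anything else, including a correct answer buried in wrong work."
)

def grade(question, reference, answer, judge):
    """Ask the judge to apply the rubric; return True for PASS."""
    prompt = (
        f"{RUBRIC}\n\nQuestion: {question}\n"
        f"Reference: {reference}\nAnswer: {answer}\nVerdict:"
    )
    verdict = judge(prompt)
    return "PASS" in verdict.upper()

# Stand-in judge for demonstration only; a real one calls a model.
fake_judge = lambda p: "PASS" if "Answer: 5" in p else "FAIL"
print(grade("2 + 3", "5", "5", fake_judge))  # True
```

The same `grade` function works unchanged whether `judge` is a human typing verdicts or a second model, which keeps the grading step swappable.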
Example: does CoT actually help on easy math?
Wei et al. (2022) showed chain-of-thought helps on hard problems. Does it help on grade-school problems too? Run 30 single-digit addition problems, once with 'think step by step' and once without, on the same model. Compare. If you see no difference, you have replicated a real published finding (CoT doesn't help easy tasks). That is real science.
An afternoon's worth of real AI research, in 20 lines
# Tiny experiment skeleton
import anthropic

client = anthropic.Anthropic()

problems = [
    ("2 + 3", "5"),
    ("7 + 8", "15"),
    # ... 28 more
]

def run(prompt_prefix):
    correct = 0
    for q, a in problems:
        resp = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=256,
            messages=[{"role": "user", "content": prompt_prefix + q}],
        )
        # Loose substring check: good enough for a sketch, but note
        # that "5" also matches "15" -- tighten this for a real run.
        if a in resp.content[0].text:
            correct += 1
    return correct / len(problems)

print("Plain:", run(""))
print("CoT:", run("Think step by step. "))

What counts as a good experiment
- One variable changes at a time
- Sample size is at least ~30 for any quantitative claim
- Results are recorded with raw outputs, not just summary numbers
- You wrote your prediction BEFORE running — that keeps you honest
- You can describe the limits in one paragraph
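One way to satisfy the last three checks is to write the prediction to disk before any results exist, then append every raw output as you go. A minimal sketch, assuming a JSONL log file; the file name and record fields are illustrative choices, not a standard:

```python
# Experiment log sketch: prediction first, then one record per raw
# output, so nothing gets summarized away.
import json
import datetime

def start_log(path, hypothesis, prediction):
    # Written BEFORE running anything -- this is your honesty check.
    with open(path, "w") as f:
        f.write(json.dumps({
            "hypothesis": hypothesis,
            "prediction": prediction,
            "started": datetime.datetime.now().isoformat(),
        }) + "\n")

def log_result(path, question, raw_output, correct):
    # Keep the full raw output, not just whether it was right.
    with open(path, "a") as f:
        f.write(json.dumps({
            "question": question,
            "raw_output": raw_output,
            "correct": correct,
        }) + "\n")

start_log("cot_experiment.jsonl",
          "CoT does not help single-digit addition",
          "Accuracy difference under 5 points")
log_result("cot_experiment.jsonl", "2 + 3", "The answer is 5.", True)
```

When you write up the result, the first line of the log is your pre-registered prediction and the rest is your evidence, which makes the one-page write-up mostly a matter of reading the file back.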
“The most exciting phrase to hear in science is not 'Eureka!' but 'That's funny...'”
The big idea: a one-afternoon experiment teaches you more about AI than a month of reading. Pick a question, run the test, write it up. Repeat.