Running Your Own Small Experiment

The best way to truly understand an AI claim is to try it yourself. Here is how to run a small experiment that actually teaches you something.

45 min · Reviewed 2026

Be a Scientist, Not Just a Reader

You do not need a GPU cluster to do AI research. The best small experiments are tiny, specific, and fast — a clear question answered in a Jupyter notebook over an afternoon. Real understanding comes from running them.

A 7-step experiment recipe

1. Pick one specific hypothesis (e.g., 'Claude beats GPT on Finnish translation')
2. Write down what you expect to see before you start
3. Build the smallest possible test set (30-50 items)
4. Pick two or three models to compare
5. Run the test, capture raw outputs, do not over-engineer
6. Grade with a rubric: LLM-as-judge or human
7. Write up the result in one page, including what surprised you
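Writing down the prediction and capturing raw outputs (the second and fifth steps above) are the easiest to skip and the cheapest to automate. Here is a minimal sketch, assuming a local JSON Lines file. The file name, the field names, and the helpers `append_record`, `log_prediction`, `log_output`, and `load_records` are all hypothetical, invented here for illustration.

```python
import json
import time
from pathlib import Path

LOG_PATH = Path("experiment_log.jsonl")  # hypothetical file name


def append_record(record, path=LOG_PATH):
    """Append one JSON object per line so earlier records are never overwritten."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def log_prediction(hypothesis, prediction, path=LOG_PATH):
    """Record what you expect BEFORE any model call is made."""
    append_record(
        {"type": "prediction", "time": time.time(),
         "hypothesis": hypothesis, "prediction": prediction},
        path,
    )


def log_output(model, condition, question, expected, raw_output, path=LOG_PATH):
    """Store the full raw model output next to the expected answer."""
    append_record(
        {"type": "output", "time": time.time(), "model": model,
         "condition": condition, "question": question,
         "expected": expected, "raw_output": raw_output},
        path,
    )


def load_records(path=LOG_PATH):
    """Read the log back for grading or for the one-page write-up."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Call `log_prediction` once at the top of the notebook, then `log_output` inside the loop that queries each model. Because the file is append-only, the timestamp on the prediction record shows it was written before the results came in.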

Example: does CoT actually help on easy math?

Wei et al. (2022) showed that chain-of-thought (CoT) prompting helps on hard, multi-step problems. Does it help on grade-school problems too? Run 30 single-digit addition problems on the same model, once with 'Think step by step' and once without, and compare the accuracies. If you see no difference, your result is consistent with the reported pattern that CoT adds little on tasks a model already solves easily. That is real science.

# Tiny experiment skeleton
import re
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

problems = [
    ("2 + 3", "5"),
    ("7 + 8", "15"),
    # 28 more
]

def run(prompt_prefix):
    correct = 0
    for q, a in problems:
        resp = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=256,
            messages=[{"role": "user", "content": prompt_prefix + q}],
        )
        # Grade on the last number in the reply; a plain substring check
        # would count "15" as containing "5".
        numbers = re.findall(r"\d+", resp.content[0].text)
        if numbers and numbers[-1] == a:
            correct += 1
    return correct / len(problems)

print("Plain:", run(""))
print("CoT:", run("Think step by step. "))

An afternoon's worth of real AI research, in about 30 lines
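Because both prompts run on the same problems, the comparison is paired: what matters is the set of items where the two conditions disagree. Here is a sketch of an exact two-sided sign test (the exact form of McNemar's test), assuming you have kept a 0/1 score per item for each condition instead of only the final accuracy. The function name `paired_sign_test` and the example scores are made up for illustration.

```python
from math import comb


def paired_sign_test(scores_a, scores_b):
    """Exact two-sided sign test on per-item 0/1 scores from two conditions.

    Returns (items only A got right, items only B got right, p-value).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both conditions must cover the same items")
    only_a = sum(1 for a, b in zip(scores_a, scores_b) if a and not b)
    only_b = sum(1 for a, b in zip(scores_a, scores_b) if b and not a)
    n = only_a + only_b  # items where the conditions disagree
    if n == 0:
        return only_a, only_b, 1.0  # identical results, no evidence either way
    k = min(only_a, only_b)
    # Probability of a split at least this lopsided if each disagreement
    # were a fair coin flip, doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return only_a, only_b, min(1.0, 2 * tail)


# Made-up scores for 30 items, purely to show the call.
plain_scores = [1] * 28 + [0, 1]
cot_scores = [1] * 28 + [1, 0]
print(paired_sign_test(plain_scores, cot_scores))
```

A large p-value here does not prove the two prompts are equivalent. It only says that 30 items gave no evidence of a difference, which is worth stating in the write-up.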

What counts as a good experiment

One variable changes at a time
Sample size is at least ~30 for any quantitative claim
Results are recorded with raw outputs, not just summary numbers
You wrote your prediction BEFORE running — that keeps you honest
You can describe the limits in one paragraph
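To see why roughly 30 items is a floor and not a comfortable number, it helps to put an interval around an accuracy. Below is a minimal sketch of the Wilson score interval using only the standard library. The function name `wilson_interval` is an illustrative choice, and `z=1.96` assumes an approximate 95% level.

```python
from math import sqrt


def wilson_interval(correct, total, z=1.96):
    """Approximate 95% Wilson score interval for an accuracy of correct/total."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)


low, high = wilson_interval(24, 30)
print(f"24/30 correct: {low:.2f} to {high:.2f}")
```

With 24 of 30 correct, the interval runs from roughly 0.63 to 0.91, so a gap of a few points between two models on a test set this size is well within noise. That belongs in the one-paragraph description of limits.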

"The most exciting phrase to hear in science is not 'Eureka!' but 'That's funny...'"
— Attributed to Isaac Asimov

The big idea: a one-afternoon experiment teaches you more about AI than a month of reading. Pick a question, run the test, write it up. Repeat.

End-of-lesson check

8 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-creators-run-your-own-experiment

What is the main idea of "Running Your Own Small Experiment"?
1. The best way to truly understand an AI claim is to try it yourself. Here is how to run a small experiment that actually teaches you something.
2. Use AI as the final authority for the whole decision
3. Avoid checking the answer once it sounds polished
4. Focus only on speed instead of judgment

Which concept is most central to "Running Your Own Small Experiment"?
1. hypothesis
2. experiment
3. replication
4. notebook

Which use of AI fits this topic best?
1. Let the AI decide what matters without your review
2. Use the answer before checking whether it fits the situation
3. Pick one specific hypothesis (e.g., 'Claude beats GPT on Finnish translation')
4. Treat the AI output as automatically correct

What should a careful learner remember about "Negative results are results"?
1. Use AI to draft or organize ideas about an experiment, then verify before acting.
2. Skip the context so the tool can guess faster
3. Treat the output as private even after sharing it online
4. Use the answer without checking the source

You want to use AI after this lesson. What is the safest next step?
1. Act immediately because the AI answer is written clearly
2. Use AI for drafting and comparison, but verify before publishing or relying on it.
3. Hide uncertainty so the final answer looks cleaner
4. Use private or sensitive details before checking permission

How should AI output about an experiment be treated?
1. As proof that no other source is needed
2. As a replacement for context, consent, or expert review
3. As a draft or helper output that still needs human judgment and verification
4. As something that becomes correct when it sounds confident

Name one way to verify an AI answer about an experiment.

Which action would help you apply "Running Your Own Small Experiment" responsibly?
1. Use the tool to avoid thinking through the tradeoff
2. Keep going even if the output conflicts with a trusted source
3. Treat the AI output as automatically correct
4. Write down what you expect to see before you start


Tendril · Creators · AI Foundations
