Professor, Princeton; co-author AI Snake Oil
Representative of Arvind's threads on stress-testing AI vendor claims — asking for the specific benchmark, the baseline, the sample size, and whether the evaluation was done on private data.
“Every capability claim is three questions away from collapse. Ask the three questions.”
How to replicate
- 1.For any capability claim in a pitch or paper, write out the exact metric being reported.
- 2.Ask: what was the baseline? On what data? Sample size?
Prompt template
I have this AI product claim: <paste>. Draft a 5-question email to the vendor that isolates: (1) the exact metric, (2) the baseline comparison, (3) the evaluation dataset and whether it's public, (4) sample size and variance, (5) reproduction conditions. Keep the tone polite but non-negotiable. Do not accept marketing language in the reply.
Pitfall
Accepting 'our model outperforms GPT-4' without asking 'on what.' Models outperform each other on narrow slices all the time; the slice is the whole claim.
What you'll learn
- •Which questions collapse most capability claims
- •Why benchmark contamination is the default assumption until disproven
- •How to read vendor PDFs like a reviewer
- •When to trust a claim despite incomplete evidence
