Copyright vs. Terms of Service: Two Different Fights
Violating a website's Terms of Service and violating copyright are different legal problems. Understanding the distinction is critical for data work.
Lesson map
What this lesson covers
Learning path
The main moves in order
- 1. Two Totally Different Legal Worlds
- 2. Terms of service
- 3. Copyright
- 4. CFAA
Section 1
Two Totally Different Legal Worlds
You scrape a website that says no scraping in its ToS. Have you broken the law? The honest answer is: probably not, but it depends. Have you also violated copyright? That is a completely separate question with its own answer.
Terms of service: contract law
Terms of Service are contracts. If you violate them, the platform can sue you for breach of contract, but the damages are usually small unless the platform can show real financial harm. In hiQ Labs v. LinkedIn (2022), the US Ninth Circuit held that scraping public LinkedIn profiles likely did not violate the Computer Fraud and Abuse Act (CFAA), even though it broke LinkedIn's ToS. The contract question was decided separately: later that year the district court found that hiQ had breached LinkedIn's User Agreement.
Copyright: statutory law
Copyright is governed by federal statute in the US (Title 17). Infringement can mean statutory damages of $750 to $30,000 per work, rising to as much as $150,000 per work when the infringement is willful. When OpenAI or Meta train on copyrighted books or articles, they are either relying on fair use or accepting litigation risk. Complying with a site's ToS does not cure a copyright violation if you never had the right to copy the work for training in the first place.
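To see why per-work statutory damages dominate the risk picture, here is a rough arithmetic sketch. The per-work figures are the statutory range in 17 U.S.C. § 504(c); the function name `statutory_exposure` and the 1,000-work example are hypothetical, and real awards depend on many factors a multiplication cannot capture.

```python
# Per-work statutory damages figures from 17 U.S.C. § 504(c).
# Illustrative arithmetic only, not a prediction of any actual award.
ORDINARY_MIN = 750       # ordinary minimum per work
ORDINARY_MAX = 30_000    # ordinary maximum per work
WILLFUL_MAX = 150_000    # maximum per work for willful infringement


def statutory_exposure(num_works: int, willful: bool = False) -> tuple[int, int]:
    """Return the (low, high) statutory damages range in dollars."""
    if num_works < 0:
        raise ValueError("num_works must be non-negative")
    ceiling = WILLFUL_MAX if willful else ORDINARY_MAX
    return num_works * ORDINARY_MIN, num_works * ceiling


# Hypothetical dataset containing 1,000 infringed works, willful case.
low, high = statutory_exposure(1_000, willful=True)
print(f"${low:,} to ${high:,}")  # $750,000 to $150,000,000
```

Because the figure scales with the number of works, even a modest corpus can carry far more exposure than a typical breach-of-contract claim.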
Compare the options
| Aspect | ToS violation | Copyright infringement |
|---|---|---|
| Law type | Contract | Federal statute |
| Who sues | Platform | Copyright holder |
| Damages | Usually modest | Up to $150k per work (willful) |
| Criminal? | No | Yes, in extreme cases |
| Geography | Where contract applies | Where work is protected |
The CFAA wrinkle
Fair use in training
The argument AI companies make is that training is transformative fair use: the model learns patterns rather than storing verbatim text. Critics reply that models can regurgitate training data verbatim, which undermines the transformative argument. Current lawsuits (NYT v. OpenAI, Bartz v. Anthropic, Kadrey v. Meta) are expected to clarify this over the next few years.
The big idea: ToS and copyright are separate legal risks. A dataset can be ToS-clean and copyright-dirty, or vice versa. Serious practitioners check both, not just one.
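One way to put "check both" into practice is to record the two risks as independent fields for every source in a dataset. The sketch below is a minimal illustration, not a legal tool: the `SourceRecord` class, its field names, the `risk_flags` helper, and the example sources are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SourceRecord:
    """One data source, with the two legal questions tracked separately."""
    name: str
    tos_allows_collection: bool      # contract question: did the ToS permit this?
    copyright_basis: Optional[str]   # e.g. "license", "public_domain"; None if unknown


def risk_flags(record: SourceRecord) -> List[str]:
    """Return which of the two independent risks are unresolved for a source."""
    flags = []
    if not record.tos_allows_collection:
        flags.append("tos")
    if record.copyright_basis is None:
        flags.append("copyright")
    return flags


# Hypothetical sources covering the mixed cases described above.
sources = [
    SourceRecord("licensed_news_archive", True, "license"),
    SourceRecord("open_api_book_excerpts", True, None),          # ToS-clean, copyright-dirty
    SourceRecord("scraped_public_domain_texts", False, "public_domain"),  # the reverse
]

for src in sources:
    print(src.name, risk_flags(src) or "no open flags")
```

Keeping the fields separate makes it impossible for a pass on one question to hide an open issue on the other.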