Violating a website's Terms of Service and violating copyright are different legal problems. Understanding the distinction is critical for anyone who collects or trains on web data.
Suppose you scrape a website whose ToS prohibits scraping. Have you broken the law? The honest answer is: probably not in any criminal sense, though you may have breached a contract, and the details depend on the case. Have you also violated copyright? That is a completely separate question with its own answer.
Terms of Service are contracts. If you violate them, the platform can sue you for breach of contract, but damages are usually small unless the platform can show real financial harm. In hiQ Labs v. LinkedIn (2022), the US Ninth Circuit held that scraping publicly accessible LinkedIn profiles likely did not violate the Computer Fraud and Abuse Act (CFAA), the federal anti-hacking statute, even though LinkedIn's ToS prohibited it.
Copyright is governed by federal statute in the US (Title 17 of the US Code). Infringement can bring statutory damages of up to $150,000 per work for willful infringement. When OpenAI or Meta train on copyrighted books or articles without a license, they are relying on fair use and accepting the litigation risk that comes with it. Complying with a platform's ToS cannot cure a copyright problem: if you never had the right to train on a work, no amount of ToS consent supplies it.
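Because statutory damages are counted per work, exposure scales with the size of a corpus. The sketch below shows only the ceiling arithmetic, under loudly stated assumptions: every work is registered, every infringement is found willful, and the court awards the maximum. Real awards are set by courts within a statutory range and are usually far lower. The helper name is hypothetical.

```python
# Hypothetical helper: theoretical upper bound on US statutory damages.
# ASSUMPTIONS: every work is registered, infringement is found willful,
# and the maximum is awarded for each work. Real awards are usually lower.
WILLFUL_CAP_PER_WORK = 150_000  # USD per work, willful ceiling (17 U.S.C. § 504(c)(2))


def max_statutory_exposure(num_works: int) -> int:
    """Return the ceiling in USD if num_works works were infringed."""
    if num_works < 0:
        raise ValueError("num_works must be non-negative")
    return num_works * WILLFUL_CAP_PER_WORK


# A corpus of 1,000 books multiplies the per-work cap by 1,000.
print(max_statutory_exposure(1_000))  # 150000000
```

The point is the multiplication, not the exact figure: a per-work cap that looks modest for one article becomes very large across a training corpus.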
| Aspect | ToS violation | Copyright infringement |
|---|---|---|
| Law type | Contract | Federal statute |
| Who sues | Platform | Copyright holder |
| Damages | Usually modest | Up to $150,000 per work (statutory, willful) |
| Criminal? | No | Yes, for certain willful infringement |
| Geography | Where contract applies | Where work is protected |
The argument AI companies make is that training is transformative fair use: on this view, the model learns statistical patterns rather than storing verbatim text. Critics reply that models can sometimes reproduce training data verbatim, which undermines the transformative argument. Lawsuits such as The New York Times v. OpenAI, Bartz v. Anthropic, and Kadrey v. Meta are expected to clarify this over the next few years.
The big idea: ToS and copyright are separate legal risks. A dataset can be ToS-clean and copyright-dirty, or vice versa. Serious practitioners check both, not just one.
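The point that the two checks are independent can be made concrete with a small audit record. This is a minimal sketch: the `SourceRecord` fields and the `audit` helper are hypothetical, and each boolean stands in for whatever legal review a real team performs on that question.

```python
from dataclasses import dataclass


@dataclass
class SourceRecord:
    """Hypothetical provenance entry for one data source."""
    name: str
    tos_permits_collection: bool  # contract question: does the ToS allow scraping?
    rights_cover_training: bool   # copyright question: license or reviewed fair-use position?


def audit(record: SourceRecord) -> list[str]:
    """Return the open risks for a source; an empty list means both checks passed."""
    risks = []
    if not record.tos_permits_collection:
        risks.append("ToS: possible breach of contract")
    if not record.rights_cover_training:
        risks.append("Copyright: possible infringement")
    return risks


# A source can pass one check and fail the other (ToS-clean, copyright-dirty).
blog = SourceRecord("public blog", tos_permits_collection=True, rights_cover_training=False)
print(audit(blog))  # ['Copyright: possible infringement']
```

Keeping the two flags as separate fields, rather than one "approved" boolean, is the design choice that matters: it prevents a passing ToS review from silently standing in for a copyright review.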
6 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-data-copyright-vs-tos
What is the main idea of "Copyright vs. Terms of Service: Two Different Fights"?
Which concept is most central to "Copyright vs. Terms of Service: Two Different Fights"?
What should a careful learner remember about "Breaking through a login wall"?
You want to use AI after this lesson. What is the safest next step?
How should AI output about terms of service be treated?
Name one way to verify an AI answer about terms of service.