Violating a website's Terms of Service and violating copyright are different legal problems. Understanding the distinction is critical for data work.
You scrape a website that says no scraping in its ToS. Have you broken the law? The honest answer is: probably not, but it depends. Have you also violated copyright? That is a completely separate question with its own answer.
Terms of Service are contracts. If you violate them, the platform can sue you for breach of contract, but the damages are usually small unless the platform can show real financial harm. In hiQ Labs v. LinkedIn (2022), the US Ninth Circuit held that scraping publicly available LinkedIn profiles likely did not violate the Computer Fraud and Abuse Act, even though it went against LinkedIn's ToS.
Copyright is governed by federal statute in the US (Title 17). Infringement can mean statutory damages of up to $150,000 per work when the infringement is willful. When OpenAI or Meta train on copyrighted books or articles, they are either relying on fair use or accepting litigation risk. No amount of ToS compliance can fix a copyright violation if you never had the right to train on the work in the first place.
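Because the $150,000 figure above is per work, the ceiling grows linearly with dataset size. The arithmetic can be sketched as below. This is a rough upper bound for illustration only: the function name is hypothetical, and it ignores everything a court would actually weigh (whether statutory damages are available at all, whether infringement was willful, fair use defenses).

```python
# Per-work statutory ceiling quoted in the text above.
MAX_STATUTORY_PER_WORK_USD = 150_000


def max_statutory_exposure(num_works: int,
                           per_work_cap: int = MAX_STATUTORY_PER_WORK_USD) -> int:
    """Return the theoretical ceiling: number of works times the per-work cap.

    Hypothetical helper for illustration; not a prediction of any real award.
    """
    if num_works < 0:
        raise ValueError("num_works must be non-negative")
    return num_works * per_work_cap


# A corpus containing 1,000 infringed works:
print(f"${max_statutory_exposure(1_000):,}")  # prints "$150,000,000"
```

The point of the sketch is the scaling, not the number: a contract claim is typically tied to the platform's actual harm, while a per-work ceiling multiplies with every title in the corpus.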
| Aspect | ToS violation | Copyright infringement |
|---|---|---|
| Law type | Contract | Federal statute |
| Who sues | Platform | Copyright holder |
| Damages | Usually modest | Statutory, up to $150k per work if willful |
| Criminal? | No | Yes, in extreme cases |
| Geography | Where contract applies | Where work is protected |
The argument AI companies make is that training is transformative fair use: the model learns patterns, not verbatim text. Critics reply that models can regurgitate training data verbatim, which undermines the transformative argument. Ongoing lawsuits (NYT v. OpenAI, Bartz v. Anthropic, Kadrey v. Meta) are expected to clarify this over the next few years.
The big idea: ToS and copyright are separate legal risks. A dataset can be ToS-clean and copyright-dirty, or vice versa. Serious practitioners check both, not just one.
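One way to keep the two checks from collapsing into a single "is this data OK?" flag is to record them as separate fields per source. The sketch below is a minimal illustration, not a legal review tool: the class, field names, and example sources are all hypothetical, and deciding whether each flag is true still requires human (ideally legal) judgment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceAudit:
    """Audit record for one data source (hypothetical schema)."""
    name: str
    tos_clean: bool        # collection did not breach the site's terms
    copyright_clean: bool  # you hold the right to use the works themselves


def risk_label(audit: SourceAudit) -> str:
    """Map the two independent flags onto the four possible outcomes."""
    if audit.tos_clean and audit.copyright_clean:
        return "both checks passed"
    if audit.tos_clean:
        return "copyright risk"
    if audit.copyright_clean:
        return "ToS risk"
    return "ToS and copyright risk"


# Hypothetical sources covering the "clean on one axis, dirty on the other" cases.
sources = [
    SourceAudit("novels_from_permissive_site", tos_clean=True, copyright_clean=False),
    SourceAudit("public_domain_texts_from_no_scrape_site", tos_clean=False, copyright_clean=True),
]

for source in sources:
    print(f"{source.name}: {risk_label(source)}")
```

Keeping two booleans rather than one makes the "ToS-clean and copyright-dirty, or vice versa" cases visible in the audit trail instead of hiding them behind a single pass/fail.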
15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-data-copyright-vs-tos
What is the core idea behind "Copyright vs. Terms of Service: Two Different Fights"?
Which term best describes a foundational idea in "Copyright vs. Terms of Service: Two Different Fights"?
A learner studying Copyright vs. Terms of Service: Two Different Fights would need to understand which concept?
Which of these is directly relevant to Copyright vs. Terms of Service: Two Different Fights?
What is the key insight about "Breaking through a login wall" in the context of Copyright vs. Terms of Service: Two Different Fights?
What is the key insight about "A safer middle ground" in the context of Copyright vs. Terms of Service: Two Different Fights?
What is the recommended tip about "Ground your practice in fundamentals" in the context of Copyright vs. Terms of Service: Two Different Fights?
Which statement accurately describes an aspect of Copyright vs. Terms of Service: Two Different Fights?
What does working with Copyright vs. Terms of Service: Two Different Fights typically involve?
Which of the following is true about Copyright vs. Terms of Service: Two Different Fights?
Which best describes the scope of "Copyright vs. Terms of Service: Two Different Fights"?
Which section heading best belongs in a lesson about Copyright vs. Terms of Service: Two Different Fights?