neural-forge.io

Copyright vs. Terms of Service: Two Different Fights

Violating a website's Terms of Service and violating copyright are different legal problems. Understanding the distinction is critical for data work.

Creators · AI Foundations · ~17 min read

Two Totally Different Legal Worlds

You scrape a website whose Terms of Service (ToS) prohibit scraping. Have you broken the law? The honest answer is: probably not, but it depends on the facts and the jurisdiction. Have you also violated copyright? That is a completely separate question with its own answer.

Terms of service: contract law

Terms of Service are contracts. If you violate them, the platform can sue you for breach of contract, but the damages are usually small unless the platform can show real financial harm. In hiQ Labs v. LinkedIn (2022), the US Ninth Circuit held that scraping publicly accessible LinkedIn profiles likely did not violate the Computer Fraud and Abuse Act (CFAA), even though LinkedIn's ToS prohibited it. The contract question was decided separately: later in 2022 the district court found that hiQ had breached LinkedIn's User Agreement.

Copyright: statutory law

In the US, copyright is governed by a federal statute (Title 17 of the US Code). Infringement can mean statutory damages of up to $150,000 per work when the infringement is willful. When OpenAI or Meta train on copyrighted books or articles, they are either relying on fair use or accepting litigation risk. Complying with a site's ToS cannot fix a copyright violation if you never had the right to train on the work in the first place.
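To see why per-work statutory damages dominate the risk picture, here is a rough arithmetic sketch. It assumes the per-work ranges in 17 U.S.C. § 504(c) ($750 to $30,000 ordinarily, up to $150,000 for willful infringement) and ignores registration requirements, reductions for innocent infringement, and everything else a court would weigh. The function name `statutory_exposure` is made up for illustration.

```python
# Per-work statutory damages ranges under 17 U.S.C. § 504(c), in US dollars.
ORDINARY_MIN = 750
ORDINARY_MAX = 30_000
WILLFUL_MAX = 150_000


def statutory_exposure(num_works: int, willful: bool = False) -> tuple[int, int]:
    """Return rough (low, high) bounds on statutory damages for num_works works.

    Illustrative only: multiplies the number of works by the per-work range.
    """
    if num_works < 0:
        raise ValueError("num_works must be non-negative")
    per_work_max = WILLFUL_MAX if willful else ORDINARY_MAX
    return num_works * ORDINARY_MIN, num_works * per_work_max


# A dataset containing 1,000 infringed works:
low, high = statutory_exposure(1_000, willful=True)
print(f"${low:,} to ${high:,}")
```

Even at the bottom of the range, exposure scales linearly with the number of works, which is why a large training corpus changes the risk picture in a way a single ToS breach usually does not.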

Compare the options

Aspect	ToS violation	Copyright infringement
Law type	Contract	Federal statute
Who sues	Platform	Copyright holder
Damages	Usually modest	Up to $150,000 per work (willful)
Criminal?	No	Yes, in extreme cases
Geography	Where contract applies	Where work is protected

The CFAA wrinkle

Fair use in training

The argument AI companies make is that training is transformative fair use: the model learns patterns, not verbatim text. Critics reply that models can regurgitate training data verbatim, which undermines the transformative argument. Current lawsuits (NYT v. OpenAI, Bartz v. Anthropic, Kadrey v. Meta) will clarify this over the next few years.
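The regurgitation concern can be made concrete with a simple measurement: how long is the longest run of words that a model output shares verbatim with a source text? The sketch below uses Python's standard `difflib.SequenceMatcher` on whitespace-split words. The function name `longest_verbatim_run` and the sample strings are illustrative assumptions, and this is a toy check, not a method used in any of the lawsuits.

```python
from difflib import SequenceMatcher


def longest_verbatim_run(model_output: str, source_text: str) -> list[str]:
    """Return the longest sequence of consecutive words shared by both texts."""
    out_words = model_output.split()
    src_words = source_text.split()
    # autojunk=False keeps frequent words from being ignored in long inputs.
    matcher = SequenceMatcher(None, out_words, src_words, autojunk=False)
    match = matcher.find_longest_match(0, len(out_words), 0, len(src_words))
    return out_words[match.a : match.a + match.size]


source = "it was the best of times it was the worst of times"
output = "The novel opens: it was the best of times and so on"
run = longest_verbatim_run(output, source)
print(len(run), " ".join(run))
```

A long shared run suggests memorized text rather than learned patterns. The comparison is exact and case-sensitive, so punctuation attached to a word breaks a match.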

Key terms in this lesson

The big idea: ToS and copyright are separate legal risks. A dataset can be ToS-clean and copyright-dirty, or vice versa. Serious practitioners check both, not just one.
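One way to keep the two risks from collapsing into a single "is this data OK?" checkbox is to record them as independent fields on each data source. The sketch below is a minimal illustration. The class `SourceRecord`, its field names, and the example sources are all hypothetical, and the boolean flags stand in for what would really be a legal review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRecord:
    """Hypothetical provenance record for one data source."""
    name: str
    tos_allows_use: bool      # contract question: did the platform permit this use?
    copyright_cleared: bool   # statutory question: license, public domain, or similar?


def open_risks(record: SourceRecord) -> list[str]:
    """List which of the two independent risks remain for a source."""
    risks = []
    if not record.tos_allows_use:
        risks.append("tos")
    if not record.copyright_cleared:
        risks.append("copyright")
    return risks


sources = [
    SourceRecord("licensed-api-export", tos_allows_use=True, copyright_cleared=False),
    SourceRecord("scraped-public-domain-texts", tos_allows_use=False, copyright_cleared=True),
    SourceRecord("own-documentation", tos_allows_use=True, copyright_cleared=True),
]
for src in sources:
    print(src.name, open_risks(src) or "clean")
```

The first two records are the "ToS-clean and copyright-dirty, or vice versa" cases: each passes one check and fails the other, so a single combined flag would hide the problem.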
