Copyright vs. Terms of Service: Two Different Fights

Violating a website's Terms of Service and violating copyright are different legal problems. Understanding the distinction is critical for data work.

28 min · Reviewed 2026

Two Totally Different Legal Worlds

You scrape a website whose ToS prohibits scraping. Have you broken the law? The honest answer: probably not in the criminal sense, though you may have breached a contract, and the details matter. Have you also violated copyright? That is a completely separate question with its own answer.

Terms of service: contract law

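Terms of Service are contracts. If you violate them, the platform can sue you for breach of contract, but damages are usually small unless the platform can show real financial harm. Breaching a ToS is also not automatically a federal computer crime. In hiQ Labs v. LinkedIn (2022), the US Ninth Circuit held that scraping publicly accessible LinkedIn profiles likely did not violate the Computer Fraud and Abuse Act (CFAA), even though LinkedIn's ToS prohibited it. The contract claim remained a separate matter: the district court later found that hiQ had breached LinkedIn's User Agreement.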

Copyright: statutory law

Copyright in the US is governed by a federal statute (Title 17 of the US Code). Statutory damages for infringement can reach $150,000 per work when the infringement is willful. When OpenAI or Meta train on copyrighted books or articles, they are either relying on fair use or accepting litigation risk. Complying with a site's ToS cannot fix a copyright violation if you never had the right to copy the work for training in the first place.
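
To see why per-work damages matter at dataset scale, here is a rough arithmetic sketch. It assumes the per-work statutory range in 17 U.S.C. § 504(c): $750 to $30,000 ordinarily, and up to $150,000 for willful infringement. The function name exposure_range is made up for illustration. Real awards depend on registration, the court's discretion, and the facts, so treat the result as a theoretical range, not a prediction.

```python
# Per-work statutory damages under 17 U.S.C. § 504(c), in US dollars.
ORDINARY_MIN = 750
ORDINARY_MAX = 30_000
WILLFUL_MAX = 150_000


def exposure_range(num_works: int, willful: bool = False) -> tuple[int, int]:
    """Return the theoretical (low, high) statutory-damages range for num_works works."""
    if num_works < 0:
        raise ValueError("num_works must be non-negative")
    per_work_max = WILLFUL_MAX if willful else ORDINARY_MAX
    return num_works * ORDINARY_MIN, num_works * per_work_max


low, high = exposure_range(1_000, willful=True)
print(f"${low:,} to ${high:,}")  # $750,000 to $150,000,000
```

Even at the statutory minimum, a thousand works adds up quickly, which is why copyright risk usually dwarfs contract risk for large training sets.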

Aspect	ToS violation	Copyright infringement
Law type	Contract	Federal statute
Who sues	Platform	Copyright holder
Damages	Usually modest	Up to $150k per work (willful infringement)
Criminal?	No	Yes, in extreme cases
Geography	Where contract applies	Where work is protected

The CFAA wrinkle

The Computer Fraud and Abuse Act is a third, separate risk. It is a federal anti-hacking statute aimed at accessing computers without authorization. Scraping pages that anyone can view without logging in, as in hiQ, sits on much safer ground than breaking through a login wall or other access control.

Fair use in training

The argument AI companies make is that training is transformative fair use: the model learns patterns, not verbatim text. Critics reply that models can regurgitate training data verbatim, which undermines the transformative argument. Lawsuits such as NYT v. OpenAI, Bartz v. Anthropic, and Kadrey v. Meta are testing this argument, and the law is still developing.

The big idea: ToS and copyright are separate legal risks. A dataset can be ToS-clean and copyright-dirty, or vice versa. Serious practitioners check both, not just one.
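
One way to make "check both" concrete is to track the two questions as independent fields for every data source. The sketch below is a minimal illustration, not a compliance tool: the Source class, its field names, and the example sources are all hypothetical, and the boolean values would have to come from an actual legal review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str
    tos_allows_collection: bool  # contract question: did the site's terms permit collection?
    copyright_cleared: bool      # copyright question: licensed, public domain, or similar?


def audit(source: Source) -> list[str]:
    """Return the open legal risks for one source; an empty list means none were flagged."""
    risks = []
    if not source.tos_allows_collection:
        risks.append("tos")
    if not source.copyright_cleared:
        risks.append("copyright")
    return risks


# Hypothetical sources covering the clean/dirty combinations.
sources = [
    Source("licensed-news-archive", tos_allows_collection=True, copyright_cleared=True),
    Source("open-api-book-excerpts", tos_allows_collection=True, copyright_cleared=False),
    Source("scraped-public-domain-texts", tos_allows_collection=False, copyright_cleared=True),
]

for s in sources:
    print(f"{s.name}: {audit(s) or 'no flagged risks'}")
```

Keeping the two flags separate means a source cannot pass the audit just because one of the two checks came back clean.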

End-of-lesson check

6 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-data-copyright-vs-tos

What is the main idea of "Copyright vs. Terms of Service: Two Different Fights"?
1. Violating a website's Terms of Service and violating copyright are different legal problems.
2. Use AI as the final authority for the whole decision
3. Avoid checking the answer once it sounds polished
4. Focus only on speed instead of judgment
Which concept is most central to "Copyright vs. Terms of Service: Two Different Fights"?
1. copyright
2. terms of service
3. CFAA
4. fair use
What should a careful learner remember about "Breaking through a login wall"?
1. Use AI to draft or organize ideas about terms of service, then verify before acting.
2. Skip the context so the tool can guess faster
3. Treat the output as private even after sharing it online
4. Use the answer without checking the source
You want to use AI after this lesson. What is the safest next step?
1. Act immediately because the AI answer is written clearly
2. Use AI for drafting and comparison, but verify before publishing or relying on it.
3. Hide uncertainty so the final answer looks cleaner
4. Use private or sensitive details before checking permission
How should AI output about terms of service be treated?
1. As proof that no other source is needed
2. As a replacement for context, consent, or expert review
3. As a draft or helper output that still needs human judgment and verification
4. As something that becomes correct when it sounds confident
Name one way to verify an AI answer about terms of service.
