Lesson 19 of 1570
Where Bias in AI Actually Comes From
AI bias is not magic and not moral failure. It is math operating on imperfect data. Here is exactly where the bias enters the system.
Lesson map
What this lesson covers, in order:
1. Bias Is Not a Bug — It Is Baked In
2. Training data bias
3. The representation gap
4. Labeler bias
Section 1
Bias Is Not a Bug — It Is Baked In
When people say an AI is biased, they sometimes imagine a programmer typing biased rules. That is almost never what happened. AI bias is what you get when statistical models learn from data that reflects an unequal world.
There are at least four distinct places bias enters a modern AI system. Knowing which one you are looking at changes how you fix it.
Source 1: who wrote the training data
Large language models are trained mostly on English text from the public web. A huge share of that text comes from North America and Europe, written in the last 30 years, by people who had internet access. That demographic is not representative of the world, and the model quietly reflects whatever those writers thought was normal.
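The skew described above can be made concrete with a toy calculation. The region tags and all counts below are invented for illustration, not real corpus measurements; the point is only the mechanic of comparing corpus share to population share:

```python
from collections import Counter

# Hypothetical corpus: each document tagged with its author's region.
# Counts are made up to illustrate skew, not measured from any real dataset.
corpus_regions = (
    ["north_america"] * 55 + ["europe"] * 25 +
    ["asia"] * 12 + ["africa"] * 4 + ["south_america"] * 4
)

counts = Counter(corpus_regions)
total = sum(counts.values())

# Share of training text each region contributes.
shares = {region: n / total for region, n in counts.items()}

# Rough world population shares, for contrast.
world = {"north_america": 0.07, "europe": 0.09, "asia": 0.59,
         "africa": 0.18, "south_america": 0.07}

# Representation factor: > 1 means over-represented in the corpus.
factor = {r: shares[r] / world[r] for r in world}
```

With these invented numbers, North America ends up several times over-represented relative to its population share, and Asia several times under-represented, which is the shape of the imbalance the paragraph describes.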
Source 2: who is missing
- Languages spoken by fewer than 10 million people: often poorly represented
- Dialects (African American English, Indian English, rural slang): under-represented
- Communities who chose privacy over posting: invisible to the model
- Pre-internet history: included via books but still patchy
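One practical response to missing groups is a coverage check: before trusting an evaluation, verify that every group you claim to serve actually appears in the test set. A minimal sketch, with made-up dialect tags and example sentences:

```python
# Hypothetical evaluation set; the 'dialect' tags are illustrative labels,
# not a real annotation scheme.
eval_examples = [
    {"text": "turn on the lights", "dialect": "standard_american"},
    {"text": "switch on the lights, yeah", "dialect": "british"},
    {"text": "put the lights on", "dialect": "standard_american"},
]

# Dialects the product claims to support, and therefore must test.
required = {"standard_american", "british", "aave", "indian_english"}

present = {ex["dialect"] for ex in eval_examples}
missing = required - present  # groups with zero test coverage
```

A non-empty `missing` set means the evaluation literally cannot detect failures for those groups, which is the invisible-to-the-model problem restated at evaluation time.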
Source 3: who labeled the data
After pretraining, companies pay humans to rank AI outputs. Those humans have opinions. They are often in one or two countries, speak one language, and share a culture. What they mark as helpful or harmful becomes the model's personality. Their blind spots become the model's blind spots.
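The mechanism is easy to see in a toy simulation of majority-vote labeling. The votes below are invented: the same model answer, written in a regional dialect, gets opposite verdicts depending on who sits in the labeler pool:

```python
from collections import Counter

def majority_label(votes):
    """Return the most common label among labeler votes."""
    return Counter(votes).most_common(1)[0][0]

# Hypothetical ratings of one model answer written in a regional dialect.
# A homogeneous pool shares a blind spot; a mixed pool does not.
homogeneous_pool = ["unhelpful", "unhelpful", "unhelpful", "helpful", "unhelpful"]
mixed_pool = ["helpful", "unhelpful", "helpful", "helpful", "unhelpful"]

verdict_homogeneous = majority_label(homogeneous_pool)
verdict_mixed = majority_label(mixed_pool)
```

Whichever verdict wins the vote is what the model is trained toward, so a shared blind spot in the pool becomes a systematic preference in the model.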
Source 4: who the model is deployed on
Even a decent model can behave badly in the wild. A resume screener trained on past hiring decisions will replicate past hiring bias. A face recognizer trained mostly on lighter-skinned faces will fail more often on darker-skinned ones. Neither failure is hypothetical; both patterns have been documented repeatedly in audits of deployed systems.
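The standard countermeasure is a per-group audit: break prediction outcomes down by group and measure the accuracy gap. A minimal sketch with invented outcomes (1 = correct, 0 = error); the group names and threshold are hypothetical:

```python
# Hypothetical audit data: (group, whether the model's prediction was correct).
results = [
    ("group_a", 1), ("group_a", 1), ("group_a", 1), ("group_a", 0),
    ("group_b", 1), ("group_b", 0), ("group_b", 0), ("group_b", 0),
]

by_group = {}
for group, correct in results:
    by_group.setdefault(group, []).append(correct)

# Accuracy per group, then the worst-case gap between groups.
accuracy = {g: sum(v) / len(v) for g, v in by_group.items()}
gap = max(accuracy.values()) - min(accuracy.values())

# An audit would flag the model if the gap exceeds a chosen threshold.
THRESHOLD = 0.1  # arbitrary illustrative value
flagged = gap > THRESHOLD
```

The key design choice is auditing on the population actually using the model, not on the convenient test set it shipped with; an aggregate accuracy number can look fine while one group absorbs most of the errors.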
Compare: four sources, four fixes
| Bias source | Fix approach |
|---|---|
| Skewed training data | Add data from underrepresented groups |
| Missing groups | Targeted data collection + evaluation |
| Labeler blind spots | Diverse labeler pools, multiple reviewers |
| Deployment mismatch | Audit the model on the population actually using it |
Why debiasing is genuinely hard
- Data that reflects the real world will reflect real-world inequality
- Fixing one metric often makes another worse
- Different groups have different, sometimes conflicting definitions of fair
- Auditing requires demographic data you may not be allowed to collect
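The second bullet, that fixing one metric worsens another, can be shown with arithmetic. In the toy numbers below (all invented), two applicant pools have different base rates of being qualified. A perfectly accurate screener then has equal true-positive rates but unequal selection rates; forcing equal selection rates breaks the equal true-positive rates:

```python
# Invented numbers: two applicant pools with different qualification rates.
pool = {"group_a": {"qualified": 60, "total": 100},
        "group_b": {"qualified": 30, "total": 100}}

# A perfectly accurate screener selects exactly the qualified applicants,
# so its selection rate equals each group's base rate.
selection_rate = {g: p["qualified"] / p["total"] for g, p in pool.items()}

# Equal accuracy, but a 30-point gap in selection rates:
# demographic parity is violated.
parity_gap = selection_rate["group_a"] - selection_rate["group_b"]

# Now force equal selection rates at 45% for both groups. The screener must
# reject qualified group_a applicants, so group_a's true-positive rate drops
# below group_b's, and the two groups are no longer treated with equal error.
target = 0.45
tpr_a = min(target * pool["group_a"]["total"],
            pool["group_a"]["qualified"]) / pool["group_a"]["qualified"]
tpr_b = min(target * pool["group_b"]["total"],
            pool["group_b"]["qualified"]) / pool["group_b"]["qualified"]
```

With different base rates, you can equalize selection rates or equalize error rates, but not both at once, which is why "fair" admits genuinely conflicting definitions.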
“The problem is not that AI is biased. The problem is that the world is, and AI learned from it.”
The big idea: AI bias is a downstream symptom of upstream data choices. Fixing it is an engineering problem, a research problem, and a political problem all at once. Any of the four sources is a useful starting point.