Lesson 3 of 1570
Tokens and Embeddings: How AI Reads Words
AI does not read letters. It reads tokens, which live as vectors in a space of meaning. Learn how text becomes numbers you can do math on.
Lesson map
What this lesson covers
Learning path
The main moves in order
- 1. From Letters to Numbers
- 2. Tokens
- 3. Embeddings
- 4. Vectors
Section 1
From Letters to Numbers
When you type a sentence to an AI, it does not see the letters. The text goes through two conversions. First, it gets chopped into tokens. Second, each token gets replaced by a long list of numbers called an embedding.
Step one: tokenization
A tokenizer breaks text into chunks. Common words and character sequences get their own token, while rare words are split into parts. A rough rule of thumb: about 4 characters of English text per token.
- cat → 1 token
- unbelievable → usually 2 or 3 tokens like un, believ, able
- 😀 → usually 1 to 3 tokens depending on the tokenizer
- spaces and punctuation often become their own tokens
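The splitting behavior above can be sketched with a toy greedy longest-match tokenizer. Real tokenizers (BPE, WordPiece) learn their vocabularies from huge amounts of data; the tiny vocabulary here is hand-picked to mirror the examples, so the pieces are illustrative, not from a real model.

```python
# Toy subword tokenizer: greedy longest-match against a tiny,
# hand-picked vocabulary (real vocabularies are learned from data).
VOCAB = {"cat", "un", "believ", "able"}

def tokenize(text: str) -> list[str]:
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest piece first, shrinking until one is in the vocab.
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Unknown character: fall back to a single-character token.
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenize("cat"))           # ['cat']
print(tokenize("unbelievable"))  # ['un', 'believ', 'able']
```

The common word stays whole; the rare word splits into three reusable pieces. That is exactly why token counts and word counts diverge.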
Step two: embeddings
Each token gets replaced with a vector, usually hundreds or thousands of numbers long. This vector is the token's location in a huge invisible map of meaning. Tokens with similar meanings end up close together on that map.
A tiny example in code
Turning words into vectors before the model thinks about them.
tokens = tokenizer.encode("I love pizza")
# tokens might be [40, 1842, 21876]
embeddings = model.embed(tokens)
# each token now a 4096-number vector
# embeddings.shape == (3, 4096)
Why this matters for you
- Token counts decide how much you can fit in a prompt
- Embeddings power search, recommendations, and clustering
- Most AI tools charge per token, not per word
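To make "close together on the map" concrete, here is a cosine-similarity sketch. The 3-number vectors are invented for illustration (real embeddings have hundreds or thousands of dimensions), but the comparison math is the same one that powers embedding-based search.

```python
import math

# Made-up 3-dimensional "embeddings"; real ones are far longer.
vectors = {
    "pizza":   [0.9, 0.1, 0.2],
    "pasta":   [0.8, 0.2, 0.3],
    "algebra": [0.1, 0.9, 0.7],
}

def cosine(a, b):
    # Cosine similarity: near 1.0 means pointing the same way,
    # near 0.0 means unrelated directions.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine(vectors["pizza"], vectors["pasta"]))    # high: related foods
print(cosine(vectors["pizza"], vectors["algebra"]))  # low: unrelated topics
```

Search with embeddings is just this: embed the query, embed the documents, and return the documents whose vectors score highest.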
Key terms in this lesson
The big idea: text becomes tokens, tokens become vectors, and meaning becomes math. That is the doorway from human language into the inside of an AI model.
End-of-lesson quiz
Check what stuck
15 questions · Score saves to your progress.
Related lessons
Keep going
Builders · 25 min
Word2vec: Meaning Becomes Geometry
A 2013 paper from Google showed that words could live as points in space, with analogies as arithmetic.
Builders · 22 min
What a Spreadsheet Actually Is
Excel and Google Sheets hide a lot of complexity behind a pretty grid. Once you see what is really happening, you will never look at a spreadsheet the same way.
Builders · 28 min
Quality Filtering: Separating Signal From Noise
The raw web is 99 percent garbage. Filtering it down to the 1 percent worth training on is one of the highest-leverage steps in modern AI.
