Tokens and Embeddings: How AI Reads Words

AI does not read letters. It reads tokens, which live as vectors in a space of meaning. Learn how text becomes numbers you can do math on.

30 min · Reviewed 2026

From Letters to Numbers

When you type a sentence to an AI, it does not see the letters. The text goes through two conversions. First, it gets chopped into tokens. Second, each token gets replaced by a long list of numbers called an embedding.

Step one: tokenization

A tokenizer breaks text into chunks. Common chunks get their own token. Rare words are split into parts. A rule of thumb: 1 token is roughly 4 characters of English text.

cat → 1 token
unbelievable → usually 2 or 3 tokens like un, believ, able
😀 → usually 1 to 3 tokens depending on the tokenizer
punctuation often becomes its own token, and a space is often folded into the token for the word that follows it
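
To make the splitting concrete, here is a minimal sketch of a toy tokenizer. It assumes a tiny hand-written vocabulary (TOY_VOCAB is hypothetical) and uses a simple greedy longest-match rule. Real tokenizers usually learn their vocabulary from data, for example with byte-pair encoding, so treat this as an illustration of the idea rather than how any particular tokenizer works.

```python
# Hypothetical toy vocabulary, chosen only for this illustration.
TOY_VOCAB = {"cat", "un", "believ", "able"}


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Split text greedily, always taking the longest vocabulary entry that matches."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink it one character at a time.
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in vocab:
                tokens.append(piece)
                i = end
                break
        else:
            # Nothing in the vocabulary matches here: fall back to one character.
            tokens.append(text[i])
            i += 1
    return tokens


print(toy_tokenize("cat"))           # ['cat']
print(toy_tokenize("unbelievable"))  # ['un', 'believ', 'able']
```

The single-character fallback is why a word the tokenizer has never seen still produces some tokens instead of an error.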

Step two: embeddings

Each token gets replaced with a vector, usually hundreds or thousands of numbers long. This vector is the token's location in a huge invisible map of meaning. Tokens with similar meanings end up close together on that map.
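
One common way to measure "close together" is cosine similarity, which compares the direction of two vectors. The sketch below uses made-up 3-number vectors (TOY_EMBEDDINGS is hypothetical). Real embeddings are learned by a model and are far longer, so only the comparison itself carries over.

```python
import math

# Made-up vectors for illustration only; real embeddings come from a trained model.
TOY_EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.75, 0.2],
    "pizza": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


cat_kitten = cosine_similarity(TOY_EMBEDDINGS["cat"], TOY_EMBEDDINGS["kitten"])
cat_pizza = cosine_similarity(TOY_EMBEDDINGS["cat"], TOY_EMBEDDINGS["pizza"])

print(f"cat vs kitten: {cat_kitten:.3f}")
print(f"cat vs pizza:  {cat_pizza:.3f}")
```

With these toy numbers, cat scores higher against kitten than against pizza, which is the pattern the map-of-meaning picture describes.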

A tiny example in code

# tokenizer and model are placeholders for whatever library you use
tokens = tokenizer.encode("I love pizza")
# tokens might be [40, 1842, 21876]

embeddings = model.embed(tokens)
# each token is now a 4096-number vector (the width depends on the model)
# embeddings.shape == (3, 4096)

Turning words into vectors before the model thinks about them.
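
The snippet above depends on a tokenizer and a model that are not defined here. As a self-contained sketch of the same two steps, the version below treats the embedding step as a plain table lookup. TOKEN_IDS, EMBEDDING_TABLE, encode, and embed are all hypothetical names, the vectors are random placeholders rather than learned values, and the width is 8 instead of 4096 to keep the output small.

```python
import random

# Hypothetical toy vocabulary: token text -> token ID.
# The space is stored as part of the word that follows it, one common convention.
TOKEN_IDS = {"I": 0, " love": 1, " pizza": 2}
EMBEDDING_WIDTH = 8  # real models use hundreds or thousands of numbers

random.seed(0)
# One row per token ID. Real tables are learned during training;
# these values are random placeholders.
EMBEDDING_TABLE = [
    [random.uniform(-1, 1) for _ in range(EMBEDDING_WIDTH)]
    for _ in TOKEN_IDS
]


def encode(pieces):
    """Map already-split token strings to their integer IDs."""
    return [TOKEN_IDS[piece] for piece in pieces]


def embed(token_ids):
    """Look up one vector per token ID."""
    return [EMBEDDING_TABLE[token_id] for token_id in token_ids]


token_ids = encode(["I", " love", " pizza"])
vectors = embed(token_ids)

print(token_ids)                      # [0, 1, 2]
print(len(vectors), len(vectors[0]))  # 3 8
```

The point of the sketch is the shape: three tokens in, three vectors out, each with the same width.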

Why this matters for you

Token counts decide how much you can fit in a prompt
Embeddings power search, recommendations, and clustering
Most AI tools charge per token, not per word
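
The first and last points can be turned into quick arithmetic using the rule of thumb of roughly 4 characters per token. This is a rough sketch: the 8,000-token window and the price of 2.0 per million tokens are made-up numbers for illustration, and an exact count needs the tokenizer your model actually uses.

```python
import math

CHARS_PER_TOKEN = 4  # the lesson's rough rule of thumb for English text


def estimate_tokens(text):
    """Estimate the token count from the character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_window(text, context_window_tokens):
    """Check the estimate against a context window measured in tokens."""
    return estimate_tokens(text) <= context_window_tokens


def estimate_cost(text, price_per_million_tokens):
    """Estimate the cost of sending text, given a price per million tokens."""
    return estimate_tokens(text) * price_per_million_tokens / 1_000_000


prompt = "I love pizza. " * 1000  # 14,000 characters

print(estimate_tokens(prompt))       # 3500
print(fits_in_window(prompt, 8000))  # True (hypothetical 8,000-token window)
print(estimate_cost(prompt, 2.0))    # 0.007 (hypothetical price)
```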

The big idea: text becomes tokens, tokens become vectors, and meaning becomes math. That is the doorway from human language into the inside of an AI model.

End-of-lesson check

15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-builders-tokens-and-embeddings

What is the core idea behind "Tokens and Embeddings: How AI Reads Words"?
1. AI does not read letters. It reads tokens, which live as vectors in a space of meaning. Learn how text becomes numbers you can do math on.
2. Draft tool-call validation rules with named owners.
3. Temperature controls how 'creative' an AI gets.
4. You decide if AI's idea is good enough to use.
Which term best describes a foundational idea in "Tokens and Embeddings: How AI Reads Words"?
1. tokenizer
2. token
3. embedding
4. vector
A learner studying Tokens and Embeddings: How AI Reads Words would need to understand which concept?
1. token
2. embedding
3. tokenizer
4. vector
Which of these is directly relevant to Tokens and Embeddings: How AI Reads Words?
1. token
2. tokenizer
3. vector
4. embedding
Which of the following is a key point about Tokens and Embeddings: How AI Reads Words?
1. cat → 1 token
2. unbelievable → usually 2 or 3 tokens like un, believ, able
3. 😀 → usually 1 to 3 tokens depending on the tokenizer
4. punctuation often becomes its own token, and a space is often folded into the token for the word that follows it
Which of these does NOT belong in a discussion of Tokens and Embeddings: How AI Reads Words?
1. Draft tool-call validation rules with named owners.
2. cat → 1 token
3. unbelievable → usually 2 or 3 tokens like un, believ, able
4. 😀 → usually 1 to 3 tokens depending on the tokenizer
Which statement is accurate regarding Tokens and Embeddings: How AI Reads Words?
1. Embeddings power search, recommendations, and clustering
2. Most AI tools charge per token, not per word
3. Token counts decide how much you can fit in a prompt
4. Draft tool-call validation rules with named owners.
What is the key insight about "Why chunk at all?" in the context of Tokens and Embeddings: How AI Reads Words?
1. Draft tool-call validation rules with named owners.
2. Temperature controls how 'creative' an AI gets.
3. You decide if AI's idea is good enough to use.
4. Words you have never seen still work, because the tokenizer can split them into familiar parts.
What is the recommended tip about "Build your mental model" in the context of Tokens and Embeddings: How AI Reads Words?
1. AI isn't magic — it's pattern recognition at scale. The more you understand how it works, the more effectively you can u…
2. Draft tool-call validation rules with named owners.
3. Temperature controls how 'creative' an AI gets.
4. You decide if AI's idea is good enough to use.
What is the key insight about "Token budgets are real" in the context of Tokens and Embeddings: How AI Reads Words?
1. Draft tool-call validation rules with named owners.
2. Every model has a context window measured in tokens. Run out of space and older parts of the conversation get cut.
3. Temperature controls how 'creative' an AI gets.
4. You decide if AI's idea is good enough to use.
Which statement accurately describes an aspect of Tokens and Embeddings: How AI Reads Words?
1. Draft tool-call validation rules with named owners.
2. Temperature controls how 'creative' an AI gets.
3. When you type a sentence to an AI, it does not see the letters. The text goes through two conversions. First, it gets chopped into tokens.
4. You decide if AI's idea is good enough to use.
What does working with Tokens and Embeddings: How AI Reads Words typically involve?
1. Draft tool-call validation rules with named owners.
2. Temperature controls how 'creative' an AI gets.
3. You decide if AI's idea is good enough to use.
4. A tokenizer breaks text into chunks. Common chunks get their own token. Rare words are split into parts.
Which of the following is true about Tokens and Embeddings: How AI Reads Words?
1. Each token gets replaced with a vector, usually hundreds or thousands of numbers long.
2. Draft tool-call validation rules with named owners.
3. Temperature controls how 'creative' an AI gets.
4. You decide if AI's idea is good enough to use.
Which best describes the scope of "Tokens and Embeddings: How AI Reads Words"?
1. It is unrelated to foundations workflows
2. It focuses on how AI reads tokens, which live as vectors in a space of meaning, and how text becomes numbers you can do math on
3. It applies only to the opposite beginner tier
4. It was deprecated in 2024 and no longer relevant
Which section heading best belongs in a lesson about Tokens and Embeddings: How AI Reads Words?
1. Draft tool-call validation rules with named owners.
2. Temperature controls how 'creative' an AI gets.
3. Step one: tokenization
4. You decide if AI's idea is good enough to use.
