AI does not read letters. It reads tokens, which live as vectors in a space of meaning. Learn how text becomes numbers you can do math on.
When you type a sentence to an AI, it does not see the letters. The text goes through two conversions. First, it gets chopped into tokens. Second, each token gets replaced by a long list of numbers called an embedding.
A tokenizer breaks text into chunks. Common chunks get their own token, while rare words are split into several smaller pieces. A common rule of thumb: about 4 characters of English text per token.
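The 4-characters-per-token rule of thumb can be sketched in a few lines of Python. This is a rough estimator only; the function name and ratio here are illustrative, and real tokenizers count tokens differently for every string:

```python
# Rough token-count estimate using the ~4 characters per token heuristic.
# Purely illustrative; a real tokenizer's count will differ per string.

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Estimate how many tokens a piece of English text will use."""
    return max(1, round(len(text) / chars_per_token))

print(estimate_tokens("I love pizza"))  # 12 characters -> ~3 tokens
```

This kind of back-of-the-envelope estimate is handy when budgeting prompts, but for billing or hard context limits you should count with the model's actual tokenizer.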
Each token gets replaced with a vector, usually hundreds or thousands of numbers long. This vector is the token's location in a huge invisible map of meaning. Tokens with similar meanings end up close together on that map.
# illustrative pseudo-API; real tokenizer and model interfaces vary
tokens = tokenizer.encode("I love pizza")
# tokens might be [40, 1842, 21876]
embeddings = model.embed(tokens)
# each token is now a 4096-number vector
# embeddings.shape == (3, 4096)

Turning words into vectors before the model thinks about them.

The big idea: text becomes tokens, tokens become vectors, and meaning becomes math. That is the doorway from human language into the inside of an AI model.
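"Closeness" on the map of meaning is usually measured with cosine similarity. Here is a minimal sketch using tiny made-up 3-number vectors; real embeddings have thousands of dimensions, and these particular values are invented purely for illustration:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Tiny made-up "embeddings"; real ones are thousands of numbers long.
pizza = [0.9, 0.1, 0.0]
pasta = [0.8, 0.2, 0.1]
tax   = [0.0, 0.1, 0.9]

print(cosine_similarity(pizza, pasta))  # high: similar meanings, close on the map
print(cosine_similarity(pizza, tax))    # low: unrelated meanings, far apart
```

The exact numbers do not matter; what matters is the comparison: vectors for related words point in similar directions, so their cosine similarity is near 1, while unrelated words score near 0.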
15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-builders-tokens-and-embeddings
1. What is the core idea behind "Tokens and Embeddings: How AI Reads Words"?
2. Which term best describes a foundational idea in "Tokens and Embeddings: How AI Reads Words"?
3. A learner studying "Tokens and Embeddings: How AI Reads Words" would need to understand which concept?
4. Which of these is directly relevant to "Tokens and Embeddings: How AI Reads Words"?
5. Which of the following is a key point about "Tokens and Embeddings: How AI Reads Words"?
6. Which of these does NOT belong in a discussion of "Tokens and Embeddings: How AI Reads Words"?
7. Which statement is accurate regarding "Tokens and Embeddings: How AI Reads Words"?
8. What is the key insight about "Why chunk at all?" in the context of "Tokens and Embeddings: How AI Reads Words"?
9. What is the recommended tip about "Build your mental model" in the context of "Tokens and Embeddings: How AI Reads Words"?
10. What is the key insight about "Token budgets are real" in the context of "Tokens and Embeddings: How AI Reads Words"?
11. Which statement accurately describes an aspect of "Tokens and Embeddings: How AI Reads Words"?
12. What does working with "Tokens and Embeddings: How AI Reads Words" typically involve?
13. Which of the following is true about "Tokens and Embeddings: How AI Reads Words"?
14. Which best describes the scope of "Tokens and Embeddings: How AI Reads Words"?
15. Which section heading best belongs in a lesson about "Tokens and Embeddings: How AI Reads Words"?