AI does not read letters. It reads tokens, which live as vectors in a space of meaning. Learn how text becomes numbers you can do math on.
When you type a sentence to an AI, it does not see the letters. The text goes through two conversions. First, it gets chopped into tokens. Second, each token gets replaced by a long list of numbers called an embedding.
A tokenizer breaks text into chunks. Common chunks get their own token. Rare words are split into parts. A rule of thumb: 1 token is roughly 4 characters of English text.
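The chunking idea can be sketched with a toy tokenizer. This is only an illustration: the vocabulary `TOY_VOCAB` and the functions `toy_tokenize` and `estimate_tokens` are hypothetical names invented here, and the greedy longest-match rule is a simple stand-in for the vocabularies that real tokenizers learn from data.

```python
# Toy vocabulary (hypothetical): common chunks get their own entry.
TOY_VOCAB = {"i", "love", "pizza", "token", "ization", "un", "believ", "able"}


def toy_tokenize(word: str, vocab: set[str]) -> list[str]:
    """Greedily split one lowercase word into the longest known chunks."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest possible chunk first, then shrink it.
        for end in range(len(word), start, -1):
            chunk = word[start:end]
            # A single character is always accepted as a fallback,
            # so unknown words still get split into something.
            if chunk in vocab or end == start + 1:
                pieces.append(chunk)
                start = end
                break
    return pieces


def estimate_tokens(text: str) -> int:
    """Rule-of-thumb estimate: about 4 characters of English per token."""
    return max(1, round(len(text) / 4))


print(toy_tokenize("pizza", TOY_VOCAB))         # a common word stays whole
print(toy_tokenize("tokenization", TOY_VOCAB))  # a rarer word is split into parts
print(estimate_tokens("I love pizza"))          # 12 characters / 4
```

A common word comes back as one chunk, while a longer word falls apart into smaller pieces the vocabulary knows. The estimate is only a rough guide; actual counts depend on the tokenizer and the text.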
Each token gets replaced with a vector, usually hundreds or thousands of numbers long. This vector is the token's location in a huge invisible map of meaning. Tokens with similar meanings end up close together on that map.
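To make "close together on that map" concrete, here is a small sketch that measures closeness with cosine similarity, one common way to compare vectors. The 3-number vectors in `TOY_EMBEDDINGS` are made up for illustration; real embeddings are learned by the model and are far longer.

```python
import math

# Hypothetical 3-number embeddings, invented for this example.
# Real models learn vectors with hundreds or thousands of numbers.
TOY_EMBEDDINGS = {
    "pizza": [0.9, 0.8, 0.1],
    "pasta": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


food = cosine_similarity(TOY_EMBEDDINGS["pizza"], TOY_EMBEDDINGS["pasta"])
other = cosine_similarity(TOY_EMBEDDINGS["pizza"], TOY_EMBEDDINGS["car"])

print(f"pizza vs pasta: {food:.2f}")
print(f"pizza vs car:   {other:.2f}")
```

With these made-up numbers, the two food words score much higher than the food word and the vehicle word, which is the sense in which similar meanings sit near each other.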
    tokens = tokenizer.encode("I love pizza")
    # tokens might be [40, 1842, 21876]

    embeddings = model.embed(tokens)
    # each token now a 4096-number vector
    # embeddings.shape == (3, 4096)

Turning words into vectors before the model thinks about them.

The big idea: text becomes tokens, tokens become vectors, and meaning becomes math. That is the doorway from human language into the inside of an AI model.
8 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-builders-tokens-and-embeddings
What is the main idea of "Tokens and Embeddings: How AI Reads Words"?
Which concept is most central to "Tokens and Embeddings: How AI Reads Words"?
Which use of AI fits this topic best?
What should a careful learner remember about "Why chunk at all?"?
You want to use AI after this lesson. What is the safest next step?
How should AI output about tokens be treated?
Name one way to verify an AI answer about tokens.
Which action would help you apply "Tokens and Embeddings: How AI Reads Words" responsibly?