RAG From Scratch
Chunk, embed, store, retrieve, generate. Build retrieval-augmented generation in a single file.
Lesson map
What this lesson covers

Learning path: the main moves in order
1. The Five Steps
2. Chunking
3. Embedding
4. Retrieval
Concept cluster: terms to connect while reading are chunking, embedding, retrieval, and grounding.
Section 1
The Five Steps
RAG is: chunk documents, embed chunks, store vectors, retrieve top-k for a query, generate an answer grounded in retrieved chunks. Everything else is variation.
Chunking with overlap, batched embeddings, cosine similarity. The full math fits in 20 lines.
from openai import OpenAI
import numpy as np

client = OpenAI()

def chunk(text: str, size: int = 400, overlap: int = 50) -> list[str]:
    # Split on whitespace and emit overlapping windows of `size` words.
    words = text.split()
    chunks: list[str] = []
    i = 0
    while i < len(words):
        chunks.append(" ".join(words[i:i+size]))
        i += size - overlap
    return chunks

def embed(texts: list[str]) -> np.ndarray:
    # One API call embeds the whole batch of texts.
    r = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in r.data], dtype=np.float32)

def cosine(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Cosine similarity of each row of `a` against a single query vector `b`.
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b)
    return a_n @ b_n

Embed once at startup, search in memory, and ground the prompt in the retrieved chunks. Good enough for 10k-chunk corpora.
DOC = open("handbook.txt", encoding="utf-8").read()
CHUNKS = chunk(DOC)
MATRIX = embed(CHUNKS)  # embed every chunk once at startup

def answer(question: str, k: int = 4) -> str:
    # Embed the question, rank all chunks by cosine similarity, keep the top k.
    q_vec = embed([question])[0]
    scores = cosine(MATRIX, q_vec)
    top = np.argsort(-scores)[:k]
    context = "\n\n---\n\n".join(CHUNKS[i] for i in top)
    # Generate an answer grounded only in the retrieved context.
    r = client.responses.create(
        model="gpt-5",
        input=[
            {"role": "system", "content": "Answer only from the provided context. If unsure, say you don't know."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return r.output_text

print(answer("What is the PTO policy?"))

Understanding "RAG From Scratch" in practice: AI-assisted coding shifts work from syntax recall to design thinking, so a model can write boilerplate like this pipeline while you focus on the architecture. Chunk, embed, store, retrieve, generate: owning each step in a single file gives you a concrete advantage when any of them needs debugging or tuning.
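One practical wrinkle with the embedding step: embed(CHUNKS) sends every chunk in a single request, and past a certain corpus size the embeddings API's per-request input limit forces you to batch. A minimal sketch of that adjustment, reusing the embed() defined above and assuming a hypothetical batch size of 256 (the real limit depends on the API):

def embed_in_batches(texts: list[str], batch_size: int = 256) -> np.ndarray:
    # Hypothetical batch size; pick one under the embeddings API's per-request limit.
    parts = [embed(texts[i:i + batch_size]) for i in range(0, len(texts), batch_size)]
    return np.vstack(parts)

# Drop-in replacement for the startup indexing step:
# MATRIX = embed_in_batches(CHUNKS)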
- Apply chunking with overlap so each retrieved passage carries enough surrounding context
- Apply embedding in batches so the whole corpus is indexed once at startup
- Apply retrieval by inspecting which top-k chunks actually come back for real queries (see the sketch after this list)
- Apply grounding by constraining the model to answer only from the retrieved context
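A quick way to check retrieval and grounding before trusting any answer is to look at what actually gets retrieved. A minimal sketch using the functions defined above; show_retrieval is a hypothetical debugging helper, not part of the lesson's pipeline:

def show_retrieval(question: str, k: int = 4) -> None:
    # Print the top-k chunks and their cosine scores so you can judge
    # whether the context the model will see actually answers the question.
    q_vec = embed([question])[0]
    scores = cosine(MATRIX, q_vec)
    for rank, i in enumerate(np.argsort(-scores)[:k], start=1):
        print(f"#{rank}  score={scores[i]:.3f}  {CHUNKS[i][:120]}...")

show_retrieval("What is the PTO policy?")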
1. Use AI to generate unit tests for an existing function
2. Ask AI to refactor a messy function and explain the changes
3. Have AI suggest a code review for a recent pull request
The big idea: RAG is a five-step pipeline, not magic. Own every step once, then upgrade with a real vector DB when you outgrow numpy.
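When you do outgrow numpy, only the retrieval step swaps out; chunking, embedding, and generation stay the same. A minimal sketch of that swap, assuming a Postgres database with the pgvector extension and a chunks(content, embedding) table; the table and column names are illustrative, not prescribed by this lesson:

import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect("dbname=rag")  # illustrative connection string
register_vector(conn)                  # teach psycopg the vector type

def retrieve_pg(question: str, k: int = 4) -> list[str]:
    # Same embedding model as above; only the similarity search moves into Postgres.
    q_vec = embed([question])[0]
    rows = conn.execute(
        "SELECT content FROM chunks ORDER BY embedding <=> %s LIMIT %s",  # <=> is cosine distance
        (q_vec, k),
    ).fetchall()
    return [r[0] for r in rows]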
Related lessons
Keep going
Creators · 50 min
Vector DB Basics With pgvector
Store embeddings, search by similarity. The foundation of every RAG system. Postgres plus pgvector gets you there.
Creators · 40 min
Agents vs. Autocomplete — the Mental Model Shift
Autocomplete is a suggestion. An agent is an actor. The mental model you bring to each is different, and conflating them is the number-one reason teams trip over AI coding.
Creators · 50 min
Long-Context Code Understanding — The 1M-Token Era
Frontier models now read a million tokens of your codebase in one shot. That changes how we architect prompts, retrieval, and the cost curve of agentic work.
