Chunk, embed, store, retrieve, generate. Build retrieval-augmented generation in a single file.
RAG is: chunk documents, embed chunks, store vectors, retrieve top-k for a query, generate an answer grounded in retrieved chunks. Everything else is variation.
    from openai import OpenAI
    import numpy as np

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def chunk(text: str, size: int = 400, overlap: int = 50) -> list[str]:
        # Slide a window of `size` words, stepping by `size - overlap`.
        if overlap >= size:
            raise ValueError("overlap must be smaller than size")
        words = text.split()
        chunks: list[str] = []
        i = 0
        while i < len(words):
            chunks.append(" ".join(words[i:i + size]))
            if i + size >= len(words):
                break  # this window already reached the end of the text
            i += size - overlap
        return chunks

    def embed(texts: list[str]) -> np.ndarray:
        # One batched request; returns one row per input text.
        r = client.embeddings.create(model="text-embedding-3-small", input=texts)
        return np.array([d.embedding for d in r.data], dtype=np.float32)

    def cosine(a: np.ndarray, b: np.ndarray) -> np.ndarray:
        # a is the (n_chunks, dim) matrix, b is a single query vector.
        a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
        b_n = b / np.linalg.norm(b)
        return a_n @ b_n

Chunking with overlap, batched embeddings, cosine similarity. The full math fits in about 20 lines of code.

    DOC = open("handbook.txt", encoding="utf-8").read()
    CHUNKS = chunk(DOC)
    MATRIX = embed(CHUNKS)  # embedded once at startup

    def answer(question: str, k: int = 4) -> str:
        q_vec = embed([question])[0]
        scores = cosine(MATRIX, q_vec)
        top = np.argsort(-scores)[:k]  # indices of the k best-scoring chunks
        context = "\n\n---\n\n".join(CHUNKS[i] for i in top)
        r = client.responses.create(
            model="gpt-5",
            input=[
                {"role": "system", "content": "Answer only from the provided context. If unsure, say you don't know."},
                {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
            ],
        )
        return r.output_text

    print(answer("What is the PTO policy?"))

Embed once at startup, search in-memory, ground the prompt in retrieved chunks. Good enough for corpora of around 10k chunks.

"RAG From Scratch" in practice: AI-assisted coding shifts work from syntax recall to design thinking. Models handle the boilerplate so you can focus on architecture, and knowing how to apply the five steps yourself gives you a concrete advantage.
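To see what the chunking and retrieval steps above do without calling any API, it can help to run them on toy data. The following is a minimal sketch, not the lesson's own code: the `toy_*` names and the hand-made 2-D vectors are assumptions that stand in for real embeddings, and this `chunk` assumes the window loop stops once a window reaches the end of the text.

```python
import numpy as np


def chunk(text: str, size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into word windows of `size` that share `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    chunks = []
    for start in range(0, len(words), size - overlap):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def cosine(matrix: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Cosine similarity between each row of `matrix` and the 1-D `query`."""
    rows = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return rows @ (query / np.linalg.norm(query))


# Ten numbered "words", windows of 4 with 1 word of overlap (step = 3).
toy_chunks = chunk(" ".join(f"w{n}" for n in range(10)), size=4, overlap=1)
for c in toy_chunks:
    print(c)

# Hand-made 2-D vectors standing in for real embeddings (illustrative only).
toy_matrix = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], dtype=np.float32)
toy_query = np.array([2.0, 0.1], dtype=np.float32)
toy_scores = cosine(toy_matrix, toy_query)
toy_top = np.argsort(-toy_scores)[:2]  # indices of the 2 closest rows
print(toy_top)
```

Each window starts three words after the previous one, so neighbouring chunks share exactly one word. The query points almost along the first axis, so row 0 ranks first and the diagonal row 2 ranks second.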
The big idea: RAG is a five-step pipeline, not magic. Own every step once, then upgrade with a real vector DB when you outgrow numpy.
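One way to keep that later upgrade cheap is to hide the "store" step behind a small class, so that only the class changes when numpy is replaced. The sketch below is an assumption, not part of the lesson: `NumpyStore`, its `search`, `save` and `load` methods, and the `index.npz` file name are hypothetical, and the toy vectors stand in for real embeddings. Only `np.savez` and `np.load` are actual NumPy calls.

```python
import tempfile
from pathlib import Path

import numpy as np


class NumpyStore:
    """Hypothetical in-memory store: chunk texts plus their embedding matrix."""

    def __init__(self, chunks: list[str], matrix: np.ndarray) -> None:
        if len(chunks) != len(matrix):
            raise ValueError("need exactly one vector per chunk")
        self.chunks = list(chunks)
        # Normalise rows once so search is a single matrix-vector product.
        self.matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)

    def search(self, query: np.ndarray, k: int = 4) -> list[tuple[str, float]]:
        """Return the k chunks most similar to `query`, with cosine scores."""
        scores = self.matrix @ (query / np.linalg.norm(query))
        top = np.argsort(-scores)[:k]
        return [(self.chunks[i], float(scores[i])) for i in top]

    def save(self, path: Path) -> None:
        """Persist vectors and texts so they need not be re-embedded."""
        np.savez(path, matrix=self.matrix, chunks=np.array(self.chunks))

    @classmethod
    def load(cls, path: Path) -> "NumpyStore":
        with np.load(path) as data:
            return cls(data["chunks"].tolist(), data["matrix"])


# Toy data: three chunk labels with hand-made 2-D vectors.
store = NumpyStore(
    ["pto policy", "expense policy", "security policy"],
    np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], dtype=np.float32),
)
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "index.npz"
    store.save(path)
    reloaded = NumpyStore.load(path)

hits = reloaded.search(np.array([2.0, 0.1], dtype=np.float32), k=2)
print(hits)
```

Saving the matrix also means the corpus does not have to be re-embedded on every startup. A vector database would take over the roles of `search`, `save` and `load`, while the chunking, embedding and generation steps stay the same.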
8 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-progx-rag-from-scratch-creators
What is the main idea of "RAG From Scratch"?
Which concept is most central to "RAG From Scratch"?
Which use of AI fits this topic best?
What should a careful learner remember about "Chunk smarter, not harder"?
You want to use AI after this lesson. What is the safest next step?
How should AI output about chunking be treated?
Name one way to verify an AI answer about chunking.
Which action would help you apply "RAG From Scratch" responsibly?