Store embeddings, search by similarity. The foundation of every RAG system. Postgres plus pgvector gets you there.
An embedding turns text into a list of numbers — a vector. Similar meanings land near each other in that space. A vector database does one thing: find the nearest neighbors to a query vector, fast.
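To make "near each other in that space" concrete, here is a minimal sketch (not from the lesson) using toy three-dimensional vectors; real embeddings such as text-embedding-3-small use 1536 dimensions. It computes the cosine similarity that the search below is built on:

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1.
// "Nearest neighbors" under cosine distance = highest cosine similarity.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy "embeddings" (hand-picked for illustration, not from a real model):
const cat = [0.9, 0.1, 0.0];
const kitten = [0.8, 0.2, 0.1];
const invoice = [0.0, 0.1, 0.9];

console.log(cosineSimilarity(cat, kitten) > cosineSimilarity(cat, invoice)); // true
```

Cosine distance, which pgvector's <=> operator computes, is simply 1 minus this similarity.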
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE docs (
  id BIGSERIAL PRIMARY KEY,
  body TEXT NOT NULL,
  embedding vector(1536) NOT NULL  -- 1536 dimensions for text-embedding-3-small
);

-- HNSW index for fast approximate nearest-neighbor search
CREATE INDEX ON docs
  USING hnsw (embedding vector_cosine_ops);

The vector type plus an HNSW index is all pgvector needs, and it works inside regular Postgres.

import OpenAI from "openai";
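// A note on the HNSW index above, per pgvector's documented parameters:
// the build-time options m (default 16) and ef_construction (default 64)
// trade index size and build time for recall, and the query-time setting
// hnsw.ef_search (default 40) widens the candidate list. Raise it for
// better recall at the cost of slower queries, e.g.
//   SET hnsw.ef_search = 100;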
import { Pool } from "pg";

const openai = new OpenAI();
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

async function embed(text: string): Promise<number[]> {
  const r = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
  });
  return r.data[0].embedding;
}

export async function insert(body: string) {
  const vec = await embed(body);
  await pool.query(
    "INSERT INTO docs (body, embedding) VALUES ($1, $2)",
    [body, `[${vec.join(",")}]`]
  );
}

export async function search(query: string, k = 5) {
  const vec = await embed(query);
  const { rows } = await pool.query(
    "SELECT id, body, 1 - (embedding <=> $1) AS score FROM docs ORDER BY embedding <=> $1 LIMIT $2",
    [`[${vec.join(",")}]`, k]
  );
  return rows;
}
}

The <=> operator is cosine distance. Lower = more similar, and 1 - distance recovers cosine similarity: a score near 1 means nearly identical meaning. (Cosine similarity technically ranges from -1 to 1, not 0 to 1, though real embedding pairs rarely score below 0.)

The big idea: embed once, search many. Postgres plus pgvector handles production-grade semantic search without a separate vector service.
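One wrinkle worth noting from the code above: pgvector receives vector parameters as text literals like [1,2,3], hence the vec.join calls. A small helper (the name and dimension guard are my additions, not part of the lesson's code) makes that format explicit and catches dimension mismatches before Postgres does:

```typescript
// pgvector parses vector parameters from text literals: "[x1,x2,...]".
// Checking the dimension client-side surfaces a clearer error than the
// dimension-mismatch error Postgres would raise on INSERT or SELECT.
function toVectorLiteral(vec: number[], expectedDim = 1536): string {
  if (vec.length !== expectedDim) {
    throw new Error(`expected ${expectedDim} dimensions, got ${vec.length}`);
  }
  return `[${vec.join(",")}]`;
}

console.log(toVectorLiteral([0.25, -0.5, 0.75], 3)); // [0.25,-0.5,0.75]
```

This is also why all rows in a table must share one embedding model: a vector of the wrong dimension is rejected outright, and vectors from a different model, even at the same dimension, land in an incompatible space.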
15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-progx-vector-db-creators
What transformation does an embedding perform on text data?
In vector space, what does it mean for two texts to be 'near' each other?
What is the primary function of a vector database?
Why must all embeddings in a table share the same dimension?
What happens if you store embeddings from different models in the same table without transformation?
What type of nearest neighbors does HNSW return?
For which use case is HNSW's approximate search NOT ideal?
What does increasing the ef_search parameter in HNSW typically affect?
In vector databases, cosine distance measures what?
What is the efficiency advantage of 'embed once, search many'?
What capability does pgvector add to standard PostgreSQL?
What is a key benefit of using pgvector over a separate vector database service?
What is pgvector's primary use case in AI applications?
Why might approximate results be acceptable for RAG applications?
What would happen to search results if you queried with a vector of the wrong dimension?