A vector DB is a fast nearest-neighbor index. It's not magic, it's not always needed, and the embedding model matters more than the DB.
11 min · Reviewed 2026
The premise
Vector databases get treated as magic AI infrastructure. In reality they are nearest-neighbor indexes over embeddings, and recall quality depends mostly on the embedding model and chunking strategy, not on the DB you pick.
What AI does well here
Return semantically similar chunks for an embedded query
Scale to millions of vectors with the right index
Combine with metadata filters when configured
What AI cannot do
Improve recall when the embeddings are bad
Tell you why a relevant document didn't come back
Replace good chunking and metadata design
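The core operation behind all of this can be shown in a minimal sketch: a brute-force nearest-neighbor search over pre-computed embeddings. The vectors and document IDs below are illustrative placeholders, and a real vector database replaces the linear scan with an approximate index; the point is that the DB only ranks what the embedding model gives it.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the vectors' magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def nearest_neighbors(query, vectors, k=2):
    # Brute-force top-k search; real vector DBs use approximate index
    # structures (e.g. HNSW) to avoid scanning every stored vector.
    scored = sorted(vectors.items(),
                    key=lambda kv: cosine(query, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

# Toy 3-dimensional "embeddings" — real ones have hundreds of dimensions.
docs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.9, 0.1],
    "doc_c": [0.8, 0.2, 0.1],
}
print(nearest_neighbors([1.0, 0.0, 0.0], docs, k=2))  # ['doc_a', 'doc_c']
```

Note that if the embeddings place a relevant document far from the query, no amount of database tuning brings it back — which is the "garbage in, garbage out" point above.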
End-of-lesson check
15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-tools-vector-database-fundamentals-r7a1-creators
What is the fundamental function of a vector database?
It stores original documents and their metadata
It performs nearest-neighbor search over embedded data
It generates embeddings from raw text
It trains machine learning models on your data
Which factor has the greatest impact on recall quality in a vector search system?
The hardware infrastructure (GPU vs CPU)
The specific vector database product chosen
The embedding model and chunking strategy
The amount of metadata stored
Before adopting a dedicated vector database, what does the lesson recommend trying first?
Amazon DynamoDB with vector support
MongoDB with Atlas search
PostgreSQL with the pgvector extension
Elasticsearch with vector plugins
What does 'recall@k' measure?
The percentage of relevant items found within the top k results
The number of vectors the system can store
The time it takes to return k results
The similarity score threshold used for filtering
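For reference, recall@k has a one-line definition that is easy to sketch in code. The retrieved and relevant IDs below are made-up examples:

```python
def recall_at_k(retrieved, relevant, k):
    # Fraction of the relevant items that appear in the top-k results.
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

# 2 of the 3 relevant docs appear in the top 5 retrieved results.
retrieved = ["d3", "d7", "d1", "d9", "d4", "d2"]
relevant = {"d1", "d2", "d4"}
print(round(recall_at_k(retrieved, relevant, k=5), 2))  # 0.67
```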
Why can't a vector database improve recall when the underlying embeddings are low quality?
Vector databases require perfect embeddings to function
The search algorithm automatically corrects errors in embeddings
Vector databases have strict size limits on embeddings
The database can only work with what the embeddings represent — garbage in, garbage out
What problem occurs when key sentences are split across multiple chunks using fixed token count chunking?
The chunks become too large to process efficiently
The embedding model ignores short chunks
The database automatically merges small chunks
Those sentences become unfindable because context is lost
What chunking approach does the lesson recommend for better recall?
Chunk by sentence boundaries only
Fixed token count (256 tokens each)
Chunk by semantic units like paragraphs or sections
Fixed character count (500 characters each)
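Chunking by semantic units can be sketched minimally as splitting on paragraph boundaries and packing whole paragraphs into size-limited chunks. The `max_chars` budget is an illustrative parameter, not a recommended value:

```python
def chunk_by_paragraphs(text, max_chars=500):
    # Split on blank lines (paragraph boundaries) and pack paragraphs
    # into chunks up to max_chars, keeping each semantic unit whole.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for p in paragraphs:
        if current and len(current) + len(p) + 2 > max_chars:
            chunks.append(current)
            current = p
        else:
            current = f"{current}\n\n{p}" if current else p
    if current:
        chunks.append(current)
    return chunks
```

Unlike fixed token-count splitting, this never cuts a paragraph in half, so a key sentence always lands in exactly one chunk with its surrounding context.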
What is semantic similarity search?
Finding documents created in the same time period
Finding documents that share the same authors
Finding results based on meaning rather than exact wording
Finding exact keyword matches in text
A relevant document fails to appear in search results. What is a likely cause that the vector database itself cannot explain?
The document was added after the index was built
The query used the wrong HTTP method
The embedding model failed to capture the document's relevance to that query
The database ran out of storage space
What capability do vector databases offer when combined with metadata filters?
They can narrow results by both semantic similarity and structured criteria
They can guarantee perfect recall
They can convert structured filters into embeddings
They can automatically generate metadata
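The metadata-plus-similarity combination can be sketched as a pre-filter followed by a ranked scan. The corpus, metadata keys, and vectors below are invented for illustration, and the dot-product score assumes unit-normalized embeddings:

```python
def dot(a, b):
    # Similarity score; assumes unit-normalized embedding vectors.
    return sum(x * y for x, y in zip(a, b))

def filtered_search(query_vec, docs, filters, k=3):
    # Narrow by structured metadata first, then rank the survivors
    # by semantic similarity — the combination described above.
    candidates = [
        d for d in docs
        if all(d["meta"].get(key) == val for key, val in filters.items())
    ]
    candidates.sort(key=lambda d: dot(query_vec, d["vec"]), reverse=True)
    return [d["id"] for d in candidates[:k]]

corpus = [
    {"id": "a", "vec": [1.0, 0.0], "meta": {"lang": "en"}},
    {"id": "b", "vec": [0.0, 1.0], "meta": {"lang": "en"}},
    {"id": "c", "vec": [1.0, 0.0], "meta": {"lang": "de"}},
]
print(filtered_search([1.0, 0.0], corpus, {"lang": "en"}, k=1))  # ['a']
```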
What is the primary purpose of converting text into embeddings?
To format text for display on websites
To remove personally identifiable information
To compress text for storage efficiency
To enable mathematical similarity comparisons
How do modern vector databases handle scaling to millions of vectors?
They require manual partitioning by the user
They use specialized index structures for approximate nearest-neighbor search
They switch to exact matching at scale
They automatically delete older vectors
What is the benefit of overlapping chunks when chunking documents for vector search?
It reduces storage requirements
It eliminates the need for embedding models
It automatically generates better embeddings
It ensures key ideas aren't split across non-matching chunks
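Overlapping chunks are a simple sliding window over the token sequence. The `size` and `overlap` values below are illustrative, not recommendations:

```python
def sliding_chunks(tokens, size=256, overlap=64):
    # Each chunk repeats the last `overlap` tokens of its predecessor,
    # so a sentence straddling a boundary still appears whole somewhere.
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return chunks
```

The trade-off is storage: overlapping windows embed some tokens twice, in exchange for not losing ideas that sit on a chunk boundary.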
Why is the embedding model considered more important than the database choice?
The database only indexes what the embedding model creates — poor embeddings can't be rescued by database features
Embedding models require less maintenance
Databases have no impact on results
Embedding models are more expensive to run
What does 'nearest-neighbor search' mean in the context of vector databases?
Finding the closest document by word count
Finding documents created by the same author
Finding the most recently added documents
Finding vectors that are mathematically closest in multi-dimensional space