Vector Database
ELI5 — The Vibe Check
A vector database is a special database built to store and search embeddings. Normal databases are bad at 'find me the 10 most similar items' — they'd have to check everything. Vector databases use clever indexing (like HNSW or IVF) to find similar vectors super fast. They're the backbone of AI-powered search and RAG systems.
Real Talk
Vector databases store high-dimensional embedding vectors and provide efficient approximate nearest neighbor (ANN) search. They support similarity queries at scale using indexing algorithms like HNSW, IVF-PQ, and FAISS. Popular options include Pinecone, Weaviate, Qdrant, Chroma, and pgvector (Postgres extension).
Show Me The Code
import chromadb
client = chromadb.Client()
collection = client.create_collection("docs")
collection.add(
documents=["The cat sat on the mat"],
embeddings=[[0.1, 0.2, 0.3]],
ids=["doc1"]
)
results = collection.query(query_embeddings=[[0.1, 0.2, 0.35]], n_results=3)
When You'll Hear This
"Store all our support docs in a vector database for RAG." / "Query the vector database for the most relevant chunks."
Related Terms
Embedding
An embedding is turning words, sentences, or entire documents into lists of numbers (vectors) that capture their meaning.
RAG (Retrieval Augmented Generation)
RAG is how you give an AI access to your private documents without retraining it.
Semantic Search
Semantic search finds results based on meaning, not just keyword matching.
Vector
In AI, a vector is just a list of numbers. But it's a list of numbers that means something — like [0.23, -0.91, 0.44, ...