Embedding
ELI5 — The Vibe Check
An embedding is turning words, sentences, or entire documents into lists of numbers (vectors) that capture their meaning. The magic is that similar meanings end up as similar numbers — so 'dog' and 'puppy' will be close together in number-space while 'dog' and 'democracy' will be far apart. It's how AI understands meaning.
Real Talk
An embedding is a dense, fixed-dimensional vector representation of discrete data (text, images, etc.) learned by a neural network. In the embedding space, semantic similarity corresponds to vector proximity, enabling operations like similarity search and clustering. Text embeddings are produced by encoder models and used for retrieval, classification, and RAG.
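The "similarity corresponds to proximity" idea is usually measured with cosine similarity. A minimal sketch using toy 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions, and these numbers are invented for illustration):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1].
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up toy "embeddings" — real models produce these vectors for you.
dog = [0.9, 0.1, 0.0]
puppy = [0.8, 0.2, 0.1]
democracy = [0.0, 0.1, 0.9]

print(cosine_similarity(dog, puppy))      # high — close in meaning
print(cosine_similarity(dog, democracy))  # low — far apart in meaning
```

A score near 1 means the vectors point the same way (similar meaning); near 0 means they are unrelated.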
Show Me The Code
from openai import OpenAI
client = OpenAI()
response = client.embeddings.create(
    input="The quick brown fox",
    model="text-embedding-3-small"
)
vector = response.data[0].embedding # list of 1536 floats
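Once you have vectors like the one above, retrieval is just "rank documents by similarity to the query vector". A sketch of that step, using made-up toy vectors in place of real API output (real `text-embedding-3-small` vectors have 1536 dimensions):

```python
def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

# Toy stand-ins for embeddings you would get back from the API.
query_vec = [0.9, 0.1, 0.0]
doc_vecs = {
    "Dogs make loyal pets": [0.8, 0.2, 0.1],
    "The senate passed the bill": [0.1, 0.0, 0.9],
}

# Rank documents by similarity to the query — the core of semantic search.
ranked = sorted(doc_vecs, key=lambda d: cosine_similarity(query_vec, doc_vecs[d]),
                reverse=True)
print(ranked[0])  # Dogs make loyal pets
```

In production you would store the document vectors in a vector database rather than a dict, but the ranking logic is the same.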
When You'll Hear This
"Generate an embedding for each document, then compare cosine similarity." / "Embeddings power our semantic search."
Related Terms
RAG (Retrieval Augmented Generation)
RAG is how you give an AI access to your private documents without retraining it.
Semantic Search
Semantic search finds results based on meaning, not just keyword matching.
Tokenizer
A tokenizer chops text into pieces that the AI model can understand — but not in ways humans would expect.
Vector
In AI, a vector is just a list of numbers. But it's a list of numbers that means something — like [0.23, -0.91, 0.44, ...].
Vector Database
A vector database is a special database built to store and search embeddings.