Retrieval-Augmented Generation
RAG
ELI5 — The Vibe Check
RAG is like giving the AI an open-book exam instead of expecting it to memorize everything. Before answering your question, it first searches through your documents, knowledge base, or database for relevant information. Then it uses what it found to craft an accurate answer. It's the #1 pattern for building AI apps that need to know about YOUR specific data.
Real Talk
RAG combines information retrieval with language generation. A query is first used to retrieve relevant documents from an external knowledge source (typically via embedding similarity search), then these documents are provided as context to the LLM for generating a grounded response. This reduces hallucinations, enables domain-specific knowledge, and keeps responses up to date without retraining.
Show Me The Code
# Simple RAG pipeline (embed, vector_db, and llm stand in for your
# embedding model, vector database client, and LLM client)
query = "What's our refund policy?"
# 1. Retrieve: embed the query, fetch the most similar documents
embedding = embed(query)
docs = vector_db.search(embedding, top_k=3)
# 2. Augment + generate: pass retrieved text to the LLM as context
context = "\n".join(d.text for d in docs)
response = llm.chat(f"Context: {context}\n\nQ: {query}")
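The snippet above leans on an external embedding model, vector database, and LLM. Here's a self-contained toy version you can actually run, where a bag-of-words counter stands in for a real embedding model and retrieval is plain cosine similarity — the document texts and helper names are made up for illustration:

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy "embedding": a bag-of-words count vector.
    # Real systems use a neural embedding model here.
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Our office is open Monday through Friday.",
    "Standard shipping takes five to seven business days.",
]

query = "What's our refund policy?"
q = embed(query)

# 1. Retrieve: rank every document by similarity to the query.
ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)

# 2. Augment: build the grounded prompt the LLM would receive.
context = "\n".join(ranked[:1])
prompt = f"Context: {context}\n\nQ: {query}"
print(prompt)
```

The refund document wins the similarity ranking, so the prompt the LLM sees already contains the answer — that's the whole trick.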
When You'll Hear This
"RAG lets the chatbot answer questions about our internal docs." / "We moved from fine-tuning to RAG and accuracy went way up."
Related Terms
Embedding
An embedding turns words, sentences, or entire documents into lists of numbers (vectors) that capture their meaning.
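"Capture their meaning" just means similar things get nearby vectors. A hand-made sketch with tiny 3-dimensional vectors (real embeddings have hundreds of dimensions, and the numbers here are invented for illustration):

```python
import math

# Hand-made 3-dimensional "embeddings" for three words.
vecs = {
    "dog":   [0.9, 0.1, 0.0],
    "puppy": [0.8, 0.2, 0.1],
    "car":   [0.0, 0.1, 0.9],
}

def cosine(a, b):
    # Cosine similarity: 1.0 = same direction, 0.0 = unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

print(cosine(vecs["dog"], vecs["puppy"]))  # high: similar meaning
print(cosine(vecs["dog"], vecs["car"]))    # low: unrelated
```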
Grounding
Grounding is giving the AI real, verified information to base its answers on — so it doesn't just make stuff up.
LLM (Large Language Model)
An LLM is a humongous AI that read basically the entire internet and learned to predict what words come next, really really well.
Semantic Search
Semantic search finds results based on meaning, not just keyword matching.
Vector Database
A vector database is a special database built to store and search embeddings.
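Under the hood, the core job is "store vectors, return the nearest ones." A minimal in-memory sketch (production systems like dedicated vector databases add indexing, filtering, and scale — the class and data here are hypothetical):

```python
import math

class TinyVectorDB:
    """Minimal in-memory vector store: add vectors, search by cosine similarity."""

    def __init__(self):
        self.items = []  # list of (vector, text) pairs

    def add(self, vector, text):
        self.items.append((vector, text))

    def search(self, query_vec, top_k=3):
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(y * y for y in b))
            return dot / (na * nb) if na and nb else 0.0
        # Rank stored items by similarity to the query vector.
        ranked = sorted(self.items, key=lambda it: cos(query_vec, it[0]), reverse=True)
        return [text for _, text in ranked[:top_k]]

db = TinyVectorDB()
db.add([1.0, 0.0], "refund policy doc")
db.add([0.0, 1.0], "shipping info doc")
print(db.search([0.9, 0.1], top_k=1))  # → ['refund policy doc']
```

This linear scan is fine for a few thousand items; real vector databases use approximate-nearest-neighbor indexes to stay fast at millions.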