Context Budget
ELI5 — The Vibe Check
Context budget is how you allocate tokens across system prompt, memory, tools, and conversation — every token is a dollar at scale. You've got 200k tokens to spend per call. Your system prompt eats 5k, tool definitions another 10k, conversation history another 80k. What's left for actual user content? Context budget is the discipline of making those numbers work before you hit the limit or the invoice.
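The arithmetic above can be sketched in a few lines. This is a toy illustration using the hypothetical numbers from the text (the 4k output reserve is an added assumption, since the model's reply also has to fit):

```python
# Toy context-budget arithmetic — numbers are illustrative, not real model limits.
CONTEXT_LIMIT = 200_000

fixed_costs = {
    "system_prompt": 5_000,
    "tool_definitions": 10_000,
    "conversation_history": 80_000,
}

# Reserve room for the model's output too, or generation gets cut off.
OUTPUT_RESERVE = 4_000

remaining = CONTEXT_LIMIT - sum(fixed_costs.values()) - OUTPUT_RESERVE
print(f"Left for user content: {remaining:,} tokens")
```

In practice you'd count tokens with the model's own tokenizer rather than trusting estimates — the point is that the "free" space is whatever survives after every fixed cost is subtracted.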
Real Talk
Context budget management is the practice of allocating a model's finite context window across competing inputs: system instructions, retrieved documents, tool schemas, chat history, and the current user message. At scale, poor context budgeting means either expensive calls padded with irrelevant content, or truncated history that causes the model to forget earlier turns. Common techniques include dynamic summarization, RAG for selective retrieval, and tool schema minimization.
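One of those techniques — rolling summarization of chat history — can be sketched as below. This is a minimal illustration, not a production implementation: `count_tokens` and `summarize` are hypothetical stand-ins you'd wire to a real tokenizer and an LLM call, and the summary's own token cost is ignored for simplicity.

```python
from typing import Callable

def fit_history(messages: list[str], budget: int,
                count_tokens: Callable[[str], int],
                summarize: Callable[[list[str]], str]) -> list[str]:
    """Rolling-summarization sketch: keep the newest turns verbatim,
    collapse everything older into a single summary line."""
    kept: list[str] = []
    used = 0
    # Walk backwards so the most recent turns survive intact.
    for i in range(len(messages) - 1, -1, -1):
        cost = count_tokens(messages[i])
        if used + cost > budget:
            # Everything up to and including message i gets compressed.
            kept.insert(0, summarize(messages[: i + 1]))
            return kept
        kept.insert(0, messages[i])
        used += cost
    return kept  # whole history fit — nothing to summarize

# Usage with toy stand-ins: word count as a token proxy, a placeholder summary.
history = [f"turn {n}: " + "blah " * 50 for n in range(10)]
trimmed = fit_history(history, budget=120,
                      count_tokens=lambda m: len(m.split()),
                      summarize=lambda ms: f"[summary of {len(ms)} older turns]")
```

Here each turn costs 52 "tokens", so only the last two fit the 120-token budget verbatim and the first eight collapse into one summary — the shape of the "rolling summarization" fix quoted below.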
When You'll Hear This
"We blew the context budget with full chat history — switching to rolling summarization." / "Model routing helps, but the real win was fixing the context budget allocation."
Related Terms
Context Window
A context window is how much text an AI can 'see' at once — its working memory.
Prompt Engineering
Prompt engineering is the art of talking to AI so it actually does what you want.
Token
In AI-land, a token is a chunk of text — roughly 3/4 of a word.