Token Burn
ELI5 — The Vibe Check
Token burn is how fast your AI bill climbs because the model keeps re-reading the same context. Every turn of a long chat costs more. Every agent retry is token burn. Monitor it or the invoice will surprise you.
Real Talk
Token burn is the rate of token consumption in an LLM-based system, and it's what drives your API bill. The main drivers: conversation length (the context grows every turn and gets re-processed every turn), retries, tool-use loops, and uncached prompts. Mitigations: prompt caching, context compaction, token budgets, and eval-driven prompt optimization. Unmonitored token burn is a common cause of startup cost overruns.
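The "context grows every turn" part is why long chats get expensive faster than they get longer: each new turn re-sends everything before it. A minimal sketch, using made-up prices and message sizes (no real provider's rates):

```python
# Sketch: why token burn grows faster than chat length.
# Prices and token counts below are illustrative assumptions only.

INPUT_PRICE = 3.00 / 1_000_000    # $ per input token (hypothetical)
OUTPUT_PRICE = 15.00 / 1_000_000  # $ per output token (hypothetical)

def chat_cost(turns, system_tokens=1_000, msg_tokens=200, reply_tokens=300):
    """Each turn re-sends the system prompt plus every prior message."""
    cost = 0.0
    context = system_tokens
    for _ in range(turns):
        context += msg_tokens                # user message joins the context
        cost += context * INPUT_PRICE        # model re-reads the whole context
        cost += reply_tokens * OUTPUT_PRICE  # model writes a reply
        context += reply_tokens              # reply joins the context too
    return cost

print(f"10 turns:  ${chat_cost(10):.4f}")
print(f"100 turns: ${chat_cost(100):.4f}")  # far more than 10x the 10-turn cost
```

Ten times the turns is roughly fifty times the cost here, because input tokens scale with the square of conversation length, not linearly.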
When You'll Hear This
"Our token burn tripled after the agent rollout — caching saved us." / "Set token-burn alerts before prod."
Related Terms
Cost Per Token
Cost per token is how much each token (input or output) costs with a given AI provider. Output tokens typically cost more than input tokens, and flagship models cost many times more per token than budget ones.
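The tier gap compounds at scale. A quick sketch with hypothetical per-million-token rates (real pricing varies by provider and changes often):

```python
# Hypothetical price tiers, $ per million tokens -- not real rates.
MODELS = {
    "flagship": {"input": 15.00, "output": 75.00},
    "budget":   {"input": 0.25,  "output": 1.25},
}

def monthly_cost(model, input_tokens, output_tokens):
    """Total spend for a month's token volume at a given tier."""
    p = MODELS[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Same workload -- 50M input + 5M output tokens -- priced at each tier:
for name in MODELS:
    print(f"{name}: ${monthly_cost(name, 50_000_000, 5_000_000):,.2f}")
```

Same workload, wildly different bill, which is why model routing (cheap model for easy requests, flagship for hard ones) is a standard cost lever.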
Prompt Caching
Prompt caching is a speed and cost optimization where the AI remembers the beginning of your prompt so it doesn't have to re-process it every time.
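Why this matters for the bill: many providers charge cached input tokens at a steep discount, so a long stable prefix (system prompt, tool definitions) gets cheap after the first request. A sketch under assumed rates and discount:

```python
# Sketch of prompt-caching economics. The price and the 10% cached rate
# are assumptions for illustration, not any specific provider's terms.

INPUT_PRICE = 3.00 / 1_000_000  # $ per input token (hypothetical)
CACHED_DISCOUNT = 0.1           # cached tokens billed at 10% (assumption)

def request_cost(prefix_tokens, suffix_tokens, cached):
    """Stable prefix (system prompt, tools) + per-request user suffix."""
    prefix_rate = INPUT_PRICE * CACHED_DISCOUNT if cached else INPUT_PRICE
    return prefix_tokens * prefix_rate + suffix_tokens * INPUT_PRICE

cold = request_cost(20_000, 500, cached=False)  # first request, cache miss
warm = request_cost(20_000, 500, cached=True)   # repeat requests, cache hit
print(f"uncached: ${cold:.4f}, cached: ${warm:.4f}")
```

The catch: caching only helps if the prefix is byte-for-byte stable, so put anything that changes per request (user message, timestamps) at the end of the prompt.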
Token Budget
A token budget is the cap on how many tokens a request, session, or user can consume. Like a food budget but for AI.
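Enforcing one can be as simple as a counter that refuses to go over the cap. A minimal sketch (real apps would count tokens with their provider's tokenizer and track per user or per session):

```python
# Minimal token-budget guard -- a sketch, not a production rate limiter.

class TokenBudget:
    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def charge(self, tokens):
        """Record usage; refuse before the cap is blown, not after."""
        if self.used + tokens > self.limit:
            raise RuntimeError(
                f"token budget exceeded: {self.used + tokens} > {self.limit}"
            )
        self.used += tokens

budget = TokenBudget(limit=10_000)
budget.charge(4_000)  # ok
budget.charge(5_000)  # ok -- 9,000 used
# budget.charge(2_000) would raise: 11,000 > 10,000
```

Checking before the call (estimate the request size, then charge) beats checking after, since the expensive tokens are already spent by the time the response comes back.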
Token Tax
Token tax is the ongoing cost of running AI features in production. Every API call costs tokens, and every user request triggers API calls. It never sleeps.