Token Budget
ELI5 — The Vibe Check
A token budget is the cap on how many tokens a request, session, or user can consume. Like a food budget but for AI. Without budgets, one runaway agent can rack up thousands of dollars in an hour.
Real Talk
A token budget is an enforced limit on token consumption per request, session, user, or time window. Implemented at the application layer or via API rate limits. Critical for multi-tenant AI products to prevent cost explosions from single users or runaway agents. Typically paired with graceful degradation when the budget is exhausted.
When You'll Hear This
"Set a 50k token budget per session and enforce it." / "Token budgets prevent one bad actor from bankrupting the service."
Related Terms
Cost Per Token
Cost per token is how much each token (input or output) costs with a given AI provider. Flagship models cost more per token than cheap ones.
Rate Limiting
Rate limiting is like a bouncer who says 'you can come in 100 times per hour, then you wait.
Token Burn
Token burn is how fast your AI bill climbs because the model keeps re-reading the same context. Every turn of a long chat costs more.
Token Tax
Token tax is the ongoing cost of running AI features in production. Every API call costs tokens. Every request the user makes. It never sleeps.