Cost Per Token
ELI5 — The Vibe Check
Cost per token is how much each token (input or output) costs with a given AI provider. Flagship models cost more per token than cheap ones. Multiply by billions of tokens and your finance team starts paying attention.
Real Talk
Cost per token is the provider-charged rate for LLM API usage, typically quoted per million tokens with separate rates for input and output. Pricing varies by model tier (flagship > mid > small) and sometimes by feature (cached vs. uncached input, extended thinking vs. standard). Enterprise pricing often adds committed-use discounts and priority routing.
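The arithmetic is simple enough to sketch. The rates below are made-up placeholders, not any provider's actual pricing:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Dollar cost of one API call, with rates quoted per million tokens."""
    return (input_tokens / 1_000_000) * input_rate \
         + (output_tokens / 1_000_000) * output_rate

# Hypothetical flagship-tier rates: $15 per 1M input tokens, $75 per 1M output.
cost = request_cost(input_tokens=200_000, output_tokens=10_000,
                    input_rate=15.0, output_rate=75.0)
print(f"${cost:.2f}")  # → $3.75
```

Note the asymmetry: output tokens usually cost several times more than input tokens, which is why long generations hurt more than long prompts at the same model tier.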
When You'll Hear This
"Haiku's cost per token is 12x lower than Opus — route accordingly." / "Prompt caching drops cost per token by 90%."
Related Terms
Model Routing
Model routing is dynamically choosing which AI model to call based on task complexity, cost, or latency — the smart switchboard for LLMs.
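A toy cost-aware router might look like the sketch below. The model names and the length heuristic are illustrative placeholders, not a real provider's API:

```python
def route(prompt: str, needs_reasoning: bool = False) -> str:
    """Pick a model tier: cheap by default, flagship for hard or long tasks.
    Hypothetical model names; the 2000-char cutoff is an arbitrary heuristic."""
    if needs_reasoning or len(prompt) > 2000:
        return "flagship-model"   # expensive, most capable
    return "small-model"          # an order of magnitude cheaper per token

print(route("Summarize this tweet."))                      # → small-model
print(route("Prove this theorem.", needs_reasoning=True))  # → flagship-model
```

Real routers use classifiers or confidence scores rather than prompt length, but the economics are the same: most traffic is easy, so defaulting cheap and escalating rarely cuts the blended cost per token dramatically.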
Prompt Caching
Prompt caching is a speed and cost optimization where the AI remembers the beginning of your prompt so it doesn't have to re-process it every time.
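The savings come from billing cached input at a steep discount. A sketch, assuming cached tokens bill at 10% of the normal rate (matching the "90%" figure quoted above) and a made-up $3-per-million input rate:

```python
def input_cost(prefix_tokens: int, fresh_tokens: int, rate: float,
               cache_discount: float = 0.90) -> float:
    """Input cost in dollars when the shared prefix is served from cache."""
    cached = prefix_tokens / 1_000_000 * rate * (1 - cache_discount)
    fresh = fresh_tokens / 1_000_000 * rate
    return cached + fresh

with_cache = input_cost(100_000, 5_000, rate=3.0)
no_cache   = input_cost(100_000, 5_000, rate=3.0, cache_discount=0.0)
print(f"${with_cache:.3f} vs ${no_cache:.3f}")  # → $0.045 vs $0.315
```

The bigger and more stable the shared prefix (system prompt, tool definitions, reference documents), the closer the whole request gets to the discounted rate.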
Token Burn
Token burn is the rate at which your AI bill climbs, often because the model re-reads the same context on every turn of a long chat, so each turn costs more than the last.
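The re-reading effect compounds: each turn re-sends the whole history, so input cost grows roughly quadratically with turn count. A sketch with illustrative numbers (a made-up $3 per 1M input tokens):

```python
def chat_input_cost(turns: int, tokens_per_turn: int,
                    rate_per_million: float) -> float:
    """Total input cost when turn t re-sends all t prior turns of context."""
    total_input = sum(t * tokens_per_turn for t in range(1, turns + 1))
    return total_input / 1_000_000 * rate_per_million

print(f"${chat_input_cost(10, 1_000, 3.0):.3f}")  # → $0.165
print(f"${chat_input_cost(50, 1_000, 3.0):.3f}")  # → $3.825
```

Five times the turns, twenty-three times the input bill: that superlinear curve is what "burn" refers to.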
Token Tax
Token tax is the ongoing cost of running AI features in production. Every API call costs tokens. Every user request adds to the bill. It never sleeps.