[{"data":1,"prerenderedAt":74},["ShallowReactive",2],{"term-c\u002Fcost-per-token":3,"related-c\u002Fcost-per-token":59},{"id":4,"title":5,"acronym":6,"body":7,"category":40,"description":41,"difficulty":42,"extension":43,"letter":44,"meta":45,"navigation":46,"path":47,"related":48,"seo":53,"sitemap":54,"stem":57,"subcategory":6,"__hash__":58},"terms\u002Fterms\u002Fc\u002Fcost-per-token.md","Cost Per Token",null,{"type":8,"value":9,"toc":33},"minimark",[10,15,19,23,26,30],[11,12,14],"h2",{"id":13},"eli5-the-vibe-check","ELI5 — The Vibe Check",[16,17,18],"p",{},"Cost per token is how much each token (input or output) costs with a given AI provider. Flagship models cost more per token than cheap ones. Multiply by billions of tokens and your finance team starts paying attention.",[11,20,22],{"id":21},"real-talk","Real Talk",[16,24,25],{},"Cost per token is the provider-charged rate for LLM API usage, typically quoted per million tokens for input and output separately. Pricing varies by model tier (flagship > mid > small) and sometimes by feature (cached vs uncached, thinking vs standard). Enterprise pricing often includes committed-use discounts and priority routing.",[11,27,29],{"id":28},"when-youll-hear-this","When You'll Hear This",[16,31,32],{},"\"Haiku's cost per token is 12x lower than Opus — route accordingly.\" \u002F \"Prompt caching drops cost per token by 90%.\"",{"title":34,"searchDepth":35,"depth":35,"links":36},"",2,[37,38,39],{"id":13,"depth":35,"text":14},{"id":21,"depth":35,"text":22},{"id":28,"depth":35,"text":29},"ai","Cost per token is how much each token (input or output) costs with a given AI provider. Flagship models cost more per token than cheap ones.","beginner","md","c",{},true,"\u002Fterms\u002Fc\u002Fcost-per-token",[49,50,51,52],"Token Burn","Token Tax","Model Routing","Prompt Caching",{"title":5,"description":41},{"changefreq":55,"priority":56},"weekly",0.7,"terms\u002Fc\u002Fcost-per-token","FtC1-6KJYEBgIacP0wD_4nzVvZbqtHWdvw7pejuT-k4",[60,64,68,71],{"title":51,"path":61,"acronym":6,"category":40,"difficulty":62,"description":63},"\u002Fterms\u002Fm\u002Fmodel-routing","advanced","Model routing is dynamically choosing which AI model to call based on task complexity, cost, or latency — the smart switchboard for LLMs.",{"title":52,"path":65,"acronym":6,"category":40,"difficulty":66,"description":67},"\u002Fterms\u002Fp\u002Fprompt-caching","intermediate","Prompt caching is a speed and cost optimization where the AI remembers the beginning of your prompt so it doesn't have to re-process it every time.",{"title":49,"path":69,"acronym":6,"category":40,"difficulty":42,"description":70},"\u002Fterms\u002Ft\u002Ftoken-burn","Token burn is how fast your AI bill climbs because the model keeps re-reading the same context. Every turn of a long chat costs more.",{"title":50,"path":72,"acronym":6,"category":40,"difficulty":42,"description":73},"\u002Fterms\u002Ft\u002Ftoken-tax","Token tax is the ongoing cost of running AI features in production. Every API call costs tokens. Every request the user makes. It never sleeps.",1776518270536]