[{"data":1,"prerenderedAt":73},["ShallowReactive",2],{"term-t\u002Ftoken-tax":3,"related-t\u002Ftoken-tax":59},{"id":4,"title":5,"acronym":6,"body":7,"category":40,"description":41,"difficulty":42,"extension":43,"letter":44,"meta":45,"navigation":46,"path":47,"related":48,"seo":53,"sitemap":54,"stem":57,"subcategory":6,"__hash__":58},"terms\u002Fterms\u002Ft\u002Ftoken-tax.md","Token Tax",null,{"type":8,"value":9,"toc":33},"minimark",[10,15,19,23,26,30],[11,12,14],"h2",{"id":13},"eli5-the-vibe-check","ELI5 — The Vibe Check",[16,17,18],"p",{},"Token tax is the ongoing cost of running AI features in production. Every API call costs tokens, and every user request triggers another call. It never sleeps. If your margin is thin, the token tax eats it.",[11,20,22],{"id":21},"real-talk","Real Talk",[16,24,25],{},"Token tax refers to the perpetual operating cost of LLM-based features, charged per token by providers. Unlike one-time development costs, the token tax scales with usage. Optimization strategies include prompt caching, response caching, routing simple tasks to cheaper models, and running smaller models locally for high-volume, low-complexity workloads.",[11,27,29],{"id":28},"when-youll-hear-this","When You'll Hear This",[16,31,32],{},"\"The token tax is killing our unit economics — we need routing.\" \u002F \"Smaller models handle 80% of queries, cutting the token tax.\"",{"title":34,"searchDepth":35,"depth":35,"links":36},"",2,[37,38,39],{"id":13,"depth":35,"text":14},{"id":21,"depth":35,"text":22},{"id":28,"depth":35,"text":29},"ai","Token tax is the ongoing cost of running AI features in production. Every API call costs tokens, and every user request triggers another call. It never sleeps.","beginner","md","t",{},true,"\u002Fterms\u002Ft\u002Ftoken-tax",[49,50,51,52],"Token Burn","Token Budget","Cost Per Token","Model Routing",{"title":5,"description":41},{"changefreq":55,"priority":56},"weekly",0.7,"terms\u002Ft\u002Ftoken-tax","nHlSzDmgFMu9ursrgMxedHIatpLLVdYzVHKNOUtzb3I",[60,63,67,70],{"title":51,"path":61,"acronym":6,"category":40,"difficulty":42,"description":62},"\u002Fterms\u002Fc\u002Fcost-per-token","Cost per token is how much each token (input or output) costs with a given AI provider. Flagship models cost more per token than cheap ones.",{"title":52,"path":64,"acronym":6,"category":40,"difficulty":65,"description":66},"\u002Fterms\u002Fm\u002Fmodel-routing","advanced","Model routing is dynamically choosing which AI model to call based on task complexity, cost, or latency — the smart switchboard for LLMs.",{"title":50,"path":68,"acronym":6,"category":40,"difficulty":42,"description":69},"\u002Fterms\u002Ft\u002Ftoken-budget","A token budget is the cap on how many tokens a request, session, or user can consume. Like a food budget but for AI.",{"title":49,"path":71,"acronym":6,"category":40,"difficulty":42,"description":72},"\u002Fterms\u002Ft\u002Ftoken-burn","Token burn is how fast your AI bill climbs because the model keeps re-reading the same context. Every turn of a long chat costs more.",1776518319410]