[{"data":1,"prerenderedAt":73},["ShallowReactive",2],{"term-p\u002Fprefix-cache":3,"related-p\u002Fprefix-cache":58},{"id":4,"title":5,"acronym":6,"body":7,"category":40,"description":41,"difficulty":42,"extension":43,"letter":16,"meta":44,"navigation":45,"path":46,"related":47,"seo":52,"sitemap":53,"stem":56,"subcategory":6,"__hash__":57},"terms\u002Fterms\u002Fp\u002Fprefix-cache.md","Prefix Cache",null,{"type":8,"value":9,"toc":33},"minimark",[10,15,19,23,26,30],[11,12,14],"h2",{"id":13},"eli5-the-vibe-check","ELI5 — The Vibe Check",[16,17,18],"p",{},"Prefix cache is when an AI provider reuses computation from shared prompt prefixes. If every request starts with the same 10k-token system prompt, they only compute it once. Your requests get cheaper and faster.",[11,20,22],{"id":21},"real-talk","Real Talk",[16,24,25],{},"Prefix caching is an inference optimization that reuses computed KV cache entries across requests sharing a common prefix. Most major providers offer it (Anthropic prompt caching, OpenAI prompt caching, Gemini context caching), some through explicit API controls and others automatically. Benefits: large cost savings (up to 90% on cached tokens, depending on the provider) and reduced time-to-first-token. 
Requires stable prefixes: matching is exact, so any change to the prefix, even a single token, invalidates the cache from that point onward.",[11,27,29],{"id":28},"when-youll-hear-this","When You'll Hear This",[16,31,32],{},"\"Prefix caching turned our 50k-token system prompt from a cost problem into a cost advantage.\" \u002F \"Don't vary the system prompt — it kills the prefix cache.\"",{"title":34,"searchDepth":35,"depth":35,"links":36},"",2,[37,38,39],{"id":13,"depth":35,"text":14},{"id":21,"depth":35,"text":22},{"id":28,"depth":35,"text":29},"ai","Prefix cache is when an AI provider reuses computation from shared prompt prefixes.","advanced","md",{},true,"\u002Fterms\u002Fp\u002Fprefix-cache",[48,49,50,51],"Prompt Caching","KV Cache","Cost Per Token","Inference",{"title":5,"description":41},{"changefreq":54,"priority":55},"weekly",0.7,"terms\u002Fp\u002Fprefix-cache","gIwZjBpiN578Jqix4FTZFE_DBVjjmMi309K_qQDArOE",[59,63,67,70],{"title":50,"path":60,"acronym":6,"category":40,"difficulty":61,"description":62},"\u002Fterms\u002Fc\u002Fcost-per-token","beginner","Cost per token is how much each token (input or output) costs with a given AI provider. Flagship models cost more per token than cheap ones.",{"title":51,"path":64,"acronym":6,"category":40,"difficulty":65,"description":66},"\u002Fterms\u002Fi\u002Finference","intermediate","Inference is when the AI actually runs and generates output — as opposed to training, which is when it's learning.",{"title":49,"path":68,"acronym":6,"category":40,"difficulty":42,"description":69},"\u002Fterms\u002Fk\u002Fkv-cache","KV cache is how LLMs remember previous tokens without recomputing them.",{"title":48,"path":71,"acronym":6,"category":40,"difficulty":65,"description":72},"\u002Fterms\u002Fp\u002Fprompt-caching","Prompt caching is a speed and cost optimization where the AI remembers the beginning of your prompt so it doesn't have to re-process it every time.",1776518302981]