Prefix Cache
ELI5 — The Vibe Check
Prefix cache is when an AI provider reuses computation from shared prompt prefixes. If every request starts with the same 10k-token system prompt, they only compute it once. Your requests get cheaper and faster.
Real Talk
Prefix caching is an inference optimization that reuses computed KV-cache entries across requests that share a common prefix. Most major providers implement it with explicit APIs (Anthropic prompt caching, OpenAI prompt caching, Gemini context caching). Benefits: large cost savings (up to 90% on cached tokens) and reduced time-to-first-token. It requires exact-match prefixes: any change invalidates the cache from the point of divergence onward, so put stable content (system prompt, tools, few-shot examples) first and variable content last.
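The matching behavior can be sketched with a toy model (not any provider's real implementation): cache entries keyed by hashes of token-prefix blocks, where a request reuses cached work only up to the first block that differs. The `BLOCK` size and the `PrefixCache` class here are illustrative assumptions.

```python
import hashlib

BLOCK = 4  # tokens per cache block (real systems use larger blocks)

def block_keys(tokens):
    """Hash each successive prefix, one key per full block."""
    keys = []
    for end in range(BLOCK, len(tokens) + 1, BLOCK):
        prefix = "\x00".join(tokens[:end])
        keys.append(hashlib.sha256(prefix.encode()).hexdigest())
    return keys

class PrefixCache:
    def __init__(self):
        self.store = set()

    def process(self, tokens):
        """Return (cached_tokens, computed_tokens) for this request."""
        cached = 0
        for i, key in enumerate(block_keys(tokens)):
            if key in self.store:
                cached = (i + 1) * BLOCK   # this whole block was reused
            else:
                self.store.add(key)        # computed now, cached for later
        return cached, len(tokens) - cached

cache = PrefixCache()
req1 = ["You", "are", "a", "helpful", "assistant", ".", "Hi", "!"]
print(cache.process(req1))   # (0, 8): first request, nothing cached yet
req2 = ["You", "are", "a", "helpful", "assistant", ".", "Bye", "!"]
print(cache.process(req2))   # (4, 4): the shared block-aligned prefix is reused
```

Note that `req2` reuses only the first block: because each key hashes the entire prefix up to that point, a change anywhere invalidates every block after it.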
When You'll Hear This
"Prefix caching turned our 50k-token system prompt from a cost problem to a cost advantage." / "Don't vary the system prompt — it kills the prefix cache."
Related Terms
Cost Per Token
Cost per token is how much each token (input or output) costs with a given AI provider. Flagship models cost more per token than cheap ones.
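Caching changes this math directly. A back-of-the-envelope sketch, using hypothetical prices (check your provider's pricing page) and the 90%-discount figure from above:

```python
# Hypothetical prices for illustration only:
INPUT_PER_MTOK = 3.00    # $ per million uncached input tokens
CACHED_PER_MTOK = 0.30   # cached reads at a 90% discount

def input_cost(cached_tokens, fresh_tokens):
    """Dollar cost of one request's input tokens."""
    return (cached_tokens * CACHED_PER_MTOK + fresh_tokens * INPUT_PER_MTOK) / 1_000_000

# 50k-token system prompt plus 500 tokens of fresh user input:
without_cache = input_cost(0, 50_500)
with_cache = input_cost(50_000, 500)
print(f"${without_cache:.4f} vs ${with_cache:.4f}")  # $0.1515 vs $0.0165
```

Per request the savings look small, but they compound across every call that shares the prefix.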
Inference
Inference is when the AI actually runs and generates output — as opposed to training, which is when it's learning.
KV Cache
KV cache is the store of attention key and value vectors computed for previous tokens, letting an LLM generate the next token without recomputing everything that came before.
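A minimal sketch of the idea, assuming a toy single-head attention in pure Python (real models use many heads, many layers, and tensor math): each step appends the new token's key/value to the cache and attends over everything cached, so earlier tokens are never recomputed.

```python
import math

class KVCache:
    """Toy single-head attention cache (illustrative, not a real model)."""

    def __init__(self):
        self.keys, self.values = [], []

    def step(self, q, k, v):
        """Append this token's k/v, then attend q over the whole cache."""
        self.keys.append(k)
        self.values.append(v)
        # Dot-product scores against every cached key, then softmax.
        scores = [sum(qi * ki for qi, ki in zip(q, key)) for key in self.keys]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        # Weighted sum of cached values.
        return [sum(w * val[d] for w, val in zip(weights, self.values)) / total
                for d in range(len(v))]

cache = KVCache()
out1 = cache.step([1.0, 0.0], [1.0, 0.0], [2.0, 0.0])  # attends to 1 token
out2 = cache.step([0.0, 1.0], [0.0, 1.0], [0.0, 4.0])  # attends to 2, reusing cached k/v
```

Without the cache, every generation step would recompute keys and values for the entire context, which is exactly the work prefix caching extends across requests.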
Prompt Caching
Prompt caching is a speed and cost optimization where the AI remembers the beginning of your prompt so it doesn't have to re-process it every time.