[{"data":1,"prerenderedAt":79},["ShallowReactive",2],{"term-i\u002Finference":3,"related-i\u002Finference":60},{"id":4,"title":5,"acronym":6,"body":7,"category":40,"description":41,"difficulty":42,"extension":43,"letter":44,"meta":45,"navigation":46,"path":47,"related":48,"seo":54,"sitemap":55,"stem":58,"subcategory":6,"__hash__":59},"terms\u002Fterms\u002Fi\u002Finference.md","Inference",null,{"type":8,"value":9,"toc":33},"minimark",[10,15,19,23,26,30],[11,12,14],"h2",{"id":13},"eli5-the-vibe-check","ELI5 — The Vibe Check",[16,17,18],"p",{},"Inference is when the AI actually runs and generates output — as opposed to training, which is when it's learning. When you type a prompt and hit enter, that's inference. Training is the expensive months-long process; inference is the moment-to-moment work of generating answers. Inference costs are what show up on your API bill.",[11,20,22],{"id":21},"real-talk","Real Talk",[16,24,25],{},"Inference is the process of running a trained model on new input to generate predictions or outputs. For LLMs, inference involves a forward pass through the network for each generated token. 
Inference speed and cost are determined by model size, hardware (GPU\u002FTPU), batching, and optimization techniques like quantization and KV caching.",[11,27,29],{"id":28},"when-youll-hear-this","When You'll Hear This",[16,31,32],{},"\"Inference latency is too high — users see a 5 second delay.\" \u002F \"We run inference on A100 GPUs.\"",{"title":34,"searchDepth":35,"depth":35,"links":36},"",2,[37,38,39],{"id":13,"depth":35,"text":14},{"id":21,"depth":35,"text":22},{"id":28,"depth":35,"text":29},"ai","Inference is when the AI actually runs and generates output — as opposed to training, which is when it's learning.","intermediate","md","i",{},true,"\u002Fterms\u002Fi\u002Finference",[49,50,51,52,53],"Training","Model","Token","Temperature","GPU",{"title":5,"description":41},{"changefreq":56,"priority":57},"weekly",0.7,"terms\u002Fi\u002Finference","YZUUZMBliMqHt0hyVNfYq49ftfCXpOOK7BOJIq0WKsY",[61,66,69,72,76],{"title":53,"path":62,"acronym":63,"category":40,"difficulty":64,"description":65},"\u002Fterms\u002Fg\u002Fgpu","Graphics Processing Unit","beginner","A GPU was originally built for rendering graphics in games, but turns out it's also perfect for AI.",{"title":50,"path":67,"acronym":6,"category":40,"difficulty":64,"description":68},"\u002Fterms\u002Fm\u002Fmodel","A model is the trained AI — the finished product.",{"title":52,"path":70,"acronym":6,"category":40,"difficulty":42,"description":71},"\u002Fterms\u002Ft\u002Ftemperature","Temperature controls how creative (or chaotic) an AI's responses are. 
Low temperature (like 0.1) makes it boring, safe, and predictable — great for code.",{"title":51,"path":73,"acronym":6,"category":74,"difficulty":64,"description":75},"\u002Fterms\u002Ft\u002Ftoken","vibecoding","In AI-land, a token is a chunk of text — roughly 3\u002F4 of a word.",{"title":49,"path":77,"acronym":6,"category":40,"difficulty":42,"description":78},"\u002Fterms\u002Ft\u002Ftraining","Training is the long, expensive process where an AI learns from data.",1776518288653]