Tokens per Second
TPS
ELI5 — The Vibe Check
FPS, but for text instead of frames. Tokens per second measures how fast an AI spits out words. Local LLM users obsess over this number the way gamers obsess over frame rates. 'Only 15 tokens per second? Literally unplayable.' Cloud models are the gaming PCs; your MacBook running Llama is the Nintendo Switch — portable, cute, but don't expect 4K.
Real Talk
Tokens per second is the primary throughput metric for LLM inference, measuring how quickly a model generates output tokens. It varies based on model size, hardware, quantization, batch size, and whether measuring prefill (prompt processing) or decode (generation) speed.
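Measuring decode TPS is just counting tokens and dividing by wall-clock time. Here's a minimal sketch with a fake generator standing in for a real model's decode loop (the `generate_tokens` stub is hypothetical; a real setup would stream tokens from an inference library):

```python
import time

def generate_tokens(n):
    # Hypothetical stand-in for a model's decode loop:
    # pretend each token takes ~1 ms to generate.
    for _ in range(n):
        time.sleep(0.001)
        yield "tok"

def measure_tps(token_iter):
    """Time a token stream and return (token_count, tokens_per_second)."""
    start = time.perf_counter()
    count = sum(1 for _ in token_iter)
    elapsed = time.perf_counter() - start
    return count, count / elapsed

count, tps = measure_tps(generate_tokens(100))
print(f"{count} tokens at {tps:.0f} tokens/sec")
```

Note that this times decode only; prefill speed is usually measured separately, since prompt processing is parallelizable and much faster per token.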
When You'll Hear This
"The quantized model does 80 tokens per second on my M3." / "Claude's TPS is insane on long outputs."
Related Terms
Inference
Inference is when the AI actually runs and generates output — as opposed to training, which is when it's learning.
Local LLM
Running an AI model on your own computer instead of calling an API in the cloud. No internet needed, no API costs, total privacy. The tradeoff?
The tradeoff? Speed and model size are capped by your own hardware.
Quantization
Quantization is the art of making AI models smaller and faster by using less precise numbers.
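The core trick can be sketched in a few lines: pick a scale factor, squash the floats down to small integers, and multiply back when you need them. This is a toy symmetric int8 scheme for illustration, not any particular library's implementation:

```python
def quantize_int8(weights):
    """Map float weights onto int8 range [-127, 127] with one scale factor."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # guard against all-zero input
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the quantized values."""
    return [v * scale for v in q]

weights = [0.82, -1.5, 0.03, 0.77]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# approx is close to weights, but not exact -- that rounding error
# is the precision you trade for a 4x smaller, faster model.
```

Real schemes (GPTQ, GGUF's k-quants, etc.) are fancier, using per-block scales and smarter rounding, but the size-for-precision trade is the same.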
Token
In AI-land, a token is a chunk of text — roughly 3/4 of a word.
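That 3/4-of-a-word ratio gives you a quick back-of-the-envelope estimate without running a real tokenizer (actual counts vary by model and text):

```python
def estimate_tokens(text):
    """Rough token estimate: ~3/4 word per token, so tokens ~= words * 4/3."""
    words = len(text.split())
    return round(words * 4 / 3)

estimate_tokens("The quick brown fox jumps over the lazy dog")  # 9 words -> ~12 tokens
```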