
Tokens per Second

TPS

Easy — everyone uses this
AI & ML

ELI5 — The Vibe Check

FPS, but for text instead of frames. Tokens per second measures how fast an AI spits out words. Local LLM users obsess over this number the way gamers obsess over frame rates. 'Only 15 tokens per second? Literally unplayable.' Cloud models are the gaming PCs; your MacBook running Llama is the Nintendo Switch — portable, cute, but don't expect 4K.

Real Talk

Tokens per second is the primary throughput metric for LLM inference, measuring how quickly a model generates output tokens. It varies with model size, hardware, quantization, and batch size, and also depends on whether you are measuring prefill (prompt processing) speed or decode (generation) speed.
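The decode-speed measurement boils down to generated tokens divided by wall-clock time. A minimal sketch (the `fake_generate` stand-in model and helper names are hypothetical, not any real library's API):

```python
import time

def measure_decode_tps(generate_fn, prompt_tokens, max_new_tokens):
    """Decode throughput: tokens generated / wall-clock seconds."""
    start = time.perf_counter()
    output_tokens = generate_fn(prompt_tokens, max_new_tokens)
    elapsed = time.perf_counter() - start
    return len(output_tokens) / elapsed

# Stand-in "model": emits one token every 10 ms (~100 TPS ceiling).
def fake_generate(prompt_tokens, max_new_tokens):
    out = []
    for i in range(max_new_tokens):
        time.sleep(0.01)  # pretend forward pass per token
        out.append(i)
    return out

tps = measure_decode_tps(fake_generate, prompt_tokens=[1, 2, 3],
                         max_new_tokens=20)
print(f"{tps:.1f} tokens/sec")
```

Note this times only generation; to measure prefill speed you would instead divide prompt length by the time to first token, which is why the two numbers should be reported separately.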

When You'll Hear This

"The quantized model does 80 tokens per second on my M3." / "Claude's TPS is insane on long outputs."

Made with passive-aggressive love by manoga.digital. Powered by Claude.