Tokens per Second
TPS
ELI5 — The Vibe Check
FPS, but for text instead of frames. Tokens per second measures how fast an AI spits out words. Local LLM users obsess over this number the way gamers obsess over frame rates. 'Only 15 tokens per second? Literally unplayable.' Cloud models are the gaming PCs; your MacBook running Llama is the Nintendo Switch — portable, cute, but don't expect 4K.
Real Talk
Tokens per second is the primary throughput metric for LLM inference, measuring how quickly a model generates output tokens. It varies based on model size, hardware, quantization, batch size, and whether measuring prefill (prompt processing) or decode (generation) speed.
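Measuring decode TPS is just counting tokens and dividing by wall-clock time. Here's a minimal sketch with a fake generator standing in for a real model's decode loop (the `generate_tokens` stub is hypothetical; a real setup would stream tokens from an inference library):

```python
import time

def generate_tokens(n):
    # Hypothetical stand-in for a model's decode loop:
    # pretend each token takes ~1 ms to generate.
    for _ in range(n):
        time.sleep(0.001)
        yield "tok"

def measure_tps(token_iter):
    """Time a token stream and return (token_count, tokens_per_second)."""
    start = time.perf_counter()
    count = sum(1 for _ in token_iter)
    elapsed = time.perf_counter() - start
    return count, count / elapsed

count, tps = measure_tps(generate_tokens(100))
print(f"{count} tokens at {tps:.0f} tokens/sec")
```

Note that this times decode only; prefill speed is usually measured separately, since prompt processing is parallelizable and much faster per token.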
When You'll Hear This
"The quantized model does 80 tokens per second on my M3." / "Claude's TPS is insane on long outputs."
Related Terms
Inference
Inference is when the AI actually runs and generates output — as opposed to training, which is when it's learning.
Local LLM
Running an AI model on your own computer instead of calling an API in the cloud. No internet needed, no API costs, total privacy. The tradeoff?
The tradeoff? Speed and model size are capped by your own hardware.
Quantization
Quantization is the art of making AI models smaller and faster by using less precise numbers.
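The core trick can be sketched in a few lines: pick a scale factor, squash the floats down to small integers, and multiply back when you need them. This is a toy symmetric int8 scheme for illustration, not any particular library's implementation:

```python
def quantize_int8(weights):
    """Map float weights onto int8 range [-127, 127] with one scale factor."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # guard against all-zero input
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the quantized values."""
    return [v * scale for v in q]

weights = [0.82, -1.5, 0.03, 0.77]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# approx is close to weights, but not exact -- that rounding error
# is the precision you trade for a 4x smaller, faster model.
```

Real schemes (GPTQ, GGUF's k-quants, etc.) are fancier, using per-block scales and smarter rounding, but the size-for-precision trade is the same.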
Token
In AI-land, a token is a chunk of text — roughly 3/4 of a word.
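That 3/4-of-a-word ratio gives you a quick back-of-the-envelope estimate without running a real tokenizer (actual counts vary by model and text):

```python
def estimate_tokens(text):
    """Rough token estimate: ~3/4 word per token, so tokens ~= words * 4/3."""
    words = len(text.split())
    return round(words * 4 / 3)

estimate_tokens("The quick brown fox jumps over the lazy dog")  # 9 words -> ~12 tokens
```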