Distillation
ELI5 — The Vibe Check
Teaching a small AI by having it copy a big AI's homework. You run the big smart model on thousands of examples, record its answers, then train a tiny model to give the same answers. The small model ends up surprisingly smart for its size — like a student who studied from the professor's answer key.
Real Talk
Knowledge distillation is a model compression technique where a smaller "student" model is trained to replicate the behavior of a larger "teacher" model. Instead of learning only from hard labels, the student learns from the teacher's full output probability distributions (soft targets), which carry richer information about how the teacher relates the classes to one another. This makes it possible to deploy capable models on resource-constrained devices.
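To make the "learn from probability distributions" part concrete, here is a minimal sketch of the classic distillation loss from Hinton et al.: the teacher's logits are softened with a temperature, and the student is penalized with a blend of a KL term against those soft targets and ordinary cross-entropy against the hard label. The function names, the temperature of 2.0, and the 50/50 `alpha` mix are illustrative choices, not fixed parts of the technique.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature; higher T gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, hard_label,
                      temperature=2.0, alpha=0.5):
    """Blend of soft-target KL divergence and hard-label cross-entropy."""
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 (as in Hinton et al.) so gradient magnitudes stay
    # comparable across temperatures
    kl = sum(p * math.log(p / q) for p, q in zip(t, s) if p > 0)
    soft_loss = (temperature ** 2) * kl
    # standard cross-entropy against the one-hot ground-truth label
    hard_loss = -math.log(softmax(student_logits)[hard_label])
    return alpha * soft_loss + (1 - alpha) * hard_loss

# Example: a student whose logits roughly track the teacher's
teacher = [4.0, 1.0, 0.2]
student = [3.0, 1.5, 0.5]
loss = distillation_loss(student, teacher, hard_label=0)
```

When the student's logits match the teacher's exactly, the KL term vanishes and only the hard-label cross-entropy remains, which is a quick sanity check that the loss rewards matching the teacher's distribution, not just its top answer.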
When You'll Hear This
"Distill the 70B model into a 7B one for production." / "The distilled model is 10x cheaper with 90% of the quality."
Related Terms
Fine-tuning
Fine-tuning is like taking a smart graduate student with broad general knowledge and sending them to a specialist bootcamp.
Model
A model is the trained AI — the finished product.
Quantization
Quantization is the art of making AI models smaller and faster by using less precise numbers.
Training
Training is the long, expensive process where an AI learns from data.