Knowledge Distillation
ELI5 — The Vibe Check
Knowledge distillation is teaching a small model to mimic a big model — like having a genius tutor teach a regular student. The big 'teacher' model is too expensive to run in production, so you train a smaller 'student' model to produce similar outputs. The student will never be as smart, but it gets surprisingly close while being 10x faster and cheaper.
Real Talk
Knowledge distillation is a model compression technique where a smaller 'student' model is trained to reproduce the behavior of a larger 'teacher' model. Instead of learning only from hard labels, the student learns from the teacher's soft probability distributions — its output logits softened with a temperature — which capture richer information about inter-class relationships (e.g., that an image of a '7' looks more like a '1' than an '8'). This enables deploying capable models in resource-constrained environments.
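The classic recipe (Hinton et al.'s formulation) blends two losses: KL divergence between the temperature-softened teacher and student distributions, plus ordinary cross-entropy on the true label. Here's a minimal pure-Python sketch — the function names, temperature `T=4.0`, and blend weight `alpha=0.5` are illustrative choices, not a specific library's API:

```python
import math

def softmax(logits, temperature=1.0):
    # Higher temperature -> softer, more spread-out distribution
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, hard_label, T=4.0, alpha=0.5):
    # Soften both distributions with the same temperature
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    # KL divergence between softened teacher and student distributions;
    # the T^2 factor keeps gradient magnitudes comparable across temperatures
    soft_loss = (T ** 2) * sum(
        p * math.log(p / q) for p, q in zip(p_teacher, p_student)
    )
    # Standard cross-entropy on the true hard label (temperature 1)
    hard_loss = -math.log(softmax(student_logits)[hard_label])
    return alpha * soft_loss + (1 - alpha) * hard_loss

# Hypothetical logits for one 3-class example
loss = distillation_loss(
    student_logits=[3.0, 1.5, 0.5],
    teacher_logits=[4.0, 1.0, 0.2],
    hard_label=0,
)
```

In a real training loop you'd compute this per batch with an autodiff framework; the point is that the soft-target term is where the teacher's "dark knowledge" about class similarities flows into the student.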
When You'll Hear This
"We distilled GPT-4's knowledge into a smaller model for edge deployment." / "Knowledge distillation got us 90% of the big model's performance at 10% of the cost."
Related Terms
Inference
Inference is when the AI actually runs and generates output — as opposed to training, which is when it's learning.
Model
A model is the trained AI — the finished product.
Quantization
Quantization is the art of making AI models smaller and faster by using less precise numbers.
Training
Training is the long, expensive process where an AI learns from data.