Mixture of Experts
MoE
ELI5 — The Vibe Check
Mixture of Experts is like having a team of specialists instead of one generalist. The model has many 'expert' sub-networks, but for each input, it only activates a few relevant ones. It's like a hospital: when you come in with a broken arm, you see the orthopedic surgeon, not every doctor on staff. This lets you have a massive model that's fast because only a small part runs at once.
Real Talk
Mixture of Experts (MoE) is an architecture where a model consists of multiple 'expert' sub-networks plus a gating network that routes each input token to a small subset of those experts. This lets you scale total parameter count (and capacity) while keeping per-token compute nearly constant. Mixtral 8x7B, for example, has 46.7B total parameters, but each token is routed through only 2 of its 8 experts, so just 12.9B parameters are active per token.
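To make the routing concrete, here's a minimal NumPy sketch of a single MoE layer with top-2 gating. The dimensions, weights, and the idea of experts being plain linear layers are all toy assumptions for illustration, not how a production model is implemented:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: 8 experts, top-2 routing (the Mixtral 8x7B pattern)
NUM_EXPERTS, TOP_K, D_MODEL = 8, 2, 16

# Each "expert" here is just a small random linear layer (hypothetical)
expert_weights = [rng.standard_normal((D_MODEL, D_MODEL)) * 0.1
                  for _ in range(NUM_EXPERTS)]
# The gating network scores every expert for a given token
gate_weights = rng.standard_normal((D_MODEL, NUM_EXPERTS)) * 0.1

def moe_layer(token):
    """Route one token through only its top-2 experts."""
    logits = token @ gate_weights                 # score all 8 experts
    top = np.argsort(logits)[-TOP_K:]             # indices of the 2 best
    weights = np.exp(logits[top])
    weights /= weights.sum()                      # softmax over the winners
    # Weighted sum of just the chosen experts' outputs;
    # the other 6 experts never run for this token.
    return sum(w * (token @ expert_weights[i]) for w, i in zip(weights, top))

token = rng.standard_normal(D_MODEL)
out = moe_layer(token)
print(out.shape)  # (16,)
```

The key point is in the last line of `moe_layer`: only `TOP_K` matrix multiplies happen per token, even though all 8 experts' parameters count toward model size. That's the whole trick behind "huge but fast."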
When You'll Hear This
"Mixtral uses MoE — that's why it's fast despite being huge." / "The MoE architecture gives us GPT-4 quality at a fraction of the inference cost."
Related Terms
Architecture
Architecture is the master blueprint for your app — like deciding whether to build a house, apartment block, or skyscraper before laying a single brick.
Inference
Inference is when the AI actually runs and generates output — as opposed to training, which is when it's learning.
Mistral
Mistral is the French AI startup that keeps punching above its weight.
Neural Network
A neural network is a system loosely inspired by the human brain — lots of little math nodes connected together, passing numbers to each other.
Transformer
The Transformer is THE architecture behind modern AI. ChatGPT, Claude, Whisper — all transformers under the hood. The key innovation? Attention: the model learns to weigh every part of the input against every other part, all in parallel.