Transformer
ELI5 — The Vibe Check
The Transformer is THE architecture behind most modern AI. ChatGPT, Claude, Gemini, Whisper — all transformers under the hood, and even image generators like Midjourney lean on the same attention ideas. The key innovation? The attention mechanism, which lets the model look at all parts of the input at once instead of one word at a time. The 2017 paper 'Attention Is All You Need' is probably the most impactful paper in the history of AI.
Real Talk
The Transformer is a neural network architecture introduced in 'Attention Is All You Need' (Vaswani et al., 2017). It uses self-attention to process an entire sequence in parallel, replacing the recurrent and convolutional architectures that had to crunch tokens one step at a time — which is what made training on massive datasets practical. Key components include multi-head self-attention, positional encoding, layer normalization, and feed-forward networks. It's the foundation of virtually all modern LLMs and many vision and audio models.
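For the curious: the core trick, scaled dot-product self-attention, fits in a few lines of NumPy. This is a toy, single-head sketch under simplifying assumptions — real models use multiple heads, masking, and projection matrices learned by training, whereas here the weights are just random for illustration:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (toy sketch)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # project each token to query/key/value
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)              # every token scores every other token
    scores -= scores.max(axis=-1, keepdims=True) # numerical stability before softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: rows sum to 1
    return weights @ V, weights                  # output = attention-weighted mix of values

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                          # 4 toy "tokens", 8-dim embeddings
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
print(out.shape)   # (4, 8): one contextualized vector per token
```

The point to notice: `scores` is a full token-by-token matrix, computed in one shot — that's the "look at everything at once" property, and it's why transformers parallelize so well on GPUs compared to recurrent networks.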
When You'll Hear This
"Every modern LLM is based on the Transformer architecture." / "Transformers parallelized sequence processing and changed everything."
Related Terms
Attention Mechanism
The attention mechanism is how AI decides what to focus on — like when you're reading a long email and your eyes jump to the part that mentions your name.
Deep Learning
Deep Learning is Machine Learning that's been hitting the gym.
LLM (Large Language Model)
An LLM is a humongous AI that read basically the entire internet and learned to predict what words come next, really really well.
Neural Network
A neural network is a system loosely inspired by the human brain — lots of little math nodes connected together, passing numbers to each other.
Self-Attention
Self-attention is how a model looks at a sentence and figures out which words are most important to each other.