Self-Attention
ELI5 — The Vibe Check
Self-attention is how a model looks at a sentence and figures out which words are most important to each other. When you read 'The cat sat on the mat because it was tired,' self-attention is what helps the model know that 'it' refers to 'the cat,' not 'the mat.' Every word gets to 'look at' every other word and decide how much to care about it.
Real Talk
Self-attention is a mechanism where each element in a sequence computes attention weights over every element of the same sequence, including itself. Each position produces query, key, and value vectors — attention scores are computed as scaled dot products of queries and keys, passed through a softmax, then used to weight the values. It's the core operation of transformer architectures, enabling parallel processing of sequence data.
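The query/key/value mechanics above can be sketched in a few lines of NumPy. This is a toy illustration, not a production implementation: the projection matrices are random stand-ins for weights a real model would learn, and it's a single attention head with no masking.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over x of shape (seq_len, d_model)."""
    q = x @ w_q  # queries: what each position is looking for
    k = x @ w_k  # keys: what each position offers
    v = x @ w_v  # values: the content actually passed along
    d_k = k.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)  # (seq_len, seq_len) similarity scores
    # Softmax over each row so every position's weights sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v  # each output is a weighted mix of all value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8
x = rng.normal(size=(seq_len, d_model))  # 5 toy token embeddings
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)  # shape: (5, 8)
```

Note that every position attends to every other position in one matrix multiply — that's the parallelism advantage over RNNs, which must walk the sequence step by step.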
When You'll Hear This
"Self-attention captures long-range dependencies better than RNNs." / "The self-attention maps show the model is focusing on the right tokens."
Related Terms
Attention Mechanism
The attention mechanism is how AI decides what to focus on — like when you're reading a long email and your eyes jump to the part that mentions your name.
Deep Learning
Deep Learning is Machine Learning that's been hitting the gym.
Multi-Head Attention
Multi-head attention is running multiple attention mechanisms in parallel — like having several detectives investigate the same crime scene, but each looking for different kinds of clues.
Neural Network
A neural network is a system loosely inspired by the human brain — lots of little math nodes connected together, passing numbers to each other.
Transformer
The Transformer is THE architecture behind all modern AI. ChatGPT, Claude, Midjourney, Whisper — all transformers under the hood. The key innovation? Self-attention, which lets every token attend to every other token in parallel.