Activation Function
ELI5 — The Vibe Check
An activation function is the decision gate in a neural network — it decides if a neuron should 'fire' or stay quiet. Without it, a neural network would just be a boring linear equation no matter how deep you make it. ReLU, sigmoid, GELU — these are the gatekeepers that give networks the ability to learn complex, curvy patterns instead of just straight lines.
Real Talk
An activation function introduces non-linearity into neural networks, enabling them to learn complex patterns — without one, stacked linear layers collapse into a single linear map. Common functions include ReLU (max(0, x)), GELU (used in transformers), sigmoid (logistic), tanh, and SiLU/Swish. The choice of activation affects training dynamics, expressiveness, and gradient flow. Modern architectures primarily use ReLU variants and GELU.
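The functions named above can be sketched in a few lines of plain Python — scalar versions for clarity, though real frameworks like PyTorch apply them elementwise to whole tensors:

```python
import math

def relu(x):
    # ReLU: passes positive values through, zeroes out negatives
    return max(0.0, x)

def sigmoid(x):
    # Logistic sigmoid: squashes any input into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def gelu(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def silu(x):
    # SiLU/Swish: x * sigmoid(x), a smooth ReLU-like curve
    return x * sigmoid(x)

# tanh is already in the standard library as math.tanh
```

Note how ReLU has a hard corner at zero while GELU and SiLU curve smoothly through it — that smoothness (and non-zero gradient for small negative inputs) is part of why GELU is favored in transformers.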
When You'll Hear This
"GELU activations work better than ReLU in transformers." / "The vanishing gradient problem is why we switched from sigmoid to ReLU."
Related Terms
Deep Learning
Deep Learning is Machine Learning that's been hitting the gym.
Gradient Descent
Gradient Descent is how an AI learns — it's the algorithm that nudges the model's weights in the right direction after each mistake.
Neural Network
A neural network is a system loosely inspired by the human brain — lots of little math nodes connected together, passing numbers to each other.
Training
Training is the long, expensive process where an AI learns from data.