
Activation Function

Medium — good to know · AI & ML

ELI5 — The Vibe Check

An activation function is the decision gate in a neural network — it decides if a neuron should 'fire' or stay quiet. Without it, a neural network would just be a boring linear function no matter how deep you make it. ReLU, sigmoid, GELU — these are the gatekeepers that give networks the ability to learn complex, curvy patterns instead of just straight lines.
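The "just a linear function" claim is easy to check numerically: without an activation between them, two stacked linear layers collapse into one. A minimal sketch (using NumPy; the weight shapes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))  # first "layer"
W2 = rng.standard_normal((2, 4))  # second "layer"
x = rng.standard_normal(3)

deep = W2 @ (W1 @ x)       # two stacked layers, no activation in between
collapsed = (W2 @ W1) @ x  # one equivalent linear layer

assert np.allclose(deep, collapsed)  # depth bought us nothing
```

Insert any non-linearity between `W1` and `W2` and the collapse no longer holds — that's the whole job of the activation function.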

Real Talk

An activation function introduces non-linearity into neural networks, enabling them to learn complex patterns. Common functions include ReLU (max(0, x)), GELU (used in transformers), sigmoid (logistic), tanh, and SiLU/Swish. The choice of activation affects training dynamics, expressiveness, and gradient flow. Modern architectures primarily use ReLU variants and GELU.
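The functions listed above are one-liners. A sketch of each in NumPy (the GELU here is the tanh approximation commonly used in transformer code, not the exact Gaussian-CDF form):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def silu(x):  # also called Swish: x * sigmoid(x)
    return x * sigmoid(x)

def gelu(x):  # tanh approximation of x * Phi(x)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

x = np.array([-2.0, 0.0, 2.0])
relu(x)  # → [0., 0., 2.] — negatives clipped to zero
```

Note how ReLU hard-gates negatives to exactly zero, while SiLU and GELU let small negative values leak through smoothly — one reason they often train a bit better.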

When You'll Hear This

"GELU activations work better than ReLU in transformers." / "The vanishing gradient problem is why we switched from sigmoid to ReLU."
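The vanishing-gradient point in the second quote comes straight from sigmoid's derivative, which never exceeds 0.25 — so backpropagating through many sigmoid layers multiplies the gradient by a factor ≤ 0.25 per layer. A quick illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dsigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # derivative of the logistic function

assert dsigmoid(0.0) == 0.25   # the maximum of sigmoid's derivative
print(0.25 ** 10)              # best-case gradient scale after 10 sigmoid layers
```

Even in the best case, ten sigmoid layers shrink the gradient by roughly a factor of a million; ReLU's derivative is exactly 1 for positive inputs, which is why it largely sidesteps the problem.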

Made with passive-aggressive love by manoga.digital. Powered by Claude.