Batch Normalization
ELI5 — The Vibe Check
Batch normalization is like hitting the reset button on each layer of a neural network so the numbers don't spiral out of control. Without it, training deep networks is like trying to build a tall tower of blocks on a wobbly table — each layer makes the wobbling worse. Batch norm stabilizes each layer so you can build much deeper, more powerful networks.
Real Talk
Batch Normalization normalizes each layer's inputs by re-centering and re-scaling them across the mini-batch dimension, then applying learnable scale and shift parameters. Originally motivated as reducing internal covariate shift, it enables faster training with higher learning rates, reduces sensitivity to weight initialization, and provides mild regularization. Typically applied after linear/conv layers and before activation functions, it's standard in most deep learning architectures.
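The re-centering and re-scaling above can be sketched in a few lines of NumPy. This is a minimal illustration of the training-time forward pass only (a real layer also tracks running statistics for inference); the function name and shapes are just for this example.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize x (batch, features) across the batch dimension (axis 0),
    then re-scale with learnable gamma and re-shift with learnable beta."""
    mean = x.mean(axis=0)                     # per-feature mean over the batch
    var = x.var(axis=0)                       # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)   # zero mean, unit variance per feature
    return gamma * x_hat + beta               # restore representational flexibility

# 4 examples, 3 features
x = np.array([[1.0, 2.0,  3.0],
              [2.0, 4.0,  6.0],
              [3.0, 6.0,  9.0],
              [4.0, 8.0, 12.0]])
out = batch_norm(x, gamma=np.ones(3), beta=np.zeros(3))
# each column of `out` now has (approximately) zero mean and unit variance
```

With gamma at 1 and beta at 0 this is pure normalization; during training the network learns gamma and beta, so it can undo the normalization where that helps.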
When You'll Hear This
"Add batch norm between the conv and activation layers." / "Training without batch norm requires much smaller learning rates."
Related Terms
Deep Learning
Deep Learning is Machine Learning that's been hitting the gym.
Gradient Descent
Gradient Descent is how an AI learns — it's the algorithm that nudges the model's weights in the right direction after each mistake.
Layer Normalization
Layer normalization is batch norm's sibling — but instead of normalizing across the batch, it normalizes across the features of each individual example.
Neural Network
A neural network is a system loosely inspired by the human brain — lots of little math nodes connected together, passing numbers to each other.
Training
Training is the long, expensive process where an AI learns from data.
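The batch norm / layer norm distinction above comes down to which axis the statistics are computed over. A small NumPy sketch (normalization only, no learnable parameters or epsilon) makes the contrast concrete:

```python
import numpy as np

x = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])  # 2 examples (rows), 3 features (columns)

# Batch norm: per-feature statistics, computed ACROSS the batch (axis 0)
bn = (x - x.mean(axis=0)) / x.std(axis=0)

# Layer norm: per-example statistics, computed ACROSS the features (axis 1)
ln = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)
```

After batch norm each column has zero mean; after layer norm each row does. That per-example independence is why layer norm works with batch size 1 and is preferred in architectures like Transformers.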