Data Augmentation
ELI5 — The Vibe Check
Data augmentation is making your training data go further by creating variations of what you already have. Flip images, rotate them, add noise, change colors — now your 1,000 photos look like 10,000. For text, you can rephrase sentences, swap synonyms, or translate back and forth. It's the AI equivalent of stretching your grocery budget with creative leftovers.
Real Talk
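Data augmentation artificially expands a training dataset by applying transformations that preserve label validity: the transformed example must still deserve its original label. For images: rotation, flipping, cropping, color jittering. For text: back-translation, synonym replacement, random insertion/deletion. For audio: time stretching, pitch shifting, noise injection. Which transformations are safe depends on the task; a horizontal flip is fine for cat photos but can change the meaning of digits or text inside an image. Done well, augmentation improves generalization, reduces overfitting, and is especially valuable when labeled data is scarce.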
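To make the idea concrete, here is a minimal sketch in plain Python (standard library only). It is an illustration, not how any particular library does it. The function names (`horizontal_flip`, `add_noise`, `random_deletion`, `augment_image_dataset`), the nested-list image format, and the default values for `max_delta`, `p`, and `copies` are all assumptions made up for this example. Real projects would more likely reach for an established augmentation library.

```python
import random

def horizontal_flip(image):
    """Mirror a 2D grayscale image (list of pixel rows) left to right."""
    return [row[::-1] for row in image]

def add_noise(image, rng, max_delta=10):
    """Shift each pixel by a random integer in [-max_delta, max_delta], clamped to 0..255."""
    return [
        [min(255, max(0, px + rng.randint(-max_delta, max_delta))) for px in row]
        for row in image
    ]

def random_deletion(tokens, rng, p=0.1):
    """Drop each token with probability p, but never return an empty sentence."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]

def augment_image_dataset(dataset, rng, copies=2):
    """Keep every original (image, label) pair and add `copies` noisy variants of each.

    The label is copied unchanged, which is only valid if the transformations
    do not change what the image depicts (an assumption of this sketch).
    """
    augmented = []
    for image, label in dataset:
        augmented.append((image, label))
        for _ in range(copies):
            # Flip about half of the variants, then add pixel noise to all of them.
            variant = horizontal_flip(image) if rng.random() < 0.5 else image
            augmented.append((add_noise(variant, rng), label))
    return augmented

# Hypothetical toy data: one 2x3 "image" labeled "cat".
rng = random.Random(0)  # fixed seed so runs are repeatable
dataset = [([[0, 128, 255], [10, 20, 30]], "cat")]
augmented = augment_image_dataset(dataset, rng, copies=2)
print(len(dataset), "->", len(augmented), "examples")
```

With `copies=2`, each original example yields itself plus two variants, so the dataset triples in size while every variant keeps its original label. The text helper works the same way on a list of words, for example `random_deletion("the cat sat on the mat".split(), rng, p=0.2)`.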
When You'll Hear This
"Data augmentation doubled our effective training set size." / "The model was overfitting until we added augmentation."
Related Terms
Model
A model is the trained AI — the finished product.
Overfitting
Overfitting is when your model gets TOO good at the training data, memorizing its quirks, and performs poorly on new data.
Synthetic Data
Synthetic data is fake data that's good enough to train real models.
Training
Training is the long, expensive process where an AI learns from data.