Diffusion Model
ELI5 — The Vibe Check
Diffusion models generate images by learning to reverse noise. In training, you take an image and slowly add random noise until it's pure static. The model learns to undo this, one small step at a time. At generation time, start with pure noise and let the model gradually remove it until a coherent image appears. That's how Stable Diffusion and DALL-E work.
Real Talk
Diffusion models are generative models that learn to reverse a noise diffusion process. During training, Gaussian noise is progressively added to data over many steps (the forward process). A neural network (typically a U-Net or a transformer) learns to predict the added noise at each step so it can be removed (the reverse process). At inference time, samples are generated by iteratively denoising, starting from pure Gaussian noise. Stable Diffusion, DALL-E 3, and Sora are based on this approach.
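The forward and reverse processes described above can be sketched in a few lines of NumPy. This is a toy illustration of the standard DDPM formulation, not code from any real library — the function names (`forward_noise`, `denoise_step`) are made up for this sketch, and a real model would replace the "predicted noise" with the output of a trained U-Net:

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear noise schedule, as in the original DDPM paper: small noise early,
# larger noise later, over T steps.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # cumulative signal retention at each step

def forward_noise(x0, t):
    """Forward process: jump straight to step t by mixing the clean sample
    with Gaussian noise (closed form of applying t noising steps)."""
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return xt, noise

def denoise_step(xt, t, predicted_noise):
    """One reverse (sampling) step: subtract the scaled noise estimate,
    then re-inject a little fresh noise (except at the final step)."""
    coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
    x_prev = (xt - coef * predicted_noise) / np.sqrt(alphas[t])
    if t > 0:
        x_prev += np.sqrt(betas[t]) * rng.standard_normal(xt.shape)
    return x_prev

x0 = rng.standard_normal(16)          # stand-in for an image's pixels
xt, noise = forward_noise(x0, T - 1)  # by step T-1 this is almost pure noise
x_prev = denoise_step(xt, T - 1, noise)  # true noise as a stand-in predictor
```

Note how little "signal" survives the full forward process: `alpha_bars[-1]` is tiny, which is why sampling can start from pure Gaussian noise.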
When You'll Hear This
"Stable Diffusion is a diffusion model for image generation." / "Diffusion models outperformed GANs on image quality benchmarks."
Related Terms
Computer Vision
Computer Vision is teaching AI to understand images and video. How does your phone unlock with your face? Computer Vision.
GAN (Generative Adversarial Network)
A GAN is two neural networks fighting each other. One (the Generator) tries to create fake images that look real; the other (the Discriminator) tries to tell the fakes from real images. Both get better by competing.
Generative AI
Generative AI is AI that creates new stuff — text, images, code, music, video — rather than just classifying or predicting. ChatGPT writing essays is a classic example.
Inference
Inference is when the AI actually runs and generates output — as opposed to training, which is when it's learning.
Training
Training is the long, expensive process where an AI learns from data.