Sora
ELI5 — The Vibe Check
Sora is OpenAI's text-to-video model — you type "a cat riding a skateboard through Tokyo at sunset" and it generates an actual video of that. It's simultaneously amazing and terrifying. One day we'll look back and laugh at how janky early Sora videos were, like we laugh at 90s CGI now. But right now? Mind = blown.
Real Talk
Sora is OpenAI's text-to-video generative model that creates realistic and imaginative video clips from text descriptions. It uses a diffusion transformer architecture operating on spacetime patches, enabling it to generate videos with complex motion, multiple characters, and coherent scene transitions. It can also extend, edit, and interpolate between existing videos.
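"Spacetime patches" sound abstract, but the idea is just cutting a video tensor into little 3D chunks (a few frames tall, a few pixels wide) and flattening each chunk into a token for the transformer. Here's a minimal sketch of that chunking with NumPy — the patch sizes are made up for illustration; Sora's actual configuration isn't public.

```python
import numpy as np

# Toy "video": (frames, height, width, channels)
video = np.random.rand(8, 32, 32, 3)

# Hypothetical patch sizes (illustrative only — Sora's real values aren't published)
pt, ph, pw = 2, 8, 8  # frames, height, width per spacetime patch

T, H, W, C = video.shape
patches = (
    video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)  # split each axis into chunks
         .transpose(0, 2, 4, 1, 3, 5, 6)                    # group the chunk indices together
         .reshape(-1, pt * ph * pw * C)                      # one flat token per patch
)
print(patches.shape)  # (64, 384): 64 spacetime patches, 384 values each
```

Each of the 64 rows is one token the transformer attends over — which is why the same architecture handles different resolutions and durations: they just produce more or fewer patches.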
When You'll Hear This
"Sora-generated that b-roll for our marketing video." / "When Sora can do 4K 60fps reliably, stock video is dead."
Related Terms
Diffusion Model
Diffusion models generate images by learning to reverse noise. In training, you take an image and slowly add random noise until it's pure static, and the model learns to undo each step. To generate, it starts from pure static and denoises its way back into a brand-new image.
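The "add noise until it's static" half has a neat closed form: you don't have to noise step by step, because the noise level at any step t follows directly from the schedule. A tiny sketch, with a made-up linear schedule (real models tune this carefully):

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((8, 8))  # toy 8x8 "image"

# Illustrative linear noise schedule over T steps (not a production schedule)
T = 10
betas = np.linspace(1e-2, 0.3, T)        # per-step noise amounts
alphas_bar = np.cumprod(1 - betas)       # how much original signal survives to step t

def noisy_at(x0, t):
    """Sample the noised image at step t directly from the clean image x0."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1 - alphas_bar[t]) * noise

# By the final step, most of the signal is gone — nearly pure static.
print(alphas_bar[-1])  # small fraction of the original image remaining
```

Training teaches a network to predict the `noise` term from `noisy_at(x0, t)`; generation then subtracts predicted noise step by step, starting from pure static.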
Generative AI
Generative AI is AI that creates new stuff — text, images, code, music, video — rather than just classifying or predicting. ChatGPT writes essays; DALL-E paints pictures; Sora makes videos.
Multimodal
Multimodal AI can see, hear, AND read — it's not limited to just text. It's like the difference between texting someone and FaceTiming them.
OpenAI
OpenAI is the company behind ChatGPT, GPT-4, DALL-E, Codex — and Sora.
Text-to-Speech (TTS)
Text-to-Speech takes written words and reads them out loud with a computer voice. Old TTS sounded like a robot reading a phone book.