Whisper
ELI5 — The Vibe Check
Whisper is OpenAI's speech recognition model — it listens to audio and writes down what was said. It's like having the world's best transcriptionist who speaks 99 languages and never gets tired. You give it a podcast, a meeting recording, or a mumbled voice memo and it turns it into text. It's open source too, so you can run it yourself.
Real Talk
Whisper is OpenAI's open-source automatic speech recognition (ASR) model trained on 680,000 hours of multilingual audio data. It handles transcription, translation, and language identification across 99 languages. It's robust to background noise, accents, and technical jargon. Available in multiple sizes (tiny to large) with different accuracy/speed tradeoffs.
Show Me The Code
# Transcribe audio with Whisper
import whisper
model = whisper.load_model("base")
result = model.transcribe("meeting.mp3")
print(result["text"])
When You'll Hear This
"Run the meeting recording through Whisper for the transcript." / "Whisper handles accented English way better than Google's STT."
Related Terms
NLP (Natural Language Processing)
NLP is the branch of AI that deals with human language — reading it, writing it, translating it, summarizing it.
OpenAI
OpenAI is the company behind ChatGPT, GPT-4, DALL-E, and Codex.
Text-to-Speech (TTS)
Text-to-Speech takes written words and reads them out loud with a computer voice. Old TTS sounded like a robot reading a phone book.