OCR
Optical Character Recognition
ELI5 — The Vibe Check
OCR reads text from images — take a photo of a document, receipt, or sign, and OCR turns the pixels into actual text your computer can search, copy, and edit. Old OCR was clunky and needed perfect scans. Modern AI-powered OCR can read handwriting, handle weird angles, and work in dozens of languages. It's how your phone turns a photo of a menu into searchable text.
Real Talk
OCR converts images of text into machine-readable text. Modern OCR uses deep learning, typically convolutional networks combined with recurrent or Transformer layers, in two stages: detection (finding the text regions in an image) and recognition (reading the characters in each region). Common tools include Tesseract (an open-source engine) and cloud services such as Google Cloud Vision, AWS Textract, and Azure Document Intelligence. Vision-language models like GPT-4V and Claude can also read text from images directly, without a separate OCR step.
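To make the detection-plus-recognition idea concrete, here is a minimal sketch using Tesseract through the pytesseract wrapper. It assumes pytesseract, Pillow, and the Tesseract binary are installed. The helper names (confident_words, ocr_words) and the 60% confidence cutoff are illustrative choices, not part of any library.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height) in pixels


def confident_words(data: Dict[str, list], min_conf: float = 60.0) -> List[Tuple[str, Box]]:
    """Keep recognized words whose confidence is at least min_conf.

    `data` is expected to look like the dict returned by
    pytesseract.image_to_data(..., output_type=Output.DICT).
    """
    results = []
    for text, conf, left, top, width, height in zip(
        data["text"], data["conf"], data["left"],
        data["top"], data["width"], data["height"],
    ):
        # Rows that describe layout only (no word) have empty text and conf -1.
        if text.strip() and float(conf) >= min_conf:
            results.append((text, (left, top, width, height)))
    return results


def ocr_words(image_path: str, lang: str = "eng") -> List[Tuple[str, Box]]:
    """Run Tesseract on an image and return (word, bounding box) pairs."""
    # Imported lazily so confident_words() works without these packages.
    from PIL import Image
    import pytesseract

    data = pytesseract.image_to_data(
        Image.open(image_path),
        lang=lang,
        output_type=pytesseract.Output.DICT,
    )
    return confident_words(data)
```

Calling ocr_words("invoice.png") (a hypothetical file name) would return each word along with where it sits on the page: the box is the detection half, the word is the recognition half. Filtering by confidence is one simple way to drop smudges and noise before using the text downstream.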
When You'll Hear This
"Run OCR on the scanned invoices to extract the data." / "Claude's vision can do OCR better than dedicated tools now."
Related Terms
AI (Artificial Intelligence)
AI is when you teach a computer to do stuff that normally needs a human brain — like recognizing cats, translating languages, or writing code for you.
Computer Vision
Computer Vision is teaching AI to understand images and video. How does your phone unlock with your face? Computer Vision.
Machine Learning (ML)
Machine Learning is teaching a computer by showing it thousands of examples instead of writing out every rule.
Vision Model
A vision model is an AI that can understand images — it's got eyes, basically.