Unicode
ELI5 — The Vibe Check
Unicode is the master list of pretty much every character humans write with: letters, numbers, emojis, ancient Sumerian cuneiform, all of it. Every character gets a unique number (a code point) so computers worldwide can agree on what 'A' or '😀' means.
Real Talk
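Unicode is an international standard, maintained by the Unicode Consortium, that assigns a unique code point (written like U+0041 for 'A') to each character across the world's writing systems. Recent versions define roughly 150,000 characters within a code space that runs from U+0000 to U+10FFFF. UTF-8, UTF-16, and UTF-32 are encoding schemes that represent these code points as bytes.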
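To make the split between "code point" and "encoding" concrete, here is a minimal sketch that shows one character in all three encoding forms. The helper name `describeEncodings` is made up for illustration, it assumes a single-character input, and it assumes `TextEncoder` is available (modern browsers and Node.js).

```javascript
// Hypothetical helper: show one character in UTF-8, UTF-16, and UTF-32.
// Assumes `char` holds exactly one character (one code point).
function describeEncodings(char) {
  const codePoint = char.codePointAt(0);
  const hex = (n, width) => n.toString(16).toUpperCase().padStart(width, "0");

  // UTF-8: TextEncoder always produces UTF-8 bytes.
  const utf8 = Array.from(new TextEncoder().encode(char), (b) => hex(b, 2));

  // UTF-16: JavaScript strings are sequences of UTF-16 code units.
  const utf16 = [];
  for (let i = 0; i < char.length; i++) {
    utf16.push(hex(char.charCodeAt(i), 4));
  }

  // UTF-32: a single 32-bit unit that holds the code point itself.
  const utf32 = [hex(codePoint, 8)];

  return { codePoint: "U+" + hex(codePoint, 4), utf8, utf16, utf32 };
}

console.log(describeEncodings("\u{1F600}"));
// codePoint: "U+1F600"
// utf8:  ["F0", "9F", "98", "80"]
// utf16: ["D83D", "DE00"]
// utf32: ["0001F600"]
```

Same code point, three different byte layouts: that is the whole difference between Unicode (the list) and UTF-anything (the encoding).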
Show Me The Code
console.log("Hello \u{1F600}"); // Hello 😀
console.log("A".codePointAt(0)); // 65
console.log(String.fromCodePoint(9731)); // ☃
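One gotcha worth seeing in code: JavaScript strings are stored as UTF-16, so `.length` counts code units, not characters. The sketch below shows the mismatch and one way around it; `countCodePoints` is a hypothetical helper name, not a built-in.

```javascript
const smiley = "\u{1F600}"; // 😀, code point U+1F600

// .length counts UTF-16 code units, and this emoji needs two of them.
console.log(smiley.length); // 2

// Spreading (like for...of) walks the string by code point instead.
console.log([...smiley].length); // 1

// Hypothetical helper: count code points rather than code units.
function countCodePoints(text) {
  let count = 0;
  for (const _ of text) count++;
  return count;
}

console.log(countCodePoints("Hi \u{1F600}")); // 4
```

Even a code point is not always what a reader sees as one character: some emoji, such as flags, are built from several code points.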
When You'll Hear This
"Make sure the database column uses UTF-8 to support Unicode characters." / "That emoji broke the parser — Unicode edge case."
Related Terms
Base64
Base64 is like translating a photo into text by converting it to a long string of letters and numbers.
Regex (Regular Expression)
Regex is a secret language for describing patterns in text.
String
A string is text in programming — any sequence of characters wrapped in quotes. 'Hello', 'user@email.com', '12345' — if it is in quotes, it is a string.
UTF-8 (Unicode Transformation Format, 8-bit)
UTF-8 is the most popular way to turn Unicode characters into actual bytes, whether on disk or over the network. It's clever: plain English (ASCII) letters take 1 byte, and every other character takes 2 to 4.
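A quick way to see UTF-8's variable width is to count bytes per character. This is a minimal sketch, assuming `TextEncoder` is available (modern browsers and Node.js); `utf8ByteLength` is a hypothetical helper name.

```javascript
// Hypothetical helper: number of bytes a string occupies in UTF-8.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

console.log(utf8ByteLength("A"));         // 1 (ASCII letter)
console.log(utf8ByteLength("\u00E9"));    // 2 (é)
console.log(utf8ByteLength("\u20AC"));    // 3 (€)
console.log(utf8ByteLength("\u{1F600}")); // 4 (😀)
```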