UTF-8
ELI5 — The Vibe Check
UTF-8 is the most popular way to turn Unicode characters into actual bytes on disk. It's clever: plain English letters take 1 byte, while accented letters, other scripts, and emoji take 2 to 4. It's backwards-compatible with plain ASCII, which is a big part of why it took over the world.
Real Talk
UTF-8 is a variable-width character encoding that represents each Unicode code point using 1 to 4 bytes. It is the dominant encoding on the web and in most modern systems due to its ASCII compatibility (any valid ASCII text is also valid UTF-8) and its space efficiency for Latin-script text.
Show Me The Code
# Python — encoding and decoding UTF-8
text = "Hello 😀"
bytes_data = text.encode("utf-8")
print(bytes_data) # b'Hello \xf0\x9f\x98\x80'
print(bytes_data.decode("utf-8")) # Hello 😀
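To make the "1 to 4 bytes" part concrete, here is a small sketch that counts the bytes UTF-8 uses for one sample character from each width class. The helper name `utf8_width` and the sample characters are illustrative choices, not part of any library.

```python
# Python: how many bytes does UTF-8 need for a single character?
# utf8_width is a hypothetical helper written for this example.
def utf8_width(ch: str) -> int:
    """Return the number of bytes UTF-8 uses to encode one character."""
    return len(ch.encode("utf-8"))

# One sample per width class, written as escapes so the source stays ASCII.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, é, €, 😀

for ch in samples:
    print(f"U+{ord(ch):04X} {ch}: {utf8_width(ch)} byte(s)")
```

The ASCII letter takes 1 byte, the accented Latin letter 2, the euro sign 3, and the emoji 4.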
When You'll Hear This
"Always save files as UTF-8." / "The API response had garbled characters — it was encoded as Latin-1, not UTF-8."
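The garbled-characters complaint usually comes from decoding UTF-8 bytes with the wrong encoding. A minimal sketch of how that happens, and how it can sometimes be undone, assuming the wrong guess was Latin-1 (the variable names are illustrative):

```python
# Python: reproduce the classic UTF-8-read-as-Latin-1 garbling
original = "caf\u00e9"            # "café"
raw = original.encode("utf-8")    # b'caf\xc3\xa9' (é becomes two bytes)

# Wrong: treat each byte as its own Latin-1 character.
garbled = raw.decode("latin-1")
print(garbled)                    # cafÃ©

# If nothing else touched the text, reversing the wrong step recovers it.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)                   # café
```

The round trip only works here because Latin-1 maps every byte to a character; with other mismatched encodings some bytes may be lost for good.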
Related Terms
Base64
Base64 is like translating a photo into text by converting it to a long string of letters and numbers.
Regex (Regular Expression)
Regex is a secret language for describing patterns in text.
Unicode
Unicode is the master list of every character ever invented by humans — letters, numbers, emojis, ancient Sumerian cuneiform, all of it.