
UTF-8


Medium: good to know · General Dev

ELI5 — The Vibe Check

UTF-8 is the most popular way to turn Unicode characters into actual bytes, whether they're headed to disk or over the network. It's clever: plain English letters take 1 byte, while accented letters, other scripts, and emoji take 2 to 4. It's backwards-compatible with plain ASCII, which is a big part of why it took over the world.
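To see the "1 byte for English, more for everything else" rule in action, here is a small sketch that measures the UTF-8 size of a few sample characters. The helper name `utf8_byte_lengths` and the sample characters are just for illustration; the measuring itself is plain `str.encode("utf-8")`. With these samples, "A" takes 1 byte, "é" takes 2, "€" takes 3, and "😀" takes 4.

```python
def utf8_byte_lengths(chars: list[str]) -> dict[str, int]:
    """Map each character to the number of bytes UTF-8 uses for it (illustrative helper)."""
    return {ch: len(ch.encode("utf-8")) for ch in chars}


# Sample characters: ASCII letter, accented Latin letter, currency sign, emoji
for ch, size in utf8_byte_lengths(["A", "é", "€", "😀"]).items():
    print(f"{ch!r} U+{ord(ch):04X}: {size} byte(s)")

# ASCII compatibility: pure-ASCII text encodes to exactly the same bytes either way
ascii_text = "Hello, world!"
assert ascii_text.encode("ascii") == ascii_text.encode("utf-8")
```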

Real Talk

UTF-8 is a variable-width character encoding that represents each Unicode code point using 1 to 4 bytes. It is the dominant encoding on the web and in most modern systems, thanks to its ASCII compatibility (every valid ASCII file is already valid UTF-8) and its space efficiency for Latin-script text.
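The 1-to-4-byte rule follows a fixed bit layout: the first byte's high bits announce how long the sequence is, and every continuation byte starts with `10`. The sketch below hand-rolls that layout for illustration only. The function names `encode_code_point` and `encode_utf8` are hypothetical, and in real code you would simply call `str.encode("utf-8")`. It cross-checks itself against Python's built-in codec.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 by hand (illustrative sketch)."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError(f"U+{cp:X} is outside the Unicode range")
    if 0xD800 <= cp <= 0xDFFF:
        # Surrogate code points are reserved for UTF-16 and are never encoded in UTF-8
        raise ValueError(f"U+{cp:04X} is a surrogate and cannot be encoded")
    if cp < 0x80:
        # 1 byte: 0xxxxxxx (identical to ASCII)
        return bytes([cp])
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


def encode_utf8(text: str) -> bytes:
    """Encode a whole string by concatenating each code point's bytes."""
    return b"".join(encode_code_point(ord(ch)) for ch in text)


# Cross-check against Python's built-in codec
assert encode_utf8("Hello 😀") == "Hello 😀".encode("utf-8")
print(encode_utf8("é").hex(" "))  # c3 a9
```

The branch boundaries (0x80, 0x800, 0x10000) are exactly the points where a code point stops fitting in the payload bits of the shorter form, which is why ASCII stays at one byte and emoji land in the four-byte form.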

Show Me The Code

# Python — encoding and decoding UTF-8
text = "Hello 😀"
bytes_data = text.encode("utf-8")
print(bytes_data)  # b'Hello \xf0\x9f\x98\x80'
print(bytes_data.decode("utf-8"))  # Hello 😀
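Because widths vary, character count and byte count drift apart, and slicing raw bytes can cut a character in half. A small follow-up sketch using the same `"Hello 😀"` string: the 8-byte slice is an arbitrary cut chosen to land inside the emoji's 4-byte sequence.

```python
text = "Hello 😀"
bytes_data = text.encode("utf-8")

# Variable width: 7 characters (code points) but 10 bytes,
# because the emoji alone takes 4 bytes
print(len(text), len(bytes_data))  # 7 10

# Slicing the bytes mid-character leaves an incomplete sequence (f0 9f)
truncated = bytes_data[:8]

# Strict decoding (the default) rejects the incomplete sequence...
try:
    truncated.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
print("strict decode failed:", strict_failed)  # strict decode failed: True

# ...while errors="replace" swaps the broken bytes for U+FFFD
replaced = truncated.decode("utf-8", errors="replace")
print(replaced)
```

The practical takeaway: count, slice, and truncate on decoded strings, not on raw UTF-8 bytes, unless you deliberately handle sequence boundaries.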

When You'll Hear This

"Always save files as UTF-8." / "The API response had garbled characters — it was encoded as Latin-1, not UTF-8."

Made with passive-aggressive love by manoga.digital. Powered by Claude.