LSM Tree
ELI5 — The Vibe Check
An LSM tree (Log-Structured Merge Tree) is a write-optimized data structure. It buffers writes in memory, then flushes them to disk in sorted chunks. Periodically, it merges these chunks together like organizing a messy pile of papers. It's why databases like Cassandra and RocksDB can handle millions of writes per second.
Real Talk
A Log-Structured Merge Tree (LSM Tree) is a data structure optimized for write-heavy workloads. Writes go to an in-memory buffer (memtable), which is periodically flushed to sorted on-disk files (SSTables). Background compaction merges SSTables to maintain read performance and reclaim space. LSM trees trade read amplification for superior write throughput compared to B-Trees.
When You'll Hear This
"LSM trees sacrifice some read speed for incredible write performance." / "RocksDB, Cassandra, and LevelDB all use LSM trees under the hood."
Related Terms
B-Tree Index
A B-Tree index is the default index type that most databases create when you say CREATE INDEX.
Bloom Filter
A Bloom filter is a tiny data structure that can tell you 'definitely NOT here' or 'maybe here.
Compaction
Compaction is the database's housekeeping process that merges and cleans up files on disk.
Write-Ahead Log
The Write-Ahead Log (WAL) is the database's diary. Before changing any actual data, the database first writes what it's about to do in this log.