Compaction
ELI5 — The Vibe Check
Compaction is the database's housekeeping process that merges and cleans up files on disk. In LSM trees, it combines multiple sorted files into bigger sorted files and throws away deleted data. Without compaction, reads get slower and disk usage balloons. It's like periodically organizing your desk instead of just adding more piles.
Real Talk
Compaction is the background process in LSM-tree databases that merges multiple sorted files (SSTables) to reduce read amplification, reclaim space from deleted or overwritten data, and keep query performance steady. Common strategies are size-tiered (merge similarly sized files), leveled (maintain size-bounded levels), and FIFO (drop the oldest files once a size or age limit is reached). Compaction itself consumes I/O and CPU and rewrites data that was already on disk, so it needs careful tuning.
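To make the merge step concrete, here is a minimal sketch in Python. It is not how any particular database implements compaction: the `compact` function, the `TOMBSTONE` marker, and the in-memory list-of-tuples "SSTables" are all hypothetical stand-ins chosen for illustration. It assumes each run is sorted by key with no duplicate keys inside a run, and that the merge covers every run that could still hold an older copy of a key (otherwise dropping a tombstone would let deleted data reappear).

```python
import heapq

# Hypothetical marker meaning "this key was deleted".
TOMBSTONE = object()


def compact(sstables):
    """Merge sorted runs into one sorted run.

    `sstables` is ordered oldest to newest. Each run is a list of
    (key, value) pairs sorted by key, with unique keys inside a run.
    """
    # Tag every entry with the negated run index so that, for equal keys,
    # the entry from the newest run sorts first.
    tagged = (
        [(key, -age, value) for key, value in run]
        for age, run in enumerate(sstables)
    )

    merged = []
    last_key = object()  # sentinel that equals no real key
    # heapq.merge streams the runs in (key, newest-first) order.
    for key, _, value in heapq.merge(*tagged, key=lambda e: (e[0], e[1])):
        if key == last_key:
            continue  # older, overwritten version of a key already handled
        last_key = key
        if value is not TOMBSTONE:
            merged.append((key, value))  # deleted keys are dropped entirely
    return merged


older = [("a", 1), ("b", 2), ("c", 3)]
newer = [("b", 20), ("c", TOMBSTONE), ("d", 4)]
print(compact([older, newer]))  # [('a', 1), ('b', 20), ('d', 4)]
```

The output file is smaller than the inputs combined: the overwritten value for `b` and the deleted key `c` are gone, which is where both the space savings and the read-amplification savings come from. A real engine would stream from disk and write a new file rather than build a list in memory.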
When You'll Hear This
"Compaction spikes caused latency jitter until we tuned the strategy." / "Leveled compaction gives better read performance but uses more I/O."
Related Terms
Bloom Filter
A Bloom filter is a tiny data structure that can tell you 'definitely NOT here' or 'maybe here.'
Cassandra
Cassandra is like a massive library system spread across every city in the world.
LSM Tree
An LSM tree (Log-Structured Merge Tree) is a write-optimized data structure. It buffers writes in memory, then flushes them to disk in sorted chunks.
Write-Ahead Log
The Write-Ahead Log (WAL) is the database's diary. Before changing any actual data, the database first writes what it's about to do in this log.