Bloom Filter
ELI5 — The Vibe Check
A Bloom filter is a tiny data structure that can tell you 'definitely NOT here' or 'maybe here.' It's like a bouncer with a guest list who occasionally lets in people not on the list but never turns away someone who is. Databases use it to avoid expensive disk reads for keys that don't exist.
Real Talk
A Bloom filter is a probabilistic data structure that tests set membership with no false negatives but possible false positives. It uses multiple hash functions mapping elements to a bit array. Databases use Bloom filters to skip unnecessary disk reads in LSM trees, reduce network calls in distributed systems, and optimize cache lookups. The false positive rate is tunable via filter size.
When You'll Hear This
"The Bloom filter saves us from reading disk for keys that don't exist." / "Cassandra uses Bloom filters to decide which SSTables might contain a key."
Related Terms
B-Tree Index
A B-Tree index is the default index type that most databases create when you say CREATE INDEX.
Compaction
Compaction is the database's housekeeping process that merges and cleans up files on disk.
Key-Value Store
A key-value store is the simplest database possible. You give it a name (key) and some data (value), and it remembers it.
LSM Tree
An LSM tree (Log-Structured Merge Tree) is a write-optimized data structure. It buffers writes in memory, then flushes them to disk in sorted chunks.