Columnar Storage
ELI5 — The Vibe Check
Columnar storage saves data column by column instead of row by row. All the ages together, all the names together, all the emails together. This is amazing for analytics because the database can skip entire columns the query doesn't need, and values of the same kind sitting next to each other compress really well. It's a big part of why an analytics query over 1 billion rows can finish in seconds instead of minutes.
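To make the picture concrete, here is a minimal sketch of the same tiny table held both ways. The table contents and the helper name to_columns are made up for illustration; real engines work on disk pages, not Python lists.

```python
# Hypothetical three-row table, used only for illustration.
rows = [
    {"name": "Ada", "age": 36, "email": "ada@example.com"},
    {"name": "Bo", "age": 41, "email": "bo@example.com"},
    {"name": "Cy", "age": 29, "email": "cy@example.com"},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every record is visited and the age is picked out of each one.
average_age_from_rows = sum(row["age"] for row in rows) / len(rows)

# Column layout: only the "age" list is touched; names and emails are never read.
average_age_from_columns = sum(columns["age"]) / len(columns["age"])
```

Both averages come out the same. The difference is how much data had to be touched to get there, which is what matters once the table no longer fits in memory.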
Real Talk
Columnar storage organizes data by column rather than by row on disk. Each column is stored contiguously, so values of the same type sit next to each other and compress efficiently, and a query that touches only a few columns reads only those columns from disk. It's the foundation of analytical file formats like Parquet and analytical databases like BigQuery, Redshift, and ClickHouse. Row-based storage remains the better fit for OLTP workloads, where each transaction typically reads or writes whole rows.
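The compression benefit is easiest to see with run-length encoding, one simple scheme that works well on a contiguous column with repeated values. This is a rough sketch under that assumption; the column contents and function names are hypothetical, and real formats combine several encodings rather than using this one alone.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def run_length_decode(runs):
    """Expand (value, count) pairs back into the original list."""
    return [value for value, count in runs for _ in range(count)]


# Hypothetical "country" column, stored contiguously and sorted.
country = ["DE", "DE", "DE", "FR", "FR", "US", "US", "US", "US"]

encoded = run_length_encode(country)  # 3 pairs instead of 9 values
decoded = run_length_decode(encoded)  # round-trips to the original column
```

In a row layout the country values would be interleaved with names, ages, and emails, so runs like these rarely appear and this kind of encoding has little to work with.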
When You'll Hear This
"Columnar storage makes our analytical queries 100x faster by only reading the columns we need." / "Parquet is the standard columnar file format in the data ecosystem."
Related Terms
Column Store
Instead of storing all of a row's data together (name, age, email), a column store keeps all the names together, all the ages together, and all the emails...
Data Lakehouse
A data lakehouse is what you get when a data lake and a data warehouse have a baby.
Data Warehouse
A data warehouse is where all your company's data goes to be analyzed.
OLAP
OLAP is all about analyzing huge amounts of data to answer business questions. 'What were total sales by region last quarter?' That's an OLAP query.