ETL
ELI5 — The Vibe Check
ETL stands for Extract, Transform, Load. You extract data from sources, transform it (clean, reshape, calculate), then load it into your warehouse. It's like collecting ingredients, prepping them, and putting them away in the fridge, neatly organized. The old-school way of getting data ready for analysis.
Real Talk
ETL (Extract, Transform, Load) is a data integration process in which data is extracted from source systems, transformed (cleaned, validated, denormalized, aggregated) in a staging area, and loaded into a target data warehouse. Because traditional ETL transforms data before loading, the target schema has to be designed up front. Dedicated ETL tools include Informatica and Talend. Apache Airflow is commonly used to schedule and orchestrate ETL jobs, and dbt covers the transform step, usually in ELT-style workflows.
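To make the three stages concrete, here is a minimal sketch using only Python's standard library. Everything in it is an assumption for illustration: the inline RAW_CSV data, the customer_totals table, and the function names are hypothetical, and an in-memory SQLite database stands in for a real warehouse. The point is the order of operations: rows are cleaned, validated, and aggregated before anything touches the target table.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system, inlined so the sketch is self-contained.
RAW_CSV = """order_id,customer,amount
1, alice ,19.99
2,BOB,5.00
3,alice,not_a_number
4,bob,10.01
"""


def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean customer names, validate amounts, aggregate per customer."""
    totals = {}
    for row in rows:
        customer = row["customer"].strip().lower()  # cleaning
        try:
            cents = round(float(row["amount"]) * 100)
        except ValueError:
            continue  # validation: skip rows whose amount cannot be parsed
        totals[customer] = totals.get(customer, 0) + cents  # aggregation
    return sorted(totals.items())


def load(conn, records):
    """Load: write the already-transformed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_totals "
        "(customer TEXT PRIMARY KEY, total_cents INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO customer_totals VALUES (?, ?)", records)
    conn.commit()


def run_pipeline(conn, text=RAW_CSV):
    """Run extract, then transform, then load, in that order."""
    load(conn, transform(extract(text)))


if __name__ == "__main__":
    # In-memory SQLite is an assumed stand-in for the warehouse.
    conn = sqlite3.connect(":memory:")
    run_pipeline(conn)
    print(conn.execute(
        "SELECT customer, total_cents FROM customer_totals ORDER BY customer"
    ).fetchall())
```

Note that the target table's columns are fixed before any data arrives, which is the "upfront schema design" trade-off in miniature. The bad row is dropped in the transform step, so it never reaches the target table.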
When You'll Hear This
"Our ETL pipeline runs nightly to refresh the warehouse." / "ETL transforms data before loading so the warehouse stays clean."
Related Terms
Data Lake
A data lake is a massive storage dump where you throw every piece of data in its raw format. CSV files, JSON, images, logs, whatever.
Data Warehouse
A data warehouse is where all your company's data goes to be analyzed.
ELT
ELT is ETL's modern cousin. Instead of transforming data before loading it, you dump the raw data into your warehouse first, then use the warehouse's beefy...
Star Schema
A star schema organizes your warehouse like a star: one central fact table surrounded by dimension tables.