ETL: The Central Abstraction
Behind every dashboard you've ever seen, every "daily KPI" email an executive receives, every training dataset that mysteriously appears in your team's S3 bucket — there is a pipeline. Almost always, that pipeline follows the same three-step pattern.
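ETL stands for Extract, Transform, Load: extract data from source systems (databases, APIs, files), transform it into the shape you need (cleaning, typing, joining, aggregating), and load the result into a destination such as a data warehouse.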
Every data pipeline — from a nightly dashboard refresh to 10 billion sensor events per day — is a version of ETL. The core operations stay the same; the tools change when the scale changes.
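To make the three steps concrete, here is a minimal Python sketch of a tiny ETL job. It assumes an in-memory CSV string as the source and an in-memory SQLite database as the destination; the `orders` table, its columns, and the function names are hypothetical, chosen only for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in a real pipeline this would come from an API, a file drop, or a database.
RAW_CSV = """order_id,amount,country
1,19.99,us
2,5.00,DE
3,,us
4,42.50, fr
"""

def extract(text):
    """Extract: read raw rows from the source (here, an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with a missing amount, cast types, normalize country codes."""
    clean = []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip incomplete records
        clean.append((int(row["order_id"]), float(row["amount"]), row["country"].strip().upper()))
    return clean

def load(rows, conn):
    """Load: write the cleaned rows into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER PRIMARY KEY, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT country, SUM(amount) FROM orders GROUP BY country ORDER BY country").fetchall())
```

Each step is its own function so it can be tested and swapped independently; at larger scale the tools behind each step change, but the shape of the job does not.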
ELT: The Warehouse-First Variant
As data warehouses became more powerful (and cheaper per query), a variation emerged: ELT — Extract, Load, then Transform. You dump raw data into the warehouse first and run transformations inside the warehouse using SQL. Snowflake and BigQuery both encourage this pattern. Tools like dbt (data build tool) have made it enormously popular.
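An ELT version of a small orders job might look like the sketch below. SQLite stands in for a warehouse such as BigQuery or Snowflake (an assumption made so the example runs anywhere), and the `raw_orders` and `orders_clean` table names are hypothetical. The raw data is loaded untouched, and all cleaning happens afterwards as SQL inside the database.

```python
import sqlite3

# sqlite3 stands in for a cloud warehouse; table names are hypothetical.
conn = sqlite3.connect(":memory:")

# Load: land the source data as-is (untyped, uncleaned) in a "raw" table.
conn.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [("1", "19.99", "us"), ("2", "5.00", "DE"), ("3", "", "us"), ("4", "42.50", " fr")],
)

# Transform: cleaning and typing happen afterwards, inside the database, in SQL.
conn.execute("""
    CREATE TABLE orders_clean AS
    SELECT CAST(order_id AS INTEGER) AS order_id,
           CAST(amount AS REAL)      AS amount,
           UPPER(TRIM(country))      AS country
    FROM raw_orders
    WHERE TRIM(amount) <> ''
""")

print(conn.execute(
    "SELECT country, SUM(amount) FROM orders_clean GROUP BY country ORDER BY country"
).fetchall())
```

Note that `raw_orders` keeps every original row, including the incomplete one; the cleaned table is derived from it and can be rebuilt at any time. In a dbt project, the `SELECT` that builds `orders_clean` would typically live in its own model file, with dbt handling table creation.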
The trade-off: ELT is simpler to operate (you're writing SQL that runs in the warehouse, not separate Python pipeline code) but harder to keep clean, because the "raw" layer can sprawl if nobody enforces standards. In practice, many teams use ELT for structured sources and reserve ETL for heavier transformations on unstructured data.
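One lightweight way to enforce standards on a raw layer is a contract check that runs on incoming records before any transformation depends on them. The sketch below is illustrative only: the `REQUIRED_FIELDS` contract, the field names, and `check_raw_records` are hypothetical, and the input is assumed to be one JSON object per line. Tools like dbt provide built-in tests (for example `not_null` and `unique`) for similar checks inside the warehouse.

```python
import json

# Hypothetical contract for one raw source: required fields and their accepted Python types.
REQUIRED_FIELDS = {"order_id": int, "amount": (int, float), "country": str}

def check_raw_records(lines):
    """Return (line_number, problem) pairs for raw JSON lines that break the contract."""
    problems = []
    for n, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append((n, "not valid JSON"))
            continue
        if not isinstance(record, dict):
            problems.append((n, "not a JSON object"))
            continue
        for field, expected in REQUIRED_FIELDS.items():
            if field not in record:
                problems.append((n, f"missing {field}"))
            elif not isinstance(record[field], expected):
                problems.append((n, f"{field} has type {type(record[field]).__name__}"))
    return problems

raw = [
    '{"order_id": 1, "amount": 19.99, "country": "US"}',
    '{"order_id": "2", "amount": 5.0}',
    "not json",
]
print(check_raw_records(raw))
```

The check only reports problems rather than dropping rows, so the raw layer stays a faithful copy of the source while the team decides how to handle violations.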
A team decides to use ELT instead of ETL for a new analytics pipeline. They load raw JSON from their API into BigQuery, then run SQL transformations to clean and aggregate it. What is the most significant trade-off they've made?