
Master Dimensional Modeling Lesson 03 - Understand the ETL Pipeline
In this tutorial Brian Kafki steps back from dimensional modeling to walk through the full data‑warehouse ETL pipeline: from source systems through raw ingestion, pre‑staging, staging, an operational data store (ODS) snapshot, and finally the data mart that powers BI tools. He maps each traditional stage onto the popular Medallion design pattern—bronze for raw landing, silver for curated snapshots, and gold for the final star‑schema model—while noting that Unity Catalog's three‑level hierarchy forces the bronze layer to do double duty.

The video also stresses incremental loading techniques, using timestamp filters or change‑data‑capture, to avoid full reloads and shorten load windows. Kafki illustrates the flow with concrete examples: finance tables from Oracle, sales data from Salesforce, HR records, and web‑derived engagement metrics. He points out that bronze data is never queried directly, silver may serve power users, and gold is the business‑ready layer. He also references the evolution of Delta Lake, which only became mainstream after its open‑source release around 2020.

Understanding these layers helps architects design resilient pipelines, preserve source‑system performance, and deliver clean, query‑ready data to analysts. The framework also clarifies governance boundaries, making it easier to scale ETL processes as data volumes and source diversity grow.
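The timestamp‑filter approach to incremental loading mentioned above can be sketched in plain Python. This is a minimal illustration, not the lesson's actual code: the `updated_at` column name and the in‑memory row list are assumptions standing in for a real source table, and the watermark would normally be persisted between pipeline runs.

```python
from datetime import datetime

# Hypothetical source rows; the assumption is that the source table
# exposes a reliable last-modified timestamp column (here "updated_at").
source_rows = [
    {"id": 1, "amount": 100, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "amount": 250, "updated_at": datetime(2024, 1, 5)},
    {"id": 3, "amount": 75,  "updated_at": datetime(2024, 1, 9)},
]

def incremental_extract(rows, last_watermark):
    """Return only rows modified since the previous load's watermark,
    plus the new watermark to persist for the next run."""
    changed = [r for r in rows if r["updated_at"] > last_watermark]
    new_watermark = max(
        (r["updated_at"] for r in changed), default=last_watermark
    )
    return changed, new_watermark

# A run whose previous load covered everything up to Jan 3
# picks up only the two rows changed after that point.
changed, watermark = incremental_extract(source_rows, datetime(2024, 1, 3))
```

In a real pipeline the same filter would be pushed down to the source (for example as a `WHERE updated_at > :watermark` predicate) so the warehouse never rereads unchanged rows, which is exactly the load‑time saving the lesson highlights.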

Master Databricks 2nd Ed: Lesson 4 - Use Databricks for Free!
Databricks has launched a free, no‑credit‑card edition aimed at students and professionals seeking hands‑on experience with its cloud‑based data platform. The environment runs on AWS, mirrors the standard UI, and bundles introductory videos, notebooks, and a Unity Catalog, allowing users...