In a recent Data Engineering Central podcast, Bart Konieczny discussed the evolving synergy between Apache Spark, lakehouse architectures, and artificial intelligence. He highlighted Spark's latest performance enhancements, including Catalyst optimizer refinements and native GPU acceleration. Konieczny explained how lakehouses bridge traditional data warehousing with machine‑learning pipelines, simplifying data governance and real‑time analytics. The conversation also covered the growing open‑source ecosystem that accelerates enterprise adoption of AI‑driven data platforms.
Databricks has introduced session‑scoped temporary tables for Databricks SQL, implemented as physical Delta tables stored in Unity Catalog. The tables persist only for the duration of a Spark SQL session and are automatically reclaimed, supporting full CRUD operations. This addition...
The article reviews temporary tables in Databricks SQL, explaining how they store intermediate results for the duration of a session and can be referenced across multiple statements. It compares them to Common Table Expressions, highlighting performance gains when avoiding repeated...
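The session-scoping and reuse behavior described above can be sketched with an analogy. This is not Databricks SQL syntax; it uses SQLite's TEMP tables, which are likewise visible only to the current session and reclaimed when it ends, to show why materializing an intermediate result once beats re-evaluating a CTE in every query:

```python
import sqlite3

# Illustrative analogy using SQLite TEMP tables (not Databricks SQL syntax):
# like a session-scoped temporary table, a TEMP table is visible only to the
# current connection/session and is dropped automatically when it closes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 10.0), (2, 25.5), (3, 7.5);

    -- Materialize the intermediate result once...
    CREATE TEMP TABLE big_orders AS
        SELECT * FROM orders WHERE amount > 9;
""")

# ...then reference it from multiple statements without recomputation.
# A CTE, by contrast, is scoped to a single query and re-evaluated there.
count = conn.execute("SELECT COUNT(*) FROM big_orders").fetchone()[0]
total = conn.execute("SELECT SUM(amount) FROM big_orders").fetchone()[0]
print(count, total)  # → 2 35.5
```

The table and column names here are invented for illustration; the point is the lifetime difference between a session-scoped table and a statement-scoped CTE.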

The guide cautions that moving to Databricks won’t fix weak data fundamentals; organizations must first establish clear dev‑prod separation, version‑controlled code, and cost accountability. It urges teams to define real needs, avoid over‑architecting, and split infrastructure choices from data‑architecture decisions....

Spark is evolving from low‑level RDD and notebook‑driven workflows to declarative pipelines, branded as Lakeflow on Databricks. The new framework lets engineers define flows, datasets, and pipelines in a configuration‑first manner, while Spark handles execution for both batch and streaming....
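The configuration-first idea can be sketched in miniature. The snippet below is a hypothetical toy, not the actual Lakeflow API: datasets are declared with a decorator along with their upstream dependencies, and a small runner decides execution order, rather than the engineer scripting each step imperatively:

```python
# Hypothetical sketch of declarative pipelines (NOT the real Lakeflow API):
# datasets are *declared*, and a runner resolves dependencies and executes.
_registry = {}

def dataset(*, depends_on=()):
    """Register a dataset definition together with its upstream dependencies."""
    def wrap(fn):
        _registry[fn.__name__] = (fn, tuple(depends_on))
        return fn
    return wrap

@dataset()
def raw_events():
    return [{"user": "a", "clicks": 3}, {"user": "b", "clicks": 5}]

@dataset(depends_on=("raw_events",))
def clicks_per_user():
    return {r["user"]: r["clicks"] for r in run("raw_events")}

def run(name, _cache={}):
    """Materialize a dataset, computing and caching its upstreams first."""
    if name not in _cache:
        fn, deps = _registry[name]
        for d in deps:
            run(d, _cache)
        _cache[name] = fn()
    return _cache[name]

print(run("clicks_per_user"))  # → {'a': 3, 'b': 5}
```

All names here (`dataset`, `run`, the example tables) are invented; the real framework adds batch/streaming execution, incremental refresh, and Spark as the engine underneath.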

In a recent podcast interview, Robin Moffatt discusses how data engineering has shifted from traditional batch processing to real‑time streaming. He outlines the technical drivers (cloud scalability, event‑driven architectures, and low‑latency analytics) that enable continuous data pipelines. Moffatt also highlights emerging tools...

The article introduces the lakehouse architecture as a unified platform that combines the scalability of data lakes with the performance of data warehouses. It highlights how Delta Lake brings ACID transaction support and schema enforcement to open‑source storage, enabling reliable...

Hoyt Emerson discusses how organizations can construct credible data systems that deliver trustworthy insights. He emphasizes the need for rigorous data governance, automated testing, and clear ownership across the data lifecycle. The conversation highlights real‑world examples where poor data quality...

The article maps a data‑engineering career trajectory that begins with hardware‑oriented roles and ends with building scalable data pipelines. It highlights how circuit‑design thinking translates into logical data modeling, while emphasizing the need to acquire SQL, Python, and cloud‑native tools....

The article pits Apache Airflow, the open‑source workflow orchestrator, against Databricks Lakeflow, a newer Lakehouse‑native pipeline engine. It outlines core differences in architecture, integration depth with cloud data platforms, and pricing models. Airflow remains favored for heterogeneous environments, while Lakeflow...

The article highlights a single Polars pattern—using the pipe operator—to streamline data‑frame code, cutting boilerplate and boosting readability up to tenfold. By chaining transformations in a lazy execution graph, developers avoid intermediate variables and gain clearer, more maintainable pipelines. The...
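The pipe pattern reads well even in miniature. Since Polars itself may not be available in every environment, the sketch below uses plain Python lists in place of frames; Polars' own `pipe` method on DataFrames and LazyFrames threads data through named functions in the same way, so no intermediate variables accumulate:

```python
# The pipe pattern in miniature, with plain Python data standing in for
# Polars frames. Each step is a small named function; pipe() threads the
# data through them instead of piling up df1, df2, df3 variables.
def pipe(data, *steps):
    for step in steps:
        data = step(data)
    return data

def drop_nulls(rows):
    return [r for r in rows if r["amount"] is not None]

def add_tax(rows, rate=0.2):
    return [{**r, "total": r["amount"] * (1 + rate)} for r in rows]

def grand_total(rows):
    return round(sum(r["total"] for r in rows), 2)

rows = [{"amount": 10.0}, {"amount": None}, {"amount": 5.0}]
result = pipe(rows, drop_nulls, add_tax, grand_total)
print(result)  # → 18.0
```

In Polars the equivalent chain would be `df.pipe(drop_nulls).pipe(add_tax).pipe(grand_total)`; on a LazyFrame the steps also fold into one lazy execution graph, as the article describes.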

Apache Arrow’s ADBC (Arrow Database Connectivity) introduces a modern, columnar‑native driver that can replace or complement traditional ODBC/JDBC stacks. By moving Arrow RecordBatches end‑to‑end, it eliminates row‑by‑row marshaling and dramatically reduces serialization overhead. Benchmarks show Python ADBC achieving roughly 275 k...
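The cost of row‑by‑row marshaling can be made concrete with a small analogy. The snippet below is not ADBC itself (real ADBC drivers return whole Arrow RecordBatches in columnar form); it uses SQLite only to contrast crossing the driver boundary once per row with crossing it once per chunk:

```python
import sqlite3

# Analogy for the serialization argument (not the ADBC API): the row-wise
# loop crosses the driver boundary once per row, the batched loop once per
# chunk. ADBC goes further, returning columnar Arrow RecordBatches directly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10_000)])

# Row-by-row: one round trip per row.
cur = conn.execute("SELECT x FROM t")
total_rowwise = 0
while (row := cur.fetchone()) is not None:
    total_rowwise += row[0]

# Batched: fewer, larger transfers (the direction ADBC takes to its limit).
cur = conn.execute("SELECT x FROM t")
total_batched = 0
while batch := cur.fetchmany(1024):
    total_batched += sum(x for (x,) in batch)

assert total_rowwise == total_batched == sum(range(10_000))
```

The throughput numbers cited in the article come from the real ADBC drivers, not from this sketch; the sketch only shows why per‑row marshaling is the bottleneck being removed.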