Confessions of a Data Guy

Publication

Practitioner’s takes on data engineering practice and careers

5 Steps to Become an AI Engineer (Without the Hype)
News · Mar 27, 2026

The article outlines a pragmatic five‑step roadmap for professionals aiming to become AI engineers, deliberately stripping away industry hype. It emphasizes learning foundational mathematics, solidifying Python programming skills, building real‑world machine‑learning projects, mastering model deployment and MLOps, and committing to...

By Confessions of a Data Guy
Databricks Metric Views and the Reality of the Semantic Layer
News · Mar 24, 2026

Databricks introduced Metric Views, a Unity Catalog‑based feature that centralizes metric definitions and dimensions. By storing business logic as reusable objects, teams can apply consistent calculations across SQL queries, dashboards, and AI‑driven tools. The YAML‑like syntax makes metrics human‑readable while...

By Confessions of a Data Guy
Agent Bricks and the Commoditization of AI Systems
News · Mar 24, 2026

Databricks launched Agent Bricks, a UI‑driven suite that lets users assemble pre‑built AI agents for tasks such as document parsing, knowledge assistance, and AI‑powered BI. The platform abstracts the complex stack behind retrieval‑augmented generation, turning what once required extensive engineering...

By Confessions of a Data Guy
Polars’ Streaming Engine Is a Bigger Deal Than People Realize
News · Mar 24, 2026

Polars' new streaming engine halves runtimes on moderate datasets and delivers up to a fourfold speedup on a 12 GB workload compared with eager execution. The library supports eager, lazy, and streaming modes, with lazy enabling predicate pushdown and...

By Confessions of a Data Guy
DuckDB, AI, and the Future of Data Engineering | with Staff Engineer, Matt Martin
News · Mar 21, 2026

DuckDB is emerging as a mainstream in‑process analytical engine, allowing SQL queries to run directly inside Python, R, or Julia without a separate server. Staff Engineer Matt Martin highlighted how its columnar storage and vectorized execution deliver warehouse‑level performance on...

By Confessions of a Data Guy
Data Engineering, AI, and Career Growth – Podcast Deep Dive with Yuki Kakegawa
News · Mar 11, 2026

In a recent episode of Data Engineering Central, host Daniel interviews AI specialist Yuki Kakegawa to explore how data engineering intersects with artificial intelligence and what professionals need to thrive. Kakegawa highlights the surge in real‑time data pipelines, the rise...

By Confessions of a Data Guy
Spark, Lakehouse & AI: A Deep Conversation with Bart Konieczny
News · Feb 25, 2026

In a recent Data Engineering Central podcast, Bart Konieczny discussed the evolving synergy between Apache Spark, lakehouse architectures, and artificial intelligence. He highlighted Spark's latest performance enhancements, including Catalyst optimizer refinements and native GPU acceleration. Konieczny explained how lakehouses bridge...

By Confessions of a Data Guy
Temporary Tables in Databricks SQL | Do You Actually Need Them?
News · Feb 17, 2026

The article reviews temporary tables in Databricks SQL, explaining how they store intermediate results for the duration of a session and can be referenced across multiple statements. It compares them to Common Table Expressions, highlighting performance gains when avoiding repeated...

By Confessions of a Data Guy
Migrating to Databricks – A Guide
News · Feb 13, 2026

The guide cautions that moving to Databricks won’t fix weak data fundamentals; organizations must first establish clear dev‑prod separation, version‑controlled code, and cost accountability. It urges teams to define real needs, avoid over‑architecting, and split infrastructure choices from data‑architecture decisions....

By Confessions of a Data Guy
Why Declarative (Lakeflow) Pipelines Are the Future of Spark
News · Feb 11, 2026

Spark is evolving from low‑level RDD and notebook‑driven workflows to declarative pipelines, branded as Lakeflow on Databricks. The new framework lets engineers define flows, datasets, and pipelines in a configuration‑first manner, while Spark handles execution for both batch and streaming....

By Confessions of a Data Guy
Robin Moffatt on the Evolution of Data Engineering: From Batch Jobs to Real-Time | Podcast Interview
News · Feb 11, 2026

Robin Moffatt discusses how data engineering has shifted from traditional batch processing to real‑time streaming in a recent podcast interview. He outlines the technical drivers—cloud scalability, event‑driven architectures, and low‑latency analytics—that enable continuous data pipelines. Moffatt also highlights emerging tools...

By Confessions of a Data Guy
The Lakehouse Architecture | Multimodal Data, Delta Lake, and Data Engineering with R. Tyler Croy
News · Feb 3, 2026

The article introduces the lakehouse architecture as a unified platform that combines the scalability of data lakes with the performance of data warehouses. It highlights how Delta Lake brings ACID transaction support and schema enforcement to open‑source storage, enabling reliable...

By Confessions of a Data Guy
Building Credible Data Systems | Hoyt Emerson on The Full Data Stack
News · Jan 30, 2026

Hoyt Emerson discusses how organizations can construct credible data systems that deliver trustworthy insights. He emphasizes the need for rigorous data governance, automated testing, and clear ownership across the data lifecycle. The conversation highlights real‑world examples where poor data quality...

By Confessions of a Data Guy
Data Engineering Career Path: From Circuits to Pipelines
News · Jan 30, 2026

The article maps a data‑engineering career trajectory that begins with hardware‑oriented roles and ends in building scalable data pipelines. It highlights how circuit‑design thinking translates into logical data modeling, while emphasizing the need to acquire SQL, Python, and cloud‑native tools....

By Confessions of a Data Guy