
The Data Engineering Revolution | Spark, AI, and What’s Coming Next
The article outlines how Apache Spark has become the backbone of modern data engineering, driving real‑time analytics and large‑scale ETL workloads. It highlights the infusion of generative AI models into pipeline orchestration, enabling automated schema evolution and anomaly detection. Recent surveys show Spark’s market share climbing to over 70% among Fortune 500 firms, while AI‑augmented tools cut development time by roughly 30%. Finally, the piece forecasts a shift toward unified lakehouse architectures that blend streaming, batch, and AI workloads under a single governance layer.

5 Steps to Become an AI Engineer (Without the Hype)
The article outlines a pragmatic five‑step roadmap for professionals aiming to become AI engineers, deliberately stripping away industry hype. It emphasizes mastering foundational mathematics, solidifying Python programming skills, building real‑world machine‑learning projects, learning model deployment and MLOps, and committing to...

Databricks Metric Views and the Reality of the Semantic Layer
Databricks introduced Metric Views, a Unity Catalog‑based feature that centralizes metric definitions and dimensions. By storing business logic as reusable objects, teams can apply consistent calculations across SQL queries, dashboards, and AI‑driven tools. The YAML‑like syntax makes metrics human‑readable while...
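To make the "YAML‑like syntax" mentioned above concrete, here is a rough sketch of what a metric view definition can look like. This is a hedged illustration, not an authoritative schema: the table name is invented, and the field layout (source, dimensions, measures with name/expr pairs) is an assumption based on the general shape of the feature.

```yaml
# Hypothetical metric view sketch; table and field names are illustrative
version: 0.1
source: main.sales.orders            # Unity Catalog table the metrics build on
dimensions:
  - name: order_month
    expr: DATE_TRUNC('MONTH', order_date)
measures:
  - name: total_revenue
    expr: SUM(order_amount)
  - name: order_count
    expr: COUNT(1)
```

Because the definition lives in Unity Catalog rather than in each dashboard, every SQL query, BI tool, and AI agent that selects `total_revenue` gets the same calculation.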

Agent Bricks and the Commoditization of AI Systems
Databricks launched Agent Bricks, a UI‑driven suite that lets users assemble pre‑built AI agents for tasks such as document parsing, knowledge assistance, and AI‑powered BI. The platform abstracts the complex stack behind retrieval‑augmented generation, turning what once required extensive engineering...

Polars’ Streaming Engine Is a Bigger Deal Than People Realize
Polars’ new streaming engine dramatically improves performance, halving runtimes on moderate datasets and delivering up to a four‑fold speedup on a 12 GB workload compared with eager execution. The library supports eager, lazy, and streaming modes, with lazy enabling predicate pushdown and...

DuckDB, AI, and the Future of Data Engineering | with Staff Engineer, Matt Martin
DuckDB is emerging as a mainstream in‑process analytical engine, allowing SQL queries to run directly inside Python, R, or Julia without a separate server. Staff Engineer Matt Martin highlighted how its columnar storage and vectorized execution deliver warehouse‑level performance on...

Data Engineering, AI, and Career Growth – Podcast Deep Dive with Yuki Kakegawa
In a recent episode of Data Engineering Central, host Daniel interviews AI specialist Yuki Kakegawa to explore how data engineering intersects with artificial intelligence and what professionals need to thrive. Kakegawa highlights the surge in real‑time data pipelines, the rise...

Spark, Lakehouse & AI: A Deep Conversation with Bart Konieczny
In a recent Data Engineering Central podcast, Bart Konieczny discussed the evolving synergy between Apache Spark, lakehouse architectures, and artificial intelligence. He highlighted Spark's latest performance enhancements, including Catalyst optimizer refinements and native GPU acceleration. Konieczny explained how lakehouses bridge...

Temporary Tables in Databricks SQL | Do You Actually Need Them?
The article reviews temporary tables in Databricks SQL, explaining how they store intermediate results for the duration of a session and can be referenced across multiple statements. It compares them to Common Table Expressions, highlighting performance gains when avoiding repeated...

Migrating to Databricks – A Guide
The guide cautions that moving to Databricks won’t fix weak data fundamentals; organizations must first establish clear dev‑prod separation, version‑controlled code, and cost accountability. It urges teams to define real needs, avoid over‑architecting, and split infrastructure choices from data‑architecture decisions....

Why Declarative (Lakeflow) Pipelines Are the Future of Spark
Spark is evolving from low‑level RDD and notebook‑driven workflows to declarative pipelines, branded as Lakeflow on Databricks. The new framework lets engineers define flows, datasets, and pipelines in a configuration‑first manner, while Spark handles execution for both batch and streaming....

Robin Moffatt on the Evolution of Data Engineering: From Batch Jobs to Real-Time | Podcast Interview
Robin Moffatt discusses how data engineering has shifted from traditional batch processing to real‑time streaming in a recent podcast interview. He outlines the technical drivers—cloud scalability, event‑driven architectures, and low‑latency analytics—that enable continuous data pipelines. Moffatt also highlights emerging tools...

The Lakehouse Architecture | Multimodal Data, Delta Lake, and Data Engineering with R. Tyler Croy
The article introduces the lakehouse architecture as a unified platform that combines the scalability of data lakes with the performance of data warehouses. It highlights how Delta Lake brings ACID transaction support and schema enforcement to open‑source storage, enabling reliable...

Building Credible Data Systems | Hoyt Emerson on The Full Data Stack
Hoyt Emerson discusses how organizations can construct credible data systems that deliver trustworthy insights. He emphasizes the need for rigorous data governance, automated testing, and clear ownership across the data lifecycle. The conversation highlights real‑world examples where poor data quality...

Data Engineering Career Path: From Circuits to Pipelines
The article maps a data‑engineering career trajectory that begins with hardware‑oriented roles and ends in building scalable data pipelines. It highlights how circuit‑design thinking translates into logical data modeling, while emphasizing the need to acquire SQL, Python, and cloud‑native tools....