
The episode explains how Apache Parquet’s hybrid columnar‑row format optimizes storage and query performance for large datasets. It contrasts row‑wise and pure columnar layouts, highlighting the inefficiencies of each, and then describes Parquet’s structure of row groups, column chunks, and pages, along with its self‑describing metadata that enables column pruning and efficient reads. The host also notes Parquet’s origins at Twitter and Cloudera and points listeners to a newsletter for deeper data‑engineering content.

The episode traces the evolution from Google’s MapReduce model to Apache Spark, explaining how Spark’s in‑memory processing and the Resilient Distributed Dataset (RDD) abstraction overcome MapReduce’s limitations for iterative and interactive workloads. It breaks down Spark’s core concepts—transformations vs. actions,...
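The transformation-vs-action distinction comes down to lazy evaluation: transformations only record a lineage of computations over an RDD, and nothing executes until an action demands a result. A minimal plain-Python sketch of that idea (not actual Spark; generators stand in for lazy transformations, and `list()` plays the role of an action like `collect()`):

```python
data = range(1, 6)  # stand-in for an RDD of [1, 2, 3, 4, 5]

# "Transformations": lazy, nothing is computed yet — each line just
# composes a new deferred computation, like rdd.map() / rdd.filter().
doubled = (x * 2 for x in data)
large = (x for x in doubled if x > 4)

# "Action": forces the whole pipeline to run, like rdd.collect().
result = list(large)
print(result)  # → [6, 8, 10]
```

In real Spark the same shape appears as `rdd.map(lambda x: x * 2).filter(lambda x: x > 4).collect()`, with the added benefit that the recorded lineage lets Spark recompute lost partitions and keep intermediate data in memory across iterative steps.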