
Christophe Pettus: pg_lake vs Lakebase: Two Very Different Things Called “Postgres + Lakehouse”
Why It Matters
Enterprises need to align their data‑platform strategy with the underlying storage semantics: picking the wrong model can weaken ACID guarantees for mixed workloads or break existing operational tooling. Understanding the architectural trade‑offs helps teams scale cost‑effectively and keep analytics reliable.
Key Takeaways
- Snowflake’s pg_lake keeps the native PostgreSQL binary and adds Iceberg tables via extensions.
- Databricks’ Lakebase replaces PostgreSQL’s storage layer with Neon’s pageserver and safekeepers.
- pg_lake offers full OLTP on heap tables, but transactions that span heap and Iceberg tables lack full ACID guarantees.
- Lakebase enables serverless scaling and instant branching, but traditional PostgreSQL backup workflows don’t apply.
Pulse Analysis
The data‑platform market is increasingly blurring the line between transactional databases and analytical lakehouses. Snowflake and Databricks have each introduced a "PostgreSQL + lakehouse" product, leveraging the familiarity of PostgreSQL while promising seamless lake access. This convergence reflects a broader industry push toward unified workloads, where developers can run OLTP and OLAP queries without moving data between disparate systems. However, the marketing gloss masks fundamentally different engineering choices that affect performance, cost, and operational complexity.
Snowflake’s pg_lake takes a conservative approach: it runs the exact PostgreSQL binary you already know, then layers on a set of open‑source extensions that expose Iceberg tables to PostgreSQL through foreign data wrappers. The core engine still writes WAL locally and manages heap tables with classic MVCC, while analytical scans are off‑loaded to a DuckDB sidecar that reads Parquet files from object storage. This design preserves existing tools, extensions, and backup routines for OLTP workloads, but transactions that span heap and Iceberg tables lose full ACID guarantees, so mixed workloads require careful testing.
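To make the split concrete, here is a minimal Python sketch, assuming a simplified catalog in which every table is tagged as either heap or Iceberg storage. The names `Table`, `Catalog`, `route_scan`, and `touches_both` are hypothetical and do not come from pg_lake; the sketch only models which engine would handle a scan and flags transactions that mix both storage kinds, the case the article says needs careful testing.

```python
from dataclasses import dataclass, field

# Hypothetical toy model of the pg_lake split: heap tables are served by the
# local PostgreSQL executor, Iceberg tables by a separate analytical engine
# (standing in for the DuckDB sidecar). No names here come from pg_lake.

HEAP = "heap"
ICEBERG = "iceberg"


@dataclass
class Table:
    name: str
    storage: str  # HEAP or ICEBERG


@dataclass
class Catalog:
    tables: dict = field(default_factory=dict)

    def add(self, table: Table) -> None:
        self.tables[table.name] = table


def route_scan(catalog: Catalog, table_name: str) -> str:
    """Return which engine would scan table_name in this toy model."""
    storage = catalog.tables[table_name].storage
    if storage == HEAP:
        return "postgres-executor"   # local heap pages, local WAL, MVCC
    if storage == ICEBERG:
        return "analytical-sidecar"  # Parquet files in object storage
    raise ValueError(f"unknown storage kind: {storage}")


def touches_both(catalog: Catalog, table_names: list) -> bool:
    """True if a transaction touches both heap and Iceberg tables.

    Per the article, such mixed transactions do not get full ACID
    guarantees, so they are the ones to test carefully.
    """
    kinds = {catalog.tables[name].storage for name in table_names}
    return HEAP in kinds and ICEBERG in kinds


if __name__ == "__main__":
    catalog = Catalog()
    catalog.add(Table("orders", HEAP))
    catalog.add(Table("events", ICEBERG))
    print(route_scan(catalog, "orders"))                # postgres-executor
    print(route_scan(catalog, "events"))                # analytical-sidecar
    print(touches_both(catalog, ["orders", "events"]))  # True
```

A single-storage transaction (only `orders`, say) stays entirely on the familiar PostgreSQL path, which is why the article frames pg_lake as preserving existing OLTP practice.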
Databricks’ Lakebase, by contrast, re‑architects the storage stack using Neon’s pageserver and safekeepers. The compute node remains a PostgreSQL binary, but all data lives in object storage, accessed on‑demand via a networked pageserver. This enables serverless scaling, instant branch creation, and pay‑as‑you‑go storage pricing, but it also discards traditional PostgreSQL mechanisms like local pg_wal directories and standard base‑backup workflows. Extensions that depend on low‑level storage behavior may need adjustments, and DBA tooling must adapt to Neon‑specific replication semantics. Choosing between pg_lake and Lakebase therefore depends on whether your priority is preserving existing PostgreSQL operational practices or embracing a fully managed, elastic compute model.
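The "instant branching" idea can be illustrated with a toy copy‑on‑write page store. The sketch below is a hypothetical model, not Neon’s pageserver design or API: it assumes pages are versioned by LSN and that a branch is nothing more than a pointer to its parent timeline plus the LSN at which it was cut, so creating one copies no data. `Timeline`, `write`, `read`, and `branch` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical toy model of LSN-versioned page storage with instant branching.
# It illustrates the idea only; it is NOT Neon's pageserver design or API.


@dataclass
class Timeline:
    name: str
    parent: Optional["Timeline"] = None
    branch_lsn: int = 0  # parent LSN at which this branch was created
    # page_id -> list of (lsn, contents), kept in increasing LSN order
    versions: dict = field(default_factory=dict)

    def write(self, page_id: int, lsn: int, contents: str) -> None:
        history = self.versions.setdefault(page_id, [])
        if history and lsn <= history[-1][0]:
            raise ValueError("LSNs must increase for each page")
        history.append((lsn, contents))

    def read(self, page_id: int, lsn: int) -> Optional[str]:
        """Return the newest version of page_id at or before lsn."""
        for version_lsn, contents in reversed(self.versions.get(page_id, [])):
            if version_lsn <= lsn:
                return contents
        if self.parent is not None:
            # Fall back to the parent, but never look past the branch point.
            return self.parent.read(page_id, min(lsn, self.branch_lsn))
        return None

    def branch(self, name: str, at_lsn: int) -> "Timeline":
        """Create a child timeline without copying any pages."""
        return Timeline(name=name, parent=self, branch_lsn=at_lsn)


if __name__ == "__main__":
    main = Timeline("main")
    main.write(1, lsn=10, contents="v1")
    main.write(1, lsn=20, contents="v2")
    dev = main.branch("dev", at_lsn=15)
    print(dev.read(1, lsn=100))   # v1 (main as of LSN 15)
    dev.write(1, lsn=30, contents="dev-v1")
    print(dev.read(1, lsn=100))   # dev-v1
    print(main.read(1, lsn=100))  # v2
```

Reads on the branch fall back to the parent only up to the branch LSN, so later writes on `main` stay invisible to `dev`, and writes on `dev` never touch `main`. The same property hints at why classic tooling does not map over directly: there is no standalone data directory or local `pg_wal` to copy, only shared, versioned storage.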