Data writer publishing deep dives and curating discussions around leading data/Big Data thinkers and resources.
Will Rust kill Python in data engineering? No. But it has already consumed much of the JavaScript tooling ecosystem. And it's quietly doing the same in data. The pattern: Python remains the interface, Rust becomes the engine. Polars, DataFusion, delta-rs - all Rust under the hood, all Python on top. (DuckDB rides the same native-engine wave, though its core is C++.) You don't need to learn Rust. But you should know what's happening beneath your Python code. https://ssp.sh/blog/rust-for-data-engineering/
I quickly recorded how easy and convenient it is to browse S3 files locally with a single command, blazingly fast. Even previews work, thanks to the DuckDB integration. https://youtu.be/cimUvBd_9Ns
Not all retries are created equal.

Immediate retry: usually fails again
Exponential backoff: gives systems time to recover
Exponential backoff with jitter: prevents thundering herd

Most orchestrators have this built in. But you need to understand what's happening or you'll wonder why your retries...
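A minimal sketch of the third strategy in plain Python (function name and defaults are my own, not from any orchestrator): each failed attempt doubles the backoff window, and the actual sleep is a random point inside that window so a fleet of clients doesn't retry in lockstep.

```python
import random
import time

def retry_with_backoff(fn, max_attempts=5, base_delay=1.0, cap=30.0):
    """Call fn, retrying with exponential backoff plus full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the real error
            # Exponential backoff: base, 2*base, 4*base, ... capped at `cap`.
            window = min(cap, base_delay * 2 ** attempt)
            # Full jitter: sleep a random fraction of the window so
            # concurrent clients spread out (no thundering herd).
            time.sleep(random.uniform(0, window))
```

Orchestrators that offer "retry with backoff" are doing some variant of this loop for you.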
The semantic layer is like a restaurant menu: you know what you're ordering, but not how it's made. This analogy comes from Maxime Beauchemin, and I think it's perfect. Users shouldn't need to understand your star schema to calculate revenue. They should...
Hot take: Pivot tables are the REPL for business data. Just like programmers use REPLs to quickly test code, business users use pivot tables to quickly test hypotheses about their data. Drag a field. See a result. Adjust. Repeat. This feedback loop is...
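The drag-a-field loop looks like this in code form; a small sketch with pandas and made-up sales numbers, where each `pivot_table` call is one "hypothesis":

```python
import pandas as pd

# Toy sales data (numbers invented for illustration).
sales = pd.DataFrame({
    "region":  ["EU", "EU", "US", "US"],
    "product": ["A", "B", "A", "B"],
    "revenue": [100, 200, 150, 250],
})

# One call per hypothesis: pick fields, see the result, adjust, repeat.
pivot = pd.pivot_table(sales, values="revenue",
                       index="region", columns="product", aggfunc="sum")
print(pivot)
```

Swapping `index`, `columns`, or `aggfunc` and re-running is the programmatic equivalent of dragging fields around in a spreadsheet.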
I see data contracts and data products as overlapping but different:

Data contracts: what the data is and how we enforce it
Data products: why we need this data

In practice, I'd argue for asset-based data quality assertions. Every time a...
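A minimal sketch of what asset-based assertions can look like in plain Python; the check names and the `orders` rows are hypothetical, the point is that each check runs against the materialized asset itself rather than living in a separate contract document:

```python
# Each assertion runs directly against the asset's rows after it is built.

def assert_no_nulls(rows, column):
    missing = [r for r in rows if r.get(column) is None]
    assert not missing, f"{len(missing)} rows missing '{column}'"

def assert_unique(rows, column):
    values = [r[column] for r in rows]
    assert len(values) == len(set(values)), f"duplicates in '{column}'"

# Hypothetical asset.
orders = [
    {"order_id": 1, "amount": 9.99},
    {"order_id": 2, "amount": 24.50},
]

assert_no_nulls(orders, "amount")
assert_unique(orders, "order_id")
```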
From Zach Wilson, three signs your pipeline isn't idempotent:

1. It uses INSERT INTO instead of INSERT OVERWRITE or MERGE
2. Date filters have "date > start" but no "date < end" - this causes exponential backfill costs
3. Source tables are always...
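Sign 1 is easy to see with an in-memory toy (the "tables" here are just Python lists, not any real warehouse API): a blind append duplicates rows when a run is replayed, while a keyed MERGE-style upsert makes the replay a no-op.

```python
# Toy illustration: why INSERT INTO breaks idempotency and MERGE doesn't.

def insert_into(table, rows):
    table.extend(rows)           # append blindly: a replayed run duplicates rows

def merge(table, rows, key):
    by_key = {r[key]: r for r in table}
    for r in rows:
        by_key[r[key]] = r       # overwrite on matching key: replay-safe
    table[:] = list(by_key.values())

batch = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]

appended = []
insert_into(appended, batch)
insert_into(appended, batch)     # same batch replayed -> 4 rows, not idempotent

merged = []
merge(merged, batch, key="id")
merge(merged, batch, key="id")   # replay is a no-op -> still 2 rows
```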
After years in data engineering, I've realized the job is mostly pattern recognition. You see a problem. You recognize it as a variant of a problem you've solved before. You apply a known solution with modifications. This is why experience matters more...
Today, I dig into the details of StarRocks and how it is gaining traction in the real-time database world. DWH-like joins and fast retrieval from a #Lakehouse-native data architecture, without additional data engineering work to persist and ingest data. https://www.ssp.sh/blog/starrocks-lakehouse-native-joins/
What's changed since Kimball wrote The Data Warehouse Toolkit:

1. Surrogate keys are less necessary with better databases
2. Denormalization for performance matters less with modern engines
3. Snapshotting dimensions beats complex SCD2 logic
4. Collaboration requirements mean looser conformance

Kimball's principles still matter. But...
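Point 3 in miniature (column and table names are invented for illustration): instead of maintaining SCD2 valid_from/valid_to ranges, append a full copy of the dimension each day tagged with a snapshot date, and history questions become simple filters.

```python
from datetime import date

def snapshot(history, dim_rows, snapshot_date):
    """Append a full copy of the dimension, tagged with the snapshot date."""
    for row in dim_rows:
        history.append({**row, "snapshot_date": snapshot_date})

def as_of(history, snapshot_date):
    """Dimension state on a given day: just a filter, no range logic."""
    return [r for r in history if r["snapshot_date"] == snapshot_date]

history = []
snapshot(history, [{"customer_id": 1, "tier": "bronze"}], date(2024, 1, 1))
snapshot(history, [{"customer_id": 1, "tier": "gold"}],   date(2024, 1, 2))
```

Storage is cheap in modern warehouses, so duplicating the dimension daily usually costs far less than the engineering effort SCD2 merge logic demands.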