SSP Data

SSP Data

Creator
0 followers

Data writer publishing deep dives and curating discussions around leading data/Big Data thinkers and resources.

Altimate-Code: Open‑Source Terminal Editor Boosts Data Engineering
SocialMar 25, 2026

Altimate-Code: Open‑Source Terminal Editor Boosts Data Engineering

Altimate-code: a new open-source code editor for data engineering based on opencode. Easter comes early for every developer this year. Altimate-code is an OSS agentic code editor that works in the terminal, based on the admired OpenCode AI editor, with...

By SSP Data
Old Kindle Highlights Resurface to Fuel New Writing
SocialMar 24, 2026

Old Kindle Highlights Resurface to Fuel New Writing

Most highlights die in Kindle. Readwise brings them back and connects them. The power isn't in the capturing. It's in the resurfacing. Sometimes a highlight from two years ago connects perfectly to what you're writing today. https://www.ssp.sh/brain/readwise

By SSP Data
ELT Dominates: Load Fast, Transform In‑warehouse Layers
SocialMar 22, 2026

ELT Dominates: Load Fast, Transform In‑warehouse Layers

ETL (Extract, Transform, Load): Transform before loading into the warehouse ELT (Extract, Load, Transform): Load first, transform inside the warehouse The shift to ELT happened because cloud warehouses became cheap and powerful enough to do transformations. Why pay for a separate ETL server...

By SSP Data
Dagster: Asset‑First Orchestration Over Task‑Centric Pipelines
SocialMar 19, 2026

Dagster: Asset‑First Orchestration Over Task‑Centric Pipelines

Dagster has a steep learning curve but a payoff. It is Vim for orchestration. The mental model shift: Dagster thinks in assets, not tasks. You define what data should exist, not what steps to run. The engine figures out dependencies and...

By SSP Data
Universal Semantic Layer Needed for Multi-Tool Data Access
SocialMar 18, 2026

Universal Semantic Layer Needed for Multi-Tool Data Access

The semantic layer isn't new. SAP BusinessObjects had one in 1991. What's new is the need for a universal semantic layer that works across BI tools, notebooks, and applications. When you only had one BI tool, that tool's semantic layer was enough....

By SSP Data
IBM Joins Data Platform Race with Confluent Acquisition
SocialMar 17, 2026

IBM Joins Data Platform Race with Confluent Acquisition

With the latest acquisition of Confluent by IBM, they follow up on the Fivetran, Databricks, and Snowflake stack. Or what do you think? With the latest acquisition in data engineering, it's a race of who gets the most complete data platform...

By SSP Data
Orchestration Turns Data Stack Flexibility Into Cohesion
SocialMar 17, 2026

Orchestration Turns Data Stack Flexibility Into Cohesion

The Modern Data Stack promised best-of-breed tools that work together seamlessly. The paradox: the more tools you pick, the more integration work you create. One perspective I find helpful: Orchestration as the connective tissue. A good orchestrator doesn't just schedule jobs -...

By SSP Data
Pick Data Modeling Pattern Based on Needs, Not One‑Size
SocialMar 15, 2026

Pick Data Modeling Pattern Based on Needs, Not One‑Size

I think about data modeling patterns in four main categories: 1. Dimensional modeling (Kimball) - optimized for queries 2. Data Vault - optimized for auditability and change 3. One Big Table - optimized for simplicity 4. Medallion Architecture - optimized for incremental refinement No pattern...

By SSP Data
Writing Is Thinking: Build Ideas Through Continuous Notes
SocialMar 14, 2026

Writing Is Thinking: Build Ideas Through Continuous Notes

The key principle from "How to Take Smart Notes": writing is not the outcome of thinking. Writing is the medium in which thinking occurs. When you write about an idea, you're not recording a thought - you're developing it. This is why...

By SSP Data
Effective Data Lineage Connects SQL and Python Pipelines
SocialMar 13, 2026

Effective Data Lineage Connects SQL and Python Pipelines

Data lineage traces your data's journey from source to destination. Where did this number come from? What would break if I changed this table? Who's using this data? Good lineage answers these questions. Bad lineage makes you grep through code. Tools like dbt...

By SSP Data
Dynamic Query Pattern Enables Immediate, Flexible Data Retrieval
SocialMar 12, 2026

Dynamic Query Pattern Enables Immediate, Flexible Data Retrieval

Big news, I added a new design pattern chapter, called «Dynamic Query Design Pattern». This design pattern problem statement goes like this: 1. Provide immediate answers 2. How you model it matters 3. Dumping everything into the lake is painful The core challenge is enabling...

By SSP Data
Validate Data at Source: Shift Left for Quality
SocialMar 12, 2026

Validate Data at Source: Shift Left for Quality

"Shift left" comes from software engineering - finding bugs earlier in the development process. In data, shifting left means: validate data at the source, not after it breaks your dashboard. Instead of hoping bad data doesn't show up in your warehouse, you...

By SSP Data
SQLMesh Adds Semantic SQL, Auto-Dialect Translation for Dbt
SocialMar 11, 2026

SQLMesh Adds Semantic SQL, Auto-Dialect Translation for Dbt

SQLMesh takes dbt's concept and adds semantic understanding of SQL. It parses SQL statements, translates between dialects automatically, and offers compile-time validation. Built by Tobiko Data (now Fivetran). If you're starting fresh, it deserves serious consideration. https://www.ssp.sh/brain/sqlmesh

By SSP Data
AI Erases Craftsmanship and Personal Connection in Creation
SocialMar 10, 2026

AI Erases Craftsmanship and Personal Connection in Creation

Is craftsmanship gone with AI? You can vibe code something on demand, but you have no attachment to what it's built, so how are you gonna sell or rave about something you don't even have any connection to? No hardship,...

By SSP Data
CLI‑first Analytics: DuckDB, MotherDuck, and Rill Empower Agents
SocialMar 9, 2026

CLI‑first Analytics: DuckDB, MotherDuck, and Rill Empower Agents

CLI-first is eating development. Email, calendar—they all have CLIs now. Why not your business metrics? For data/analytics engineers building with agents: DuckDB + MotherDuck + Rill give you an agentfriendly, #localfirst frontend—exact context via SQL and YAML. https://www.rilldata.com/blog/building-an-agent-friendly-local-first-analytics-stack-with-motherduck-and-rill

By SSP Data
Think in Assets, Not Jobs, for Data Reliability
SocialMar 8, 2026

Think in Assets, Not Jobs, for Data Reliability

Dagster's key innovation is software-defined assets. Instead of: "Run this job on a schedule" You declare: "I need this table to exist, here's how to build it" The difference is subtle but profound. Assets have identities, dependencies, and history. Jobs are just tasks. When...

By SSP Data
Template‑Based Pipelines Offer Flexibility, Demand SQL Skill
SocialMar 6, 2026

Template‑Based Pipelines Offer Flexibility, Demand SQL Skill

I spent years working with data warehouse automation tools before the modern data stack existed. The biggest lesson? There are two approaches to generating pipelines: Parametric - you define parameters, the tool generates SQL Template-based - you write SQL templates with variables Most modern...

By SSP Data
DuckDB Lets You Query 10GB Parquet Locally, Ditch Clusters
SocialFeb 27, 2026

DuckDB Lets You Query 10GB Parquet Locally, Ditch Clusters

There's a moment in every data engineer's career when they discover they can query a 10GB Parquet file on their laptop in seconds. That's the DuckDB moment. It changes how you think about what requires a cluster and what doesn't. Spoiler: most...

By SSP Data
Treat Data as Assets, Not Just Jobs.
SocialFeb 26, 2026

Treat Data as Assets, Not Just Jobs.

Data Product: why do we need this data? Data Asset: what is this data and how do we maintain it? I've found the most practical definition of data products comes from Dagster's software-defined assets. Each asset has a clear definition, dependencies, and...

By SSP Data
Managed Iceberg Lets Providers Own the Metadata Control Plane
SocialFeb 25, 2026

Managed Iceberg Lets Providers Own the Metadata Control Plane

Why are hyperscalers racing to offer managed Iceberg? Because whoever controls the catalog controls the ecosystem. If your tables are in a managed Iceberg service, you can query them from any engine - Spark, Trino, DuckDB, whatever. But your metadata stays with...

By SSP Data
Choosing Optional Catalogs for Open Table Formats
SocialFeb 25, 2026

Choosing Optional Catalogs for Open Table Formats

Are we still using table catalogs for open table formats? Haven't heard too much lately. I like OTFs, but making it non-optional to have a catalog isn't great. That's why I like prefer the option to use one without. But if you...

By SSP Data
Exploring Emerging Git‑Style Tools for Data Management
SocialFeb 24, 2026

Exploring Emerging Git‑Style Tools for Data Management

Git for data is still underexplored, and it is an area that is changing so fast. That's why we look at actual tools/features that showcase how to apply a Git-like workflow for data. I compared Git-like tools for data I could...

By SSP Data
Flowrs: New TUI for Managing Airflow Jobs
SocialFeb 24, 2026

Flowrs: New TUI for Managing Airflow Jobs

A TUI for managing Airflow jobs? Something like k9s? Flowrs seems to be just that - haven't tried yet, but looks really cool. Will try next time I have to use Airflow :) https://github.com/jvanbuel/flowrs

By SSP Data
Skip Semantic Layer Early; Use Native Metrics First
SocialFeb 23, 2026

Skip Semantic Layer Early; Use Native Metrics First

Controversial opinion: don't start with a semantic layer. A semantic layer makes sense when: - You have multiple consumers (BI, notebooks, apps) - KPIs are defined inconsistently across teams - You need a universal API for metrics If you're early stage with one BI tool,...

By SSP Data
Rust Powers Python's Data Engineering, Not Replaces It
SocialFeb 22, 2026

Rust Powers Python's Data Engineering, Not Replaces It

Will Rust kill Python in data engineering? No. But it has already consumed much of the JavaScript tooling ecosystem. And it's quietly doing the same in data. The pattern: Python remains the interface, Rust becomes the engine. Polars, DataFusion, DuckDB's internals - all Rust...

By SSP Data
Browse S3 Files Locally in One Fast Command
SocialFeb 20, 2026

Browse S3 Files Locally in One Fast Command

I quickly recorded how easily and conveniently it is to browse S3 files locally with a single command, blazingly fast. Even preview works with DuckDB integration. https://youtu.be/cimUvBd_9Ns

By SSP Data
Use Exponential Backoff with Jitter for Effective Retries
SocialFeb 19, 2026

Use Exponential Backoff with Jitter for Effective Retries

Not all retries are created equal. Immediate retry: usually fails again Exponential backoff: gives systems time to recover Exponential backoff with jitter: prevents thundering herd Most orchestrators have this built in. But you need to understand what's happening or you'll wonder why your retries...

By SSP Data
Semantic Layer: Serve Data Like a Menu, Hide Complexity
SocialFeb 18, 2026

Semantic Layer: Serve Data Like a Menu, Hide Complexity

The semantic layer is like a restaurant menu: you know what you're ordering, but not how it's made. This analogy comes from Maxime Beauchemin and I think it's perfect. Users shouldn't need to understand your star schema to calculate revenue. They should...

By SSP Data
Pivot Tables: Business Data’s Everlasting REPL
SocialFeb 17, 2026

Pivot Tables: Business Data’s Everlasting REPL

Hot take: Pivot tables are the REPL for business data. Just like programmers use REPLs to quickly test code, business users use pivot tables to quickly test hypotheses about their data. Drag a field. See a result. Adjust. Repeat. This feedback loop is...

By SSP Data
Integrate Data Quality Assertions Directly Into Orchestration
SocialFeb 16, 2026

Integrate Data Quality Assertions Directly Into Orchestration

I see data contracts and data quality as overlapping but different: Data contracts: what is the data and how do we enforce it Data products: why do we need this data In practice, I'd argue for asset-based data quality assertions. Every time a...

By SSP Data
Three Red Flags of Non‑Idempotent Data Pipelines
SocialFeb 15, 2026

Three Red Flags of Non‑Idempotent Data Pipelines

From Zach Wilson, three signs your pipeline isn't idempotent: 1. It uses INSERT INTO instead of INSERT OVERWRITE or MERGE 2. Date filters have "date > start" but no "date < end" - this causes exponential backfill costs 3. Source tables are always...

By SSP Data
Data Engineering: Experience Beats Tutorials Through Pattern Recognition
SocialFeb 13, 2026

Data Engineering: Experience Beats Tutorials Through Pattern Recognition

After years in data engineering, I've realized the job is mostly pattern recognition. You see a problem. You recognize it as a variant of a problem you've solved before. You apply a known solution with modifications. This is why experience matters more...

By SSP Data
StarRocks Delivers DWH‑Level Joins on Lakehouse Natively
SocialFeb 9, 2026

StarRocks Delivers DWH‑Level Joins on Lakehouse Natively

Today, I dig into the details of StarRocks and how it is gaining traction in the real-time database world. DWH-like joins and fast retrieval from a #Lakehouse-native data architecture, without additional data engineering work to persist and ingest data. https://www.ssp.sh/blog/starrocks-lakehouse-native-joins/

By SSP Data
Modern Tools Reshape Kimball’s Data Modeling Techniques
SocialFeb 9, 2026

Modern Tools Reshape Kimball’s Data Modeling Techniques

What's changed since Kimball wrote The Data Warehouse Toolkit: 1. Surrogate keys are less necessary with better databases 2. Denormalization for performance matters less with modern engines 3. Snapshotting dimensions beats complex SCD2 logic 4. Collaboration requirements mean looser conformance Kimball's principles still matter. But...

By SSP Data
SSP Data - Page 2 | Pulse