SSP Data
Data writer publishing deep dives and curating discussions around leading data/Big Data thinkers and resources.

Centralize Metrics with a DRY Analytics API
Instead of duplicating measures in each BI tool, store them centrally in an Analytics API. This is the DRY principle applied to metrics. One metric definition, accessible via GraphQL, SQL, or REST. https://www.ssp.sh/brain/analytics-api
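
A minimal sketch of the idea, assuming a hypothetical in-process registry: each metric is defined exactly once and compiled to SQL for whichever tool asks. The names Metric, METRICS and to_sql are illustrative, not part of any real Analytics API product.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical metric definition: the one place that states how a measure is computed.
@dataclass(frozen=True)
class Metric:
    name: str
    expression: str  # SQL aggregate expression
    table: str
    description: str

# Single source of truth that every BI tool or app would query through the API.
METRICS = {
    "revenue": Metric("revenue", "SUM(amount)", "orders", "Total order amount"),
    "order_count": Metric("order_count", "COUNT(*)", "orders", "Number of orders"),
}

def to_sql(metric_name: str, group_by: Optional[List[str]] = None) -> str:
    """Compile a centrally defined metric into SQL for any consumer."""
    m = METRICS[metric_name]
    dims = ", ".join(group_by or [])
    select = f"{dims}, {m.expression} AS {m.name}" if dims else f"{m.expression} AS {m.name}"
    sql = f"SELECT {select} FROM {m.table}"
    if dims:
        sql += f" GROUP BY {dims}"
    return sql

print(to_sql("revenue", ["country"]))
# SELECT country, SUM(amount) AS revenue FROM orders GROUP BY country
```

A GraphQL or REST layer would sit in front of the same registry, so changing SUM(amount) in one place changes it for every consumer.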

Bitemporal Modeling: Managing Data Across Valid and Transaction Times
Valid time (when a fact is true in the real world) vs transaction time (when the database learned about it), and when you need both. Bitemporal modeling handles historical data along these two distinct timelines. https://www.ssp.sh/brain/bitemporal-modeling
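
A small sketch of a bitemporal table, assuming half-open intervals and date.max as "until further notice". The Version class, the as_of helper and the Alice example are hypothetical; the point is that a correction closes the old knowledge row instead of overwriting it, so you can ask both "what was true?" and "what did we believe at the time?".

```python
from dataclasses import dataclass
from datetime import date

FOREVER = date.max  # open-ended interval

@dataclass
class Version:
    key: str
    value: str
    valid_from: date  # when the fact became true in the real world
    valid_to: date    # exclusive end of validity
    tx_from: date     # when the database recorded this version
    tx_to: date       # exclusive; FOREVER while this is current knowledge

def as_of(rows, key, valid_at, known_at):
    """Value that was true at valid_at, as the database knew it at known_at."""
    for r in rows:
        if (r.key == key
                and r.valid_from <= valid_at < r.valid_to
                and r.tx_from <= known_at < r.tx_to):
            return r.value
    return None

# Jan 10: we record that Alice lives in Berlin since Jan 1.
# Mar 5: we learn she moved to Zurich on Feb 1, so we close the old knowledge
# row and record corrected versions instead of overwriting history.
history = [
    Version("alice", "Berlin", date(2024, 1, 1), FOREVER, date(2024, 1, 10), date(2024, 3, 5)),
    Version("alice", "Berlin", date(2024, 1, 1), date(2024, 2, 1), date(2024, 3, 5), FOREVER),
    Version("alice", "Zurich", date(2024, 2, 1), FOREVER, date(2024, 3, 5), FOREVER),
]

print(as_of(history, "alice", date(2024, 2, 15), date(2024, 2, 20)))  # Berlin (what we believed then)
print(as_of(history, "alice", date(2024, 2, 15), date(2024, 3, 10)))  # Zurich (after the correction)
```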

Backfilling: The Mark of a Great Data Engineer
Backfilling is where you see the difference between a data engineer and a great data engineer. A backfill means taking a data asset that is normally updated incrementally and updating historical parts of it. https://www.ssp.sh/brain/backfill
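
One common way to make backfills safe is to overwrite whole partitions rather than append, so a rerun over the same range gives the same result. A minimal sketch, assuming a hypothetical daily-partitioned table kept in a dict and a load_partition step standing in for the real incremental job:

```python
from datetime import date, timedelta

# Hypothetical target table: one partition per day.
# Overwriting a whole partition (instead of appending) keeps reruns idempotent.
table: dict = {}

def load_partition(day, source):
    """Recompute a single day from the source and overwrite that partition."""
    rows = [r for r in source if r["day"] == day]
    table[day] = [{"day": day, "total": sum(r["amount"] for r in rows)}]

def backfill(start, end, source):
    """Re-run the incremental job for every day in [start, end]."""
    day = start
    while day <= end:
        load_partition(day, source)
        day += timedelta(days=1)

source = [
    {"day": date(2024, 1, 1), "amount": 10},
    {"day": date(2024, 1, 1), "amount": 5},
    {"day": date(2024, 1, 2), "amount": 7},
]
backfill(date(2024, 1, 1), date(2024, 1, 3), source)
backfill(date(2024, 1, 1), date(2024, 1, 3), source)  # rerun: same result, no duplicates
```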

External Tables Persist: Legacy Need Meets Modern Data Access
Have you ever wondered why Databricks, BigQuery, and others are still adding features such as External Tables? I used them when I started my career in 2003, so why are they still used today? And what are the modern versions of...

From Imperative to Declarative: Rethink Data System Design
Not just syntax. A fundamental shift in how you think about data systems. Imperative: dictate exact steps. Declarative: describe what you want, system figures out how. https://www.ssp.sh/brain/declarative-vs-imperative
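
The same question answered both ways, using only Python's built-in sqlite3 module; the orders data is made up for illustration. The imperative version spells out the loop, accumulation and sorting; the declarative version states the desired result and leaves the execution plan to SQLite.

```python
import sqlite3

orders = [("CH", 120.0), ("DE", 80.0), ("CH", 30.0), ("US", 200.0)]

# Imperative: we dictate every step (loop, accumulate, sort).
def revenue_imperative(rows):
    totals = {}
    for country, amount in rows:
        totals[country] = totals.get(country, 0.0) + amount
    return sorted(totals.items())

# Declarative: we describe the result; SQLite's query planner decides how.
def revenue_declarative(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (country TEXT, amount REAL)")
    con.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    result = con.execute(
        "SELECT country, SUM(amount) FROM orders GROUP BY country ORDER BY country"
    ).fetchall()
    con.close()
    return result

print(revenue_imperative(orders))   # [('CH', 150.0), ('DE', 80.0), ('US', 200.0)]
print(revenue_declarative(orders))  # same result
```

If the data grows, the declarative query can benefit from indexes or a different engine without rewriting it; the imperative loop would have to be changed by hand.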

AI Code Saves Time, But Maintenance Remains a Burden
Everyone wants to use AI to automate data work, but nobody wants to review it, let alone maintain it. AI-generated code is very similar to written text, except that code is more functional. I did a little self-experiment to test the...

Embed Governance in Tools, Not Just Policies
Most companies think data governance is about policies and committees. The ones that get it right embed governance into their tools. https://www.ssp.sh/brain/data-governance
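
What "governance in the tool" can look like in practice: a check that runs inside the publishing step and refuses to ship a dataset that breaks policy. This is a hypothetical sketch; the Dataset and Column classes, the classification labels and the masking rule are assumptions chosen for illustration, not any specific catalog's model.

```python
from dataclasses import dataclass, field

@dataclass
class Column:
    name: str
    classification: str = ""  # assumed labels: "public", "internal", "pii"
    masked: bool = False

@dataclass
class Dataset:
    name: str
    owner: str
    columns: list = field(default_factory=list)

ALLOWED = {"public", "internal", "pii"}

def governance_violations(ds: Dataset) -> list:
    """Return policy violations; an empty list means the dataset may be published."""
    problems = []
    if not ds.owner:
        problems.append(f"{ds.name}: missing owner")
    for col in ds.columns:
        if col.classification not in ALLOWED:
            problems.append(f"{ds.name}.{col.name}: unclassified column")
        elif col.classification == "pii" and not col.masked:
            problems.append(f"{ds.name}.{col.name}: PII must be masked")
    return problems

def publish(ds: Dataset) -> None:
    # The tool enforces the policy: publishing fails instead of relying on a committee review.
    problems = governance_violations(ds)
    if problems:
        raise ValueError("; ".join(problems))
    print(f"published {ds.name}")

publish(Dataset("customers", "data-team",
                [Column("id", "internal"), Column("email", "pii", masked=True)]))
```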

Data Lakes Gain Full ACID Guarantees Like Traditional Databases
Normally, ACID means a database. But now data lake table formats like Delta Lake have added these guarantees too: Atomicity, Consistency, Isolation, Durability. Plain files on S3 now get transactional guarantees comparable to those of Postgres. https://www.ssp.sh/brain/acid-transactions
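
Table formats get these guarantees from a transaction log: every change is a numbered commit file, and a commit only counts if the writer wins an atomic "put if absent" for that number. The sketch below is a simplified, local-filesystem illustration of that idea, not Delta Lake's actual implementation; the file naming and the add/remove action format are assumptions. On object stores, the put-if-absent step needs storage-specific support.

```python
import json
import os
import tempfile

def commit(log_dir: str, version: int, actions: list) -> bool:
    """Try to write commit `version`; return False if another writer already claimed it."""
    path = os.path.join(log_dir, f"{version:020d}.json")
    # Write the full commit to a temp file first so readers never see a partial commit.
    fd, tmp = tempfile.mkstemp(dir=log_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(actions, f)
    try:
        os.link(tmp, path)  # atomic put-if-absent: fails if this version already exists
        return True
    except FileExistsError:
        return False  # conflict: re-read the log and retry with the next version
    finally:
        os.unlink(tmp)

def current_files(log_dir: str) -> set:
    """Replay committed log entries in order to get the table's current data files."""
    files = set()
    for name in sorted(n for n in os.listdir(log_dir) if n.endswith(".json")):
        with open(os.path.join(log_dir, name)) as f:
            for action in json.load(f):
                if action["op"] == "add":
                    files.add(action["path"])
                elif action["op"] == "remove":
                    files.discard(action["path"])
    return files

log = tempfile.mkdtemp()
commit(log, 0, [{"op": "add", "path": "part-0.parquet"}])
commit(log, 1, [{"op": "remove", "path": "part-0.parquet"}, {"op": "add", "path": "part-1.parquet"}])
print(commit(log, 1, [{"op": "add", "path": "other.parquet"}]))  # False: version 1 is taken
print(current_files(log))  # {'part-1.parquet'}
```

Because a commit swaps the file set in one atomic step, readers see either the old table or the new one, never a half-written state.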

Schema Evolution: Add Columns Without Breaking Downstream Consumers
Adding a column seems trivial, until you realize 47 downstream consumers break. Schema evolution is a pivotal feature of data lake table formats: it lets you add new columns without disrupting existing structures or the consumers that read them. https://www.ssp.sh/brain/schema-evolution
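
A toy model of additive schema evolution, assuming schemas are simple name-to-type maps and hypothetical helpers is_additive and read. Old files simply lack the new column and read back as None, while a consumer still on the old schema keeps getting exactly the columns it expects.

```python
# Hypothetical schemas: column name -> type. v2 adds a nullable "country" column.
schema_v1 = {"id": int, "amount": float}
schema_v2 = {"id": int, "amount": float, "country": str}

def is_additive(old: dict, new: dict) -> bool:
    """Allow the change only if every existing column keeps its name and type."""
    return all(name in new and new[name] is typ for name, typ in old.items())

def read(records: list, schema: dict) -> list:
    """Project records onto a schema; columns missing from old files become None."""
    return [{col: rec.get(col) for col in schema} for rec in records]

old_files = [{"id": 1, "amount": 9.5}]                   # written before the change
new_files = [{"id": 2, "amount": 3.0, "country": "CH"}]  # written after the change

print(is_additive(schema_v1, schema_v2))        # True: safe to evolve
print(read(old_files + new_files, schema_v2))   # old rows get country=None
print(read(old_files + new_files, schema_v1))   # old consumers are unaffected
```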

DevOps Is Becoming Data Engineering’s New Data Science Role
Is DevOps the new data engineering of data science? It feels like the old days, when you spent 80% of your time on data engineering instead of data science. https://www.ssp.sh/brain/the-state-of-devops-in-data-engineering

BI Isn't Dead: Dashboards Still Power Enterprise Insights
We've heard it all: BI and dashboards are dead. Yet every time, we rediscover their power whenever we need grounded data analysis, in enterprises and startups alike. https://www.rilldata.com/blog/ai-reveals-why-bi-still-matters-hint-its-not-dashboards

Apache Arrow Enables Zero-Copy Cross-Language Data Sharing
Zero-copy data sharing between Python, Java, and C++ without serialization overhead. That's Apache Arrow. Arrow is not primarily a file format: it's an in-memory columnar format. https://www.ssp.sh/brain/apache-arrow
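
The two ideas behind Arrow, contiguous column buffers and sharing memory instead of copying it, can be illustrated with nothing but Python's array and memoryview. This is a conceptual sketch only, not the Arrow API; real cross-language sharing would go through the Arrow libraries (for example pyarrow), which define a standard memory layout that every language can read.

```python
from array import array

# Row-oriented layout: each record is a separate Python object.
rows = [(1, 9.5), (2, 3.0), (3, 7.25)]

# Columnar layout (the idea Arrow standardizes): each column is one contiguous buffer.
ids = array("q", [r[0] for r in rows])      # 64-bit signed integers
amounts = array("d", [r[1] for r in rows])  # 64-bit floats

# A memoryview references the buffer instead of copying it, similar in spirit
# to Arrow consumers reading the same memory without serialization.
view = memoryview(amounts)
tail = view[1:]          # slicing a memoryview does not copy the data
amounts[2] = 100.0       # mutate the original buffer...
print(tail.tolist())     # ...and the view sees it: [3.0, 100.0]
print(sum(amounts))      # a column scan touches one contiguous buffer: 112.5
```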

Embrace Slow Tech for Deeper Focus and Greater Productivity
«Slow Tech» is slow in the moment but increases productivity in the long term. Processing information slowly forces you to focus and leads to more insightful outcomes. It lets you do more meaningful, deep work.

Turn Posts Into Evolving Digital Gardens, Not Polished Artifacts
Stop publishing polished posts. Publish living documents that evolve. A digital garden is a public second brain. You don't start from a blank page. You build on what exists. Compounding note-taking amplifies the value of each note. https://www.ssp.sh/brain/digital-garden

New Data Surge Makes Internal Search Essential
An often-cited statistic says 90% of the world's data was generated in just the past two years. With that much data, discoverability is critical. A data catalog is Google Search for your internal metadata. https://www.ssp.sh/brain/data-catalog
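
At its core, catalog search is an inverted index over metadata (names, owners, descriptions, tags), not over the data itself. A minimal sketch, assuming hypothetical catalog entries and a simple AND search; real catalogs add ranking, lineage and access control on top.

```python
import re
from collections import defaultdict

# Hypothetical catalog entries: metadata about datasets, not the data itself.
catalog = [
    {"name": "sales.orders", "owner": "finance",
     "description": "All customer orders, daily", "tags": ["revenue", "orders"]},
    {"name": "crm.customers", "owner": "sales-ops",
     "description": "Customer master data", "tags": ["pii", "customers"]},
    {"name": "web.pageviews", "owner": "marketing",
     "description": "Raw clickstream events", "tags": ["events"]},
]

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

# Inverted index: token -> set of dataset names whose metadata mentions it.
index = defaultdict(set)
for entry in catalog:
    text = " ".join([entry["name"], entry["owner"], entry["description"], *entry["tags"]])
    for token in tokenize(text):
        index[token].add(entry["name"])

def search(query):
    """Return datasets whose metadata matches every query term."""
    sets = [index.get(t, set()) for t in tokenize(query)]
    return sorted(set.intersection(*sets)) if sets else []

print(search("customer"))         # ['crm.customers', 'sales.orders']
print(search("customer orders"))  # ['sales.orders']
```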