SSP Data
Data writer publishing deep dives and curating discussions around leading data/Big Data thinkers and resources.

Schema Evolution: Add Columns Without Breaking Downstream Consumers
Adding a column seems trivial. Until you realize 47 downstream consumers break. Schema evolution is a pivotal feature of data lake table formats. It enables seamless addition of new columns without disrupting existing structures. https://www.ssp.sh/brain/schema-evolution
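To make the idea concrete, here is a minimal sketch in plain Python, not tied to any particular table format. It assumes a hypothetical table whose data files are lists of rows and whose readers project columns by name. The names `schema_v1`, `schema_v2`, `old_file`, `new_file`, and `read` are made up for illustration.

```python
# Hypothetical table: the schema is a list of (column name, default) pairs,
# and each "data file" is a list of rows written under some schema version.
schema_v1 = [("order_id", None), ("amount", None)]
old_file = [{"order_id": 1, "amount": 9.5}, {"order_id": 2, "amount": 20.0}]

# Evolve the schema by appending a column; existing files are not rewritten.
schema_v2 = schema_v1 + [("currency", None)]
new_file = [{"order_id": 3, "amount": 5.0, "currency": "CHF"}]


def read(files, schema, columns):
    """Project rows onto the requested columns by name, filling any column
    that a file predates with the schema default (None here)."""
    defaults = dict(schema)
    return [
        tuple(row.get(col, defaults[col]) for col in columns)
        for rows in files
        for row in rows
    ]


# An existing consumer that only knows the v1 columns keeps working unchanged.
legacy_view = read([old_file, new_file], schema_v2, ["order_id", "amount"])

# A new consumer sees the added column, with None for rows from the old file.
new_view = read([old_file, new_file], schema_v2, ["order_id", "currency"])
```

The design choice that keeps consumers safe in this sketch is resolving columns by name rather than by position, so an appended column never shifts what an older query reads.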

DevOps Is Becoming Data Engineering’s New Data Science Role
Is DevOps to data engineering what data engineering once was to data science? In the old days, you spent 80% of your time on data engineering instead of data science. https://www.ssp.sh/brain/the-state-of-devops-in-data-engineering

BI Isn't Dead: Dashboards Still Power Enterprise Insights
We've heard it all: BI and dashboards are dead. Yet every time, we rediscover their power whenever we need grounded data analysis, in enterprises and startups alike. https://www.rilldata.com/blog/ai-reveals-why-bi-still-matters-hint-its-not-dashboards

Apache Arrow Enables Zero‑Copy Cross‑Language Data Sharing
Zero-copy data sharing between Python, Java, C++ without serialization overhead. That's Apache Arrow. Arrow is not a file format. It's an in-memory columnar format. https://www.ssp.sh/brain/apache-arrow
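The zero-copy idea can be illustrated without Arrow itself. The sketch below uses only Python's standard library (`array` and `memoryview`) as a stand-in for a columnar buffer. It is not the Arrow API, and the variable names are made up for illustration.

```python
from array import array

# One column stored as a contiguous, typed buffer (a columnar layout).
prices = array("d", [9.5, 20.0, 5.0])

# A second "consumer" views the same memory through the buffer protocol.
# No bytes are serialized or copied when the view is created or sliced.
view = memoryview(prices)
tail = view[1:]

# A write through the original is visible through the view, which shows
# that both names refer to the same underlying memory.
prices[1] = 21.0
shared = tail[0] == 21.0

# By contrast, a round trip through bytes produces a detached copy
# that does not see later changes to the original.
copied = array("d", prices.tobytes())
prices[2] = 6.0
```

After this runs, `tail` reflects both updates while `copied` still holds the old last value. That gap is the cost serialization adds and a shared in-memory format avoids.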

Embrace Slow Tech for Deeper Focus and Greater Productivity
«Slow Tech» is slow in the moment, but increases productivity in the long term. It does so by slowly processing information, which makes you focus and creates more insightful outcomes. It lets you do more meaningful and deep work.

Turn Posts Into Evolving Digital Gardens, Not Polished Artifacts
Stop publishing polished posts. Publish living documents that evolve. A digital garden is a public second brain. You don't start from a blank page. You build on what exists. Compounding note-taking amplifies the value of each note. https://www.ssp.sh/brain/digital-garden

New Data Surge Makes Internal Search Essential
90% of the world's data was generated in just the past two years. Discoverability is critical. A data catalog is Google Search for your internal metadata. https://www.ssp.sh/brain/data-catalog
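As a rough illustration of "search for your internal metadata", the sketch below ranks a handful of hypothetical catalog entries by keyword matches. The entries, the `search` function, and the scoring rule are assumptions for this example, not how any specific catalog product works.

```python
# Hypothetical catalog entries: dataset name, owner, and a short description.
catalog = [
    {"name": "sales.orders", "owner": "finance",
     "description": "One row per customer order"},
    {"name": "web.sessions", "owner": "growth",
     "description": "Clickstream sessions per visitor"},
    {"name": "sales.refunds", "owner": "finance",
     "description": "Refunds issued for an order"},
]


def search(entries, query):
    """Return dataset names ranked by how often the query terms occur
    in each entry's metadata (name, owner, description)."""
    terms = query.lower().split()
    scored = []
    for entry in entries:
        text = " ".join(entry.values()).lower()
        score = sum(text.count(term) for term in terms)
        if score:
            scored.append((score, entry["name"]))
    # Highest score first; ties are broken alphabetically for stable output.
    return [name for score, name in sorted(scored, key=lambda s: (-s[0], s[1]))]


finance_orders = search(catalog, "order finance")
clickstream = search(catalog, "clickstream")
```

Real catalogs typically add lineage, ownership workflows, and usage statistics on top, but the core interaction is the same: type a few words, get the relevant datasets back.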

Avoid AI’s ‘Vampiric Effect’: Prioritize Sleep Over Hype
AI is addictive; Steve Yegge calls it the «Vampiric Effect»: you won't go to sleep and will keep trying to instruct your agents all night long. DHH said, though, there's no limited-time sale going on; AI will be around in...

Open‑Source Data Stack Cuts Costs for Mid‑Scale Companies
A full open-source stack that mid-scale companies can run at low cost, such as Dagster + DuckDB + dbt + Airbyte. https://www.ssp.sh/brain/open-data-stack

Digital Typewriter: Constraints Turn Into Productivity
The typewriter is back. Digital. Because constraints create focus. A device that only writes. That's not a limitation. That's productivity. https://www.ssp.sh/brain/distract-free-typewriter

Kimball’s Dimensional Modeling Still Guides Business Process Design
30 years later, Kimball's facts, dimensions, and conformed dimensions transcend tooling. Dimensional modeling emphasizes identifying key business processes first, then progressively adding more. https://www.ssp.sh/brain/dimensional-modeling
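A small sketch of that progression, using Python's built-in `sqlite3` module: start with one business process (sales), then add a second (returns) that reuses the same conformed product dimension. Table names, columns, and values are hypothetical and chosen only for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Conformed dimension shared by every fact table.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);

-- First business process: sales.
CREATE TABLE fact_sales (product_key INTEGER, revenue REAL);

-- Added later: a second process that reuses the same dimension.
CREATE TABLE fact_returns (product_key INTEGER, refunded REAL);

INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
INSERT INTO fact_sales VALUES (1, 100.0), (2, 50.0), (1, 30.0);
INSERT INTO fact_returns VALUES (1, 10.0);
""")

# Aggregate each fact table separately, then combine the results on the
# conformed dimension attribute so the numbers line up by category.
rows = con.execute("""
WITH s AS (
    SELECT p.category AS category, SUM(f.revenue) AS revenue
    FROM fact_sales f JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.category
),
r AS (
    SELECT p.category AS category, SUM(f.refunded) AS refunded
    FROM fact_returns f JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.category
)
SELECT s.category, s.revenue, COALESCE(r.refunded, 0.0)
FROM s LEFT JOIN r ON r.category = s.category
ORDER BY s.category
""").fetchall()
```

Because both fact tables key into the same `dim_product`, the second process slots in without touching the first, and reports can compare the two by category.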

AI Needs Human Judgment to Finish Quality Software
Models are not good at pushing back, saying no, saying:
> Have you actually thought this through? There's not a lot of that going on.
> Agents don't finish beautiful, ergonomic, desirable software. They just don't. That human finishing at...

30 Years Later, Inmon’s Data Warehouse Definition Still Holds
30+ years of proven patterns. Both still relevant. Inmon (1990): "A subject-oriented, integrated, time-variant, non-volatile collection for management decision-making." https://www.ssp.sh/brain/data-warehouse

Design Your Environment, Not Willpower, for Deep Work
How I get into deep work:
1. Journal before bed - write the 1-2 things for tomorrow
2. Go to bed early
3. Get up before distractions begin
4. Don't check the phone first thing
5. Change environments when stuck
The key insight: deep work isn't...

Make AI Outputs Deterministic: Mark’s Practical Data Workflow
Most people vibe-code with AI agents and wonder why the output is unreliable. Mark spent weeks figuring out how to make it more deterministic. The AI space is moving fast, and everyone is figuring it out as they go. In this...