It's April 1st and I have an announcement: @streamnativeio is a @apachekafka company now. Yes, the Pulsar people. We took Apache Kafka 4.2 and gave it a lakehouse foundation. Topics = Iceberg tables. 10x cheaper. Zero code changes. https://t.co/Waj3eO8BoZ

SQL tip: GROUP BY collapses your rows. Sometimes you need the ranking without losing the detail. That's what window functions do. PARTITION BY region restarts the ranking for each region. ORDER BY total_spend DESC puts the highest spender at rank 1. Every row stays intact....
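A minimal sketch of the window-function tip above, run through Python's built-in sqlite3 module so it is self-contained. The customers table, its columns other than region and total_spend, and the sample rows are assumptions made for illustration; window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical sample data: one row per customer.
rows = [
    ("alice", "east", 500),
    ("bob", "east", 300),
    ("carol", "west", 700),
    ("dave", "west", 900),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, region TEXT, total_spend INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)

# RANK() restarts at 1 inside each region; no rows are collapsed.
ranked = conn.execute(
    """
    SELECT name, region, total_spend,
           RANK() OVER (PARTITION BY region ORDER BY total_spend DESC) AS spend_rank
    FROM customers
    ORDER BY region, spend_rank
    """
).fetchall()

for row in ranked:
    print(row)
```

All four input rows come back, each with its rank inside its own region, which is the difference from a GROUP BY that would return one row per region.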
I really suck at software demos, but hopefully that didn't diminish my new Revenue Intelligence feature I demo'd today. I may be able to methodically explain a P&L, but I realized today that I need training on software demos. Can...

Reserve your spot at Friday's Coffee with Digital Trailblazer. Our topic this week: Redefining Data Governance: Is the Data Owner Role Obsolete in the AI Era? https://t.co/i7NcU4uICI #AI https://t.co/9hxa4qwsuS
The IRS is testing Palantir's AI-powered analytics platform to identify "highest-value" audit and investigation targets, documents obtained by Wired reveal. The pilot program aims to cut through decades of fragmented legacy systems to surface taxpayers most likely to be committing...

Great detailed write-up by Mariia explaining how we built topic modelling that turns surprisingly messy support chats into structured, actionable insights. A fun reminder that "traditional" ML (e.g. >4 years old) is still very useful https://t.co/vo9KfPIg6q https://t.co/KFMGcHYHxN

SQL tip: You're running three separate queries to get this. SELECT SUM(amount) FROM orders WHERE user_type = 'premium'; SELECT COUNT(*) FROM orders WHERE is_first_order = TRUE; SELECT SUM(amount) FROM orders; You can get all three in one. This pattern works across Oracle, SQL Server, PostgreSQL, BigQuery...
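The post above is cut off before it names the pattern, so the following is an assumption: one common way to fold the three queries into one is conditional aggregation, with a CASE expression inside each aggregate. The sketch uses Python's sqlite3 module and made-up rows; is_first_order is stored as 0/1 here because older SQLite versions have no TRUE keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (amount INTEGER, user_type TEXT, is_first_order INTEGER)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (100, "premium", 1),
        (50, "free", 0),
        (200, "premium", 0),
        (25, "free", 1),
    ],
)

# One scan of the table answers all three questions.
premium_total, first_orders, grand_total = conn.execute(
    """
    SELECT
        SUM(CASE WHEN user_type = 'premium' THEN amount ELSE 0 END),
        SUM(CASE WHEN is_first_order = 1 THEN 1 ELSE 0 END),
        SUM(amount)
    FROM orders
    """
).fetchone()

print(premium_total, first_orders, grand_total)
```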
Data stewards don't need to be engineers. They need to be domain experts who can speak to data quality. #DataGovernance #DataSteward https://t.co/aQ7n0Kcc79
You have any giant, convoluted code or SQL logic to handle data values that might be similar? @JeffONelson did. But he shows off a new stateless semantic search in @googlecloud BigQuery that might be a lifesaver for small datasets. https://t.co/RU5q8SoJb4
Fastest way to build a business dashboard with AI in 2026: ↓ 1// Take any spreadsheet you already use to track business data. Revenue, leads, whatever you've been tracking manually. 2// Open Claude Code and tell it you want a live dashboard...

RIP BI Dashboards. Tools like Tableau and PowerBI are about to become extinct. This is what's coming (and how to prepare):

Trust In The #Digital Classroom: Why #Data Governance Must Guide #AI In Education by @geoffreyalef1 @Forbes Learn more: https://t.co/BKbfmT1JPq #EduTech #ArtificialIntelligence #DigitalTransformation https://t.co/gLBIx5UmYg

Python tip: You've been creating extra columns just to group by month. pd.Grouper does it in one step, inside the groupby. Same result. No extra column. It works for any time frequency -- weekly, quarterly, custom intervals -- without touching your data.
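A short sketch of the pd.Grouper tip above. The DataFrame, its column names, and the values are made up for illustration. The frequency alias "MS" (month start) is used here because it is accepted across pandas versions; the month-end alias has changed spelling between releases.

```python
import pandas as pd

# Hypothetical order data with a datetime column.
df = pd.DataFrame(
    {
        "order_date": pd.to_datetime(
            ["2026-01-05", "2026-01-20", "2026-02-03", "2026-02-28", "2026-03-15"]
        ),
        "amount": [100, 50, 200, 25, 75],
    }
)

# Group by calendar month inside the groupby; no helper "month" column is added.
# "MS" labels each group with the first day of its month.
monthly = df.groupby(pd.Grouper(key="order_date", freq="MS"))["amount"].sum()

print(monthly)
print(list(df.columns))  # the original columns are unchanged
```

Swapping the freq value (for example "W" for weekly) changes the bucket size without any other change to the code.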

Generative BI is not just an evolution of Business Intelligence. It’s a structural shift in how organizations think, interact, and decide with data. For years, BI promised democratization. In reality, many companies are still stuck between: 🔸 IT bottlenecks 🔸 Low data literacy 🔸 Rigid...

Pandas is not optional anymore. It’s a core skill. Learn it. Use it. Master it.

Python set operators analysts actually use. You already know sets remove duplicates. But they also do something more useful. Compare lists without a single loop. | union -- combine two lists, no duplicates, e.g. all customers who bought in January OR February. & intersection...
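A small sketch of the set operators described above, using made-up customer lists. Only union and intersection appear in the visible part of the post; the difference operator is added here as a related standard Python operator, not as something the post states.

```python
# Hypothetical customer lists, with a duplicate in January.
january = ["ana", "ben", "cy", "ben"]
february = ["ben", "dee", "cy"]

jan, feb = set(january), set(february)

either_month = jan | feb   # union: bought in January OR February
both_months = jan & feb    # intersection: bought in January AND February
only_january = jan - feb   # difference: bought in January but not February

print(sorted(either_month))
print(sorted(both_months))
print(sorted(only_january))
```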
Missed our webinar? See how Crunchbase’s predictive intelligence in @Snowflake helps teams use high-signal data to spot growth, funding, and acquisition signals earlier — and act faster. Get the recording. 🎥: https://t.co/iYm0Ow88gF https://t.co/pJk1MeZSf9

SAP to Acquire Reltio: Make SAP and Non-SAP Data AI-Ready - https://t.co/RBGqnJN8mq >> Congrats. A key move to bolster the data foundation in SAP BDC. MDM and out-of-the-box integration are critical for the se non dee needed in the Agentic...
The observation that data becomes the moat while applications become the commodity feels right. Companies that still think their competitive advantage is their software stack rather than their data architecture may be solving the wrong problem. #AI https://t.co/YVEyjd2R1Y

Accidentally deleted something? Roll back. Time travel in data lake table formats enables versioning of big data. Access any historical version through timestamps or version numbers. https://www.ssp.sh/brain/time-travel
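To make the idea above concrete, here is a toy illustration in plain Python. It is not the API of any real table format: the VersionedTable class and its methods are hypothetical, and it copies full row lists on every commit, whereas real formats track snapshots through metadata. It only shows the two access paths mentioned above, by version number and by timestamp, and assumes commits arrive in chronological order.

```python
from datetime import datetime, timezone


class VersionedTable:
    """Toy table that keeps every committed snapshot (hypothetical, for illustration)."""

    def __init__(self):
        # List of (commit_time, rows); the list index is the version number.
        self._snapshots = []

    def commit(self, rows, commit_time=None):
        """Store a new snapshot and return its version number."""
        commit_time = commit_time or datetime.now(timezone.utc)
        self._snapshots.append((commit_time, list(rows)))
        return len(self._snapshots) - 1

    def read(self, version=None, as_of=None):
        """Read the latest snapshot, a given version, or the last one at or before as_of."""
        if version is not None:
            return list(self._snapshots[version][1])
        if as_of is not None:
            eligible = [rows for ts, rows in self._snapshots if ts <= as_of]
            if not eligible:
                raise LookupError("no snapshot exists at or before as_of")
            return list(eligible[-1])
        return list(self._snapshots[-1][1])


table = VersionedTable()
v0 = table.commit(["a", "b", "c"], datetime(2026, 1, 1, tzinfo=timezone.utc))
v1 = table.commit(["a"], datetime(2026, 1, 2, tzinfo=timezone.utc))  # accidental delete

restored = table.read(version=v0)  # roll back by version number
midday = table.read(as_of=datetime(2026, 1, 1, 12, tzinfo=timezone.utc))  # by timestamp
print(restored, midday, table.read())
```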
We have entered the INFINITE UI ERA. Statlas MCP + Canon + Prophit Engineer = Endless Customization of Beautiful Personalized Reporting. When you organize data effectively and combine it with AI access you can generate any insight and visualization at warp speed. Problems...
No data quality standards. No QA. No pipeline best practices. That's not a tech problem — that's a governance problem. #DataGovernance #AI #DataStrategy https://t.co/POToYzHvFN

Altimate-code: a new open-source code editor for data engineering based on opencode. Easter comes early for every developer this year. Altimate-code is an OSS agentic code editor that works in the terminal, based on the admired OpenCode AI editor, with...
"Data Migration Is Still Hard: Why the Industry Keeps Underestimating It", by Craig Mullins @craigmullins Every few years the IT industry rediscovers something that experienced practitioners already know: moving data is difficult. https://t.co/XDoOXASJRP
Before investing in smarter automation… Fix the data foundation. Why infrastructure beats point solutions → https://t.co/qWIUPgZhYD @MadaketHealth #PayerIT #HITSM
For the last couple of decades businesses have been torturing their data into shape so it can earn a seat in a data warehouse. Clean it. Structure it. Label it. Only then does it get invited into the warehouse. And...
ETL (Extract, Transform, Load): transform before loading into the warehouse. ELT (Extract, Load, Transform): load first, transform inside the warehouse. The shift to ELT happened because cloud warehouses became cheap and powerful enough to do transformations. Why pay for a separate ETL server...
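A minimal sketch of the ELT order described above, with an in-memory SQLite database standing in for a cloud warehouse. The table names, columns, and raw records are assumptions for illustration; the point is only that the raw data lands untouched and the cleanup happens afterwards in SQL.

```python
import sqlite3

# Extract: hypothetical raw records as they might arrive from a source system.
raw_records = [("  Alice ", "100.50"), ("BOB", "20"), ("alice", "9.50")]

warehouse = sqlite3.connect(":memory:")  # stand-in for a warehouse

# Load: store the data as-is, messy names and text amounts included.
warehouse.execute("CREATE TABLE raw_orders (customer TEXT, amount TEXT)")
warehouse.executemany("INSERT INTO raw_orders VALUES (?, ?)", raw_records)

# Transform: clean and aggregate inside the warehouse with SQL.
warehouse.execute(
    """
    CREATE TABLE customer_totals AS
    SELECT LOWER(TRIM(customer)) AS customer,
           SUM(CAST(amount AS REAL)) AS total
    FROM raw_orders
    GROUP BY LOWER(TRIM(customer))
    """
)

totals = dict(warehouse.execute("SELECT customer, total FROM customer_totals"))
print(totals)
```

In an ETL flow the trimming, lowercasing, and casting would instead happen in Python before the first INSERT, and the warehouse would never see the raw rows.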

A month of engineering work compressed into 2 days. That's what we shipped for World Sleep Day. We curated a team of 21 agents covering data engineering, biostatistics, public health, visual design, and even data governance and ethics in the...
Exclusive: Pentagon to adopt Palantir AI as core US military system, memo says. The apotheosis of mil civ fusion... https://t.co/Sp7uxEvsGv
Organizations are giving up control by housing data solely within ERP systems. Regain power by leveraging third-party BI, AI, and workflow tools for in-house data management and functionality. #DataControl #ERP #TechStrategy https://t.co/1ForGtniYv
.@ActianCorp CEO Potter: AI driving a data governance renaissance https://t.co/4qlBrAGgYY Actian CEO Marc Potter said AI is proving to be a wakeup call on data governance as companies realize it's a business imperative. #AIF2026

Free-form text data is everywhere in modern organizations. And it's usually dirty. Tomorrow, 39,000+ professionals will learn a powerful way to clean text data - fuzzy matching. In this age of AI, it's tempting to give free-form text data to an...
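The post above does not show how the fuzzy matching is done, so this is only a sketch of one possible approach using difflib from Python's standard library. The canonical names, the messy inputs, the clean helper, and the 0.6 cutoff are all assumptions for illustration.

```python
from difflib import get_close_matches

# Hypothetical canonical values and messy free-form entries.
canonical = ["Microsoft", "Alphabet", "Amazon"]
messy = ["Microsft", "alphabet inc", "Amazn", "Unknown Co"]


def clean(value, choices, cutoff=0.6):
    """Return the closest canonical value, or None if nothing is similar enough."""
    lookup = {choice.lower(): choice for choice in choices}
    matches = get_close_matches(value.lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None


cleaned = {value: clean(value, canonical) for value in messy}

for original, fixed in cleaned.items():
    print(f"{original!r} -> {fixed!r}")
```

Returning None for unmatched values keeps doubtful entries visible for manual review instead of forcing them onto the nearest name.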

I didn’t prioritize SQL early on, I thought it was easy and not that important. I was wrong. It became the language I used the most in data. Practice your queries.
You can now audit each number and flag any inconsistencies. We take data quality very seriously. I don't expect you to have to use this, but anything we can do to build the best data set in equity markets, consider it done....
Data quality influence surged 232% this period. Not AI models. Not agents. Not LLMs. Data. Quality. The most boring discipline in the stack just became the fastest growing. The market is telling you something. Are you listening?
Most companies experimenting with AI are not struggling with models. They’re struggling with: – messy internal data – inconsistent schemas – no documentation – no data ownership You can’t plug OpenAI into chaos and expect magic. Data hygiene is important for AI.
I shared my thoughts with @Infoworld on the new Genie Code from @Databricks https://t.co/54nQ6q4vAQ The goal is to highly automate data science and engineering tasks.

The 10 types of clustering that all data scientists need to know. Let's dive in:
Dagster has a steep learning curve, but it pays off. It is Vim for orchestration. The mental model shift: Dagster thinks in assets, not tasks. You define what data should exist, not what steps to run. The engine figures out dependencies and...
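The asset-first idea above can be illustrated without Dagster at all. The sketch below is plain Python and does not use or imitate Dagster's actual API: the ASSETS dictionary, the asset names, and the materialize function are hypothetical. Each asset declares what it depends on, and graphlib (Python 3.9+) works out the run order.

```python
from graphlib import TopologicalSorter

# Hypothetical assets, deliberately declared out of order.
# Each entry says which upstream assets it needs and how to build it.
ASSETS = {
    "revenue_report": {
        "deps": ["cleaned_orders"],
        "build": lambda cleaned_orders: sum(cleaned_orders),
    },
    "cleaned_orders": {
        "deps": ["raw_orders"],
        "build": lambda raw_orders: [o for o in raw_orders if o >= 100],
    },
    "raw_orders": {
        "deps": [],
        "build": lambda: [120, 80, 200],
    },
}


def materialize(assets):
    """Build every asset in dependency order; return the order and the values."""
    graph = {name: set(spec["deps"]) for name, spec in assets.items()}
    order = list(TopologicalSorter(graph).static_order())
    values = {}
    for name in order:
        inputs = {dep: values[dep] for dep in assets[name]["deps"]}
        values[name] = assets[name]["build"](**inputs)
    return order, values


order, values = materialize(ASSETS)
print(order)
print(values["revenue_report"])
```

Nothing in the code lists steps to run; the order falls out of the declared dependencies, which is the shift in mental model the post describes.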
The semantic layer isn't new. SAP BusinessObjects had one in 1991. What's new is the need for a universal semantic layer that works across BI tools, notebooks, and applications. When you only had one BI tool, that tool's semantic layer was enough....

The most underrated AI role right now: DataOps Engineer. Not the ML engineer. Not the data scientist. The person who designs automation and testing infrastructure that makes everyone else dramatically more effective. Infrastructure that runs without you. That's the whole job. https://t.co/Cng5iC1BEB

With the latest acquisition of Confluent by IBM, they follow up on the Fivetran, Databricks, and Snowflake stack. Or what do you think? With the latest acquisition in data engineering, it's a race of who gets the most complete data platform...
The Modern Data Stack promised best-of-breed tools that work together seamlessly. The paradox: the more tools you pick, the more integration work you create. One perspective I find helpful: Orchestration as the connective tissue. A good orchestrator doesn't just schedule jobs -...

.@IBM Completes Acquisition of Confluent, Making Real Time Data the Engine of Enterprise AI and Agents https://t.co/QqwqJPCT4P >> Congrats. A key augmentation for the IBM AI capabilities. Good news for customers. #NextGenApps https://t.co/aCKH7wuesW

Snowflake and BigQuery have free datasets you can use to practice SQL with real data. Even better: LLMs are integrated, so you can query in natural language.
As AI moves to core operations, pressure on the data layer also intensifies. I canvassed leaders on the work required to build a well-functioning data environment responsive to today’s AI initiatives. (My latest in Database Trends) https://t.co/X8ar2pKnTZ @BigDataQtrly

We just turned on some new smarts in the @googlecloud BigQuery Studio interface. Now you get context-aware query editing (sees open query tabs), better resource discovery through natural language questions, and smarter troubleshooting. https://t.co/9ekJhzv0Ki https://t.co/SNntL6X6bB

Step-by-Step Data Engineer Roadmap (2026 Edition) Most people try to become Data Engineers by collecting courses. That rarely works. What you actually need is a sequence, a progression that builds real capability. Here’s a practical 6-stage roadmap that takes you from foundation → job-ready 👇
I think about data modeling patterns in four main categories: 1. Dimensional modeling (Kimball) - optimized for queries 2. Data Vault - optimized for auditability and change 3. One Big Table - optimized for simplicity 4. Medallion Architecture - optimized for incremental refinement No pattern...

SELECT, FROM, WHERE and JOINs will get you started. Then the work gets complicated and you realise tutorial SQL and production SQL are two very different things. Here's level 2. CTEs: readability. I was lost in my own nested subqueries. Couldn't follow...
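A minimal sketch of the CTE point above, run through Python's sqlite3 module. The orders table, the sample rows, and the 300 threshold are made up for illustration. Each WITH block names one step, so the query reads top to bottom instead of inside out like nested subqueries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ana", 100), ("ana", 250), ("ben", 40), ("cy", 300), ("cy", 20)],
)

top_customers = conn.execute(
    """
    WITH customer_totals AS (      -- step 1: total per customer
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
    ),
    big_spenders AS (              -- step 2: keep totals above a threshold
        SELECT customer, total
        FROM customer_totals
        WHERE total > 300
    )
    SELECT customer, total
    FROM big_spenders
    ORDER BY total DESC
    """
).fetchall()

print(top_customers)
```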