Multimodal Lakehouse: Evolving Data & Workloads with Chang She

O’Reilly Media • June 17, 2026

Why It Matters

By collapsing batch and real‑time layers into one multimodal lakehouse, companies can streamline AI workflows, reduce latency, and lower operational overhead, giving them a competitive edge in data‑driven markets.

Key Takeaways

•Lakehouses now ingest multimodal data beyond tabular, like embeddings and media.
•Workloads expand to search, model training, and GPU‑accelerated feature engineering.
•Traditional lakehouses lack efficient online serving; the multimodal lakehouse bridges batch processing and OLTP-style serving.
•Unified format enables simultaneous read/write of diverse data with synchronized metadata.
•Multimodal capability reduces system complexity and accelerates AI‑centric pipelines.

Summary

In the talk, Chang She outlines a “multimodal lakehouse” that extends traditional data‑lake architectures to handle not only tabular records but also embeddings, images, video and other rich media.

He explains three shifts: first, data formats now span multimodal types, demanding new read/write engines and metadata synchronization. Second, workloads move beyond pure SQL analytics to include vector search, model training, and GPU‑intensive feature engineering. Third, conventional lakehouses excel at batch processing but require separate OLTP systems for real‑time serving; the multimodal lakehouse aims to unify both.

He cites the LanceDB format as a concrete innovation that lets the same storage layer support online and offline queries simultaneously, eliminating the need for a separate serving database. This unified approach simplifies pipelines that ingest embeddings, run similarity searches, and feed results into downstream AI models.
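To make the pipeline above concrete, here is a minimal sketch in plain Python of a table that keeps tabular fields, a media reference, and an embedding in the same row, and answers a similarity search with an optional metadata filter. It is an illustration only, not LanceDB's API or storage format: the `Row` class, the `search` function, the sample captions, file paths, and vectors are all hypothetical, and the brute-force cosine scan stands in for whatever index a real system would use.

```python
import math
from dataclasses import dataclass


@dataclass
class Row:
    """One record mixing tabular fields, a media reference, and an embedding."""
    id: int
    caption: str            # ordinary text column
    media_uri: str          # pointer to an image or video (hypothetical path)
    embedding: list[float]  # vector produced by some upstream model (assumed)


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(table, query, k=2, where=None):
    """Return the k rows most similar to `query`, after an optional row filter."""
    candidates = [row for row in table if where is None or where(row)]
    ranked = sorted(
        candidates,
        key=lambda row: cosine_similarity(row.embedding, query),
        reverse=True,
    )
    return ranked[:k]


# Made-up sample data: the same rows serve both the filter and the vector query.
table = [
    Row(1, "cat on a sofa", "media/cat.jpg", [0.9, 0.1, 0.0]),
    Row(2, "dog in a park", "media/dog.jpg", [0.1, 0.9, 0.0]),
    Row(3, "kitten playing", "media/kitten.mp4", [0.8, 0.2, 0.1]),
]

query = [1.0, 0.0, 0.0]
top = search(table, query, k=2)
images_only = search(table, query, k=2, where=lambda r: r.media_uri.endswith(".jpg"))
```

The point of the sketch is that the metadata filter and the vector query run against one set of rows, so there is no second serving store to keep in sync; the results in `top` could be passed directly to a downstream model.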

For enterprises, the ability to store, process, and serve multimodal data from a single platform can cut infrastructure costs, accelerate AI product cycles, and open new revenue streams that rely on real‑time, media‑rich insights.

Original Description

The data lakehouse is a fairly standard concept at this point. LanceDB cofounder and CEO Chang She explains why AI workloads mean those lakehouses now need to be multimodal. “Traditional lakehouses tend to be good only at batch offline processing,” he notes. For serving or online processing, you’ll need a more capable system. #data #datalakehouse #multimodaldata #ai #shorts
