Multimodal Lakehouse: Evolving Data & Workloads with Chang She
Why It Matters
By collapsing batch and real‑time layers into one multimodal lakehouse, companies can streamline AI workflows, reduce latency, and lower operational overhead, giving them a competitive edge in data‑driven markets.
Key Takeaways
- Lakehouses now ingest multimodal data beyond tabular, like embeddings and media.
- Workloads expand to search, model training, and GPU‑accelerated feature engineering.
- Traditional lakehouses lack efficient online serving; the multimodal lakehouse bridges batch and OLTP.
- A unified format enables simultaneous read/write of diverse data with synchronized metadata.
- Multimodal capability reduces system complexity and accelerates AI‑centric pipelines.
Summary
In the talk, Chang She outlines a “multimodal lakehouse” that extends traditional data‑lake architectures to handle not only tabular records but also embeddings, images, video and other rich media.
He explains three shifts: first, data formats now span multimodal types, demanding new read/write engines and metadata synchronization. Second, workloads move beyond pure SQL analytics to include vector search, model training, and GPU‑intensive feature engineering. Third, conventional lakehouses excel at batch processing but require separate OLTP systems for real‑time serving; the multimodal lakehouse aims to unify both.
He cites the Lance format (the storage format behind LanceDB) as a concrete innovation that lets the same storage layer support online and offline queries simultaneously, eliminating the need for a separate serving database. This unified approach simplifies pipelines that ingest embeddings, run similarity searches, and feed results into downstream AI models.
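To make the idea of one storage layer answering both kinds of query more concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: `Row`, `ToyTable`, and their methods are hypothetical names invented for this example, the data lives in memory, and the search is brute force. It does not reproduce the format or API discussed in the talk.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Row:
    item_id: int
    category: str            # ordinary tabular column
    embedding: list[float]   # vector column stored alongside it


@dataclass
class ToyTable:
    """Hypothetical in-memory stand-in for a single shared storage layer."""
    rows: list[Row] = field(default_factory=list)

    def append(self, row: Row) -> None:
        self.rows.append(row)

    def count_by_category(self) -> dict[str, int]:
        # Offline-style query: scan every row and aggregate.
        counts: dict[str, int] = {}
        for row in self.rows:
            counts[row.category] = counts.get(row.category, 0) + 1
        return counts

    def nearest(self, query: list[float], k: int = 1) -> list[int]:
        # Online-style query: brute-force cosine similarity over the same rows.
        def cosine(a: list[float], b: list[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        ranked = sorted(self.rows, key=lambda r: cosine(query, r.embedding), reverse=True)
        return [r.item_id for r in ranked[:k]]


table = ToyTable()
table.append(Row(1, "image", [1.0, 0.0]))
table.append(Row(2, "video", [0.0, 1.0]))
table.append(Row(3, "image", [0.9, 0.1]))

category_counts = table.count_by_category()      # batch-style aggregate
top_matches = table.nearest([1.0, 0.05], k=2)    # serving-style lookup
```

Both queries read the same rows, so there is no second copy of the data to keep in sync. A real system would replace the full scan with an index and persistent storage.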
For enterprises, the ability to store, process, and serve multimodal data from a single platform can cut infrastructure costs, accelerate AI product cycles, and open new revenue streams that rely on real‑time, media‑rich insights.