
The TWIML AI Podcast
Relational Foundation Models for Enterprise Data with Jure Leskovec - #768
Why It Matters
Relational foundation models promise to unlock the vast, underutilized relational data that every enterprise holds, turning it into actionable insights with far less engineering effort. Leskovec's parallel work on biomedical data aims to accelerate drug discovery and personalized medicine, making the episode relevant for both business leaders and researchers seeking next‑generation AI tools.
Key Takeaways
- Relational foundation models make predictions on new databases without task‑specific training.
- Graph neural networks replace manual feature engineering for tabular data.
- Single‑cell RNA‑seq data helps build a digital twin from a blood sample for disease analysis.
- Fraud detection and recommender systems benefit from multi‑table graph learning.
- AI for science builds cell models from protein embeddings.
Pulse Analysis
In this episode Jure Leskovec presents a relational foundation model that can make accurate predictions on a new enterprise database without any task‑specific training. By treating table rows as nodes and foreign‑key links as edges, the model applies graph neural networks directly to raw relational data, eliminating the costly feature‑engineering pipelines that have dominated tabular machine learning for decades. This shift mirrors the breakthroughs seen in computer vision and natural language processing, where models now learn straight from raw pixels or tokens, and promises enterprises faster deployment and higher predictive performance across diverse use cases.
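To make the graph view concrete, here is a minimal sketch, assuming a toy two-table schema. One common way to view a relational database as a graph is to make each row a node and each foreign-key reference an edge. The table names, columns, and helper functions below are hypothetical and are not Kumo's implementation; the aggregation step is a plain neighbor average standing in for the learned message passing a real graph neural network would perform.

```python
from collections import defaultdict

# Hypothetical toy tables: customers(id, age) and orders(id, customer_id, amount).
customers = [
    {"id": 1, "age": 30},
    {"id": 2, "age": 50},
]
orders = [
    {"id": 10, "customer_id": 1, "amount": 20.0},
    {"id": 11, "customer_id": 1, "amount": 40.0},
    {"id": 12, "customer_id": 2, "amount": 100.0},
]


def build_graph(customers, orders):
    """Turn rows into nodes keyed by (table, primary key) and
    foreign keys into undirected edges."""
    features = {}
    neighbors = defaultdict(list)
    for row in customers:
        features[("customers", row["id"])] = float(row["age"])
    for row in orders:
        node = ("orders", row["id"])
        features[node] = row["amount"]
        # The foreign key orders.customer_id links an order to its customer.
        parent = ("customers", row["customer_id"])
        neighbors[node].append(parent)
        neighbors[parent].append(node)
    return features, neighbors


def mean_neighbor_feature(node, features, neighbors):
    """One round of mean aggregation over a node's neighbors.
    A real GNN would apply learned weights; this is only an average."""
    linked = neighbors[node]
    if not linked:
        return 0.0
    return sum(features[n] for n in linked) / len(linked)


features, neighbors = build_graph(customers, orders)
avg_order_customer_1 = mean_neighbor_feature(("customers", 1), features, neighbors)
avg_order_customer_2 = mean_neighbor_feature(("customers", 2), features, neighbors)
```

Here the average order amount per customer falls out of the graph structure itself rather than from a hand-written join and group-by, which is the kind of feature engineering the graph approach is meant to replace.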
Leskovec also dives into his AI for science agenda, describing the AI Virtual Cell project that builds multi‑scale representations from proteins up to whole patients. Using single‑cell RNA‑seq, a roughly 20,000‑dimensional snapshot of each cell’s gene expression, combined with protein language models like ESM and structure models such as AlphaFold, the team aims to create a digital twin from a single drop of blood. The approach is self‑supervised, letting biological hierarchies emerge from data rather than being hand‑coded, with the goal of accelerating drug discovery, disease trajectory mapping, and personalized therapy design.
Practical applications of relational deep learning are already evident. Fraud detection, anti‑money‑laundering, churn prediction, and recommender systems all benefit from the model’s ability to ingest heterogeneous, multi‑table schemas and uncover patterns that traditional XGBoost or linear models built on hand‑engineered features can miss. By learning directly on the graph of entities such as customers, products, and transactions, these systems can achieve double‑digit accuracy gains while reducing engineering overhead. As enterprises grapple with ever‑growing relational datasets, Leskovec’s work points toward AI that can unlock value from the full relational fabric without the bottlenecks of manual preprocessing.
Episode Description
In this episode, Jure Leskovec, co-founder and chief scientist at Kumo and professor of computer science at Stanford, joins us to explore two fronts of his work: AI for science and relational deep learning. We begin with AI Virtual Cell, a multiscale effort to learn data-driven representations from proteins to cells to patients using single-cell RNA-seq data, protein language models like ESM, and structure models like AlphaFold, all without hand-encoding biology. Jure then dives into relational deep learning, reframing enterprise databases as graphs and training neural networks directly on raw multi-table data. He explains Kumo’s Relational Foundation Model (RFM2), which performs in-context learning over subgraphs to make accurate predictions on new databases and tasks with no training, and how this approach performs on RelBench and other multi-table benchmarks. We also discuss real-world deployments at companies like Reddit, DoorDash, and Coinbase, explainability via attention over tables and columns, integration with agentic systems, deployment options, and practical limitations.
The complete show notes for this episode can be found at https://twimlai.com/go/768.