![How Capital One Delivers Multi-Agent Systems [Rashmi Shetty] - 765](/cdn-cgi/image/width=1200,quality=75,format=auto,fit=cover/https://i.ytimg.com/vi/SUE1L3XugiQ/maxresdefault.jpg)
How Capital One Delivers Multi-Agent Systems [Rashmi Shetty] - 765
The TWIML AI podcast episode features Rashmi Shetty, senior director of Capital One’s enterprise generative AI platform, explaining the bank’s transition from traditional machine‑learning pipelines to large‑language‑model (LLM)‑driven systems that can actually execute actions. She outlines how the organization moved from simple response generation to a multi‑agent framework that decomposes large, goal‑oriented problems into discrete tasks handled by specialized agents.

Key insights include the rationale for multi‑agent architectures: complex use cases require step‑wise orchestration, with each agent responsible for a narrow function such as intent disambiguation, planning, risk validation, or user‑facing response formatting. The flagship "Chat Concierge" pilot demonstrates this approach in the auto‑dealer space, matching customers to vehicles and automating follow‑up actions such as test‑drive scheduling.

Shetty emphasizes that regulatory compliance is baked into both the platform and individual agents via policy guards, risk‑office oversight, and evaluation gates. "Policy‑bound agentic operations" ensure that agents cannot violate banking regulations or expose the bank to cyber risk, while still delivering personalized experiences.

The broader implication is a new developer experience: Capital One’s internal platform supplies SDKs, memory services, data‑lineage tools, and latency controls, enabling rapid, secure deployment of agentic solutions at scale. This positions the bank to leverage its data advantage while meeting stringent financial‑industry standards.
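The orchestration pattern described in the episode can be illustrated with a minimal sketch. This is a hypothetical toy implementation, not Capital One's internal platform or SDK: the agent names, the `PolicyGuard` class, and the placeholder actions (e.g. `schedule_test_drive`) are all illustrative assumptions.

```python
# Hypothetical sketch of policy-bound multi-agent orchestration:
# a goal is decomposed into narrow tasks, each routed to a specialized
# agent, with a policy guard that can veto disallowed actions.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    name: str       # which agent should handle this task
    payload: dict   # task-specific inputs

class PolicyGuard:
    """Stand-in for regulatory/risk checks: blocks disallowed actions."""
    def __init__(self, blocked_actions: set):
        self.blocked = blocked_actions

    def allows(self, task: Task) -> bool:
        return task.payload.get("action") not in self.blocked

class Orchestrator:
    """Routes each task to its registered agent, skipping vetoed tasks."""
    def __init__(self, guard: PolicyGuard):
        self.guard = guard
        self.agents: "dict[str, Callable[[Task], str]]" = {}

    def register(self, name: str, agent: Callable) -> None:
        self.agents[name] = agent

    def run(self, tasks: list) -> list:
        results = []
        for task in tasks:
            if not self.guard.allows(task):
                results.append(f"{task.name}: blocked by policy")
                continue
            results.append(self.agents[task.name](task))
        return results

# Specialized agents, one narrow function each (placeholder logic).
def intent_agent(task: Task) -> str:
    return f"intent: '{task.payload['utterance']}' -> schedule_test_drive"

def planner_agent(task: Task) -> str:
    return f"plan: {task.payload['action']} for {task.payload['vehicle']}"

guard = PolicyGuard(blocked_actions={"share_credit_report"})
orchestrator = Orchestrator(guard)
orchestrator.register("intent", intent_agent)
orchestrator.register("plan", planner_agent)

results = orchestrator.run([
    Task("intent", {"utterance": "I want to try the SUV"}),
    Task("plan", {"action": "schedule_test_drive", "vehicle": "SUV"}),
    Task("plan", {"action": "share_credit_report", "vehicle": "SUV"}),
])
for line in results:
    print(line)
```

The key design point mirrored from the episode is that the guard sits in the orchestrator, not inside any single agent, so every task passes the same compliance gate regardless of which agent handles it.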
![The Race to Production-Grade Diffusion LLMs [Stefano Ermon] - 764](/cdn-cgi/image/width=1200,quality=75,format=auto,fit=cover/https://i.ytimg.com/vi/UDNDOf5hT-A/maxresdefault.jpg)
The Race to Production-Grade Diffusion LLMs [Stefano Ermon] - 764
Stanford professor Stefano Ermon and Inception Labs unveiled Mercury 2, a commercial‑scale diffusion language model that generates multiple tokens simultaneously. By adapting diffusion techniques—originally designed for images—to discrete text and code, Mercury 2 achieves inference speeds 5‑10× faster than comparable frontier models....
![AI Trends 2026: OpenClaw Agents, Reasoning LLMs, and More [Sebastian Raschka] - 762](/cdn-cgi/image/width=1200,quality=75,format=auto,fit=cover/https://i.ytimg.com/vi/f9jwTSfIPuM/maxresdefault.jpg)
AI Trends 2026: OpenClaw Agents, Reasoning LLMs, and More [Sebastian Raschka] - 762
The TWIML AI podcast episode spotlights the 2026 AI landscape, emphasizing that post‑training innovations—especially reasoning‑focused fine‑tuning—are now the primary engine of LLM improvement, while architectural changes remain modest. It also highlights the growing emphasis on tool‑use, where models are trained...