Ep 23: Cross-Stack Design and Tooling for Large-Scale Distributed AI Systems with Dr. Tushar Krishna, Georgia Tech

Computer Architecture Podcast

Apr 6, 2026

Why It Matters

Understanding cross‑stack design is crucial for building the next generation of AI infrastructure that can meet exploding compute and energy demands while remaining adaptable to rapidly evolving models. This episode offers actionable insights for architects, researchers, and industry leaders seeking to bridge hardware and software to unlock performance gains in today’s AI‑driven world.

Key Takeaways

  • AI workloads drive specialized interconnect designs across chips
  • Co-design aligns data flow, hardware, and software layers
  • Flexible accelerators adapt from CNNs to transformers efficiently
  • Distributed training requires collective communication fabrics and scalable networks
  • ASTRA-sim, Chakra, and Garnet accelerate AI system design exploration

Pulse Analysis

In this episode, Dr. Tushar Krishna traces his evolution from network‑on‑chip research to leading the design of AI accelerators. Early work on multi‑hop NoC architectures at AMD and Intel laid the groundwork for understanding wire delays and traffic patterns. When the ImageNet breakthrough arrived, Krishna leveraged that knowledge to build domain‑specific accelerators like Eyeriss, optimizing data flow for convolutional networks and demonstrating how precise workload mapping can dictate interconnect design.

Krishna emphasizes cross‑stack co‑design as the engine behind modern AI hardware. By formalizing data‑flow description languages such as MAESTRO, his team bridges silicon‑level routing decisions with software‑level abstractions, ensuring flexibility as models shift from CNNs to large language models. Tools like ASTRA-sim, Chakra, and Garnet enable rapid exploration of memory hierarchies, communication fabrics, and accelerator configurations, allowing designers to balance performance with future‑proofing. This layered approach, which burns critical data‑flow patterns into silicon while exposing programmable interfaces, helps avoid over‑specialization and supports evolving workloads.
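
To make the mapping trade-off concrete, the sketch below is a hypothetical back-of-the-envelope model (not MAESTRO's or ASTRA-sim's actual API) that estimates off-chip traffic for two common dataflow mappings of a single GEMM; the mapping names, tile sizes, and formulas are illustrative assumptions.

```python
# Hypothetical comparison of two dataflow mappings for a GEMM C[M,N] = A[M,K] @ B[K,N].
# Not MAESTRO's model or API; the formulas and tile sizes are illustrative assumptions
# meant to show how a mapping choice changes off-chip traffic.

def traffic_weight_stationary(M: int, K: int, N: int, tile_n: int) -> int:
    """Keep a K x tile_n weight slab on chip; stream all activations per N-tile."""
    weight_elems = K * N                      # each weight fetched once
    act_elems = M * K * (N // tile_n)         # activations re-fetched for every N-tile
    out_elems = M * N                         # each output written once
    return weight_elems + act_elems + out_elems

def traffic_output_stationary(M: int, K: int, N: int, tile_m: int, tile_n: int) -> int:
    """Accumulate a tile_m x tile_n output block on chip; stream its inputs."""
    num_tiles = (M // tile_m) * (N // tile_n)
    act_elems = tile_m * K * num_tiles        # activation rows streamed per output tile
    weight_elems = K * tile_n * num_tiles     # weight columns streamed per output tile
    out_elems = M * N
    return act_elems + weight_elems + out_elems

if __name__ == "__main__":
    M, K, N = 1024, 1024, 4096
    print("weight-stationary (elements moved):",
          traffic_weight_stationary(M, K, N, tile_n=128))
    print("output-stationary (elements moved):",
          traffic_output_stationary(M, K, N, tile_m=128, tile_n=128))
```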

At scale, distributed training introduces new challenges: collective communication, heterogeneous link technologies, and rack‑level fabrics. Krishna discusses how interconnects evolve from on‑chip links to package‑level NVLink‑style connections and finally to rack‑wide networks, each with distinct constraints. Understanding traffic patterns across these layers guides the selection of scalable fabrics that can handle massive model footprints. For enterprises planning AI infrastructure, these insights highlight the importance of holistic design, from custom NPUs to data‑center networking, ensuring that investments remain performant as AI models continue to grow.
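
As a rough illustration of how those traffic patterns translate into time, the sketch below applies the textbook ring all-reduce cost model at two interconnect tiers; the bandwidth, latency, and model-size numbers are illustrative assumptions, and this is not ASTRA-sim's methodology or API.

```python
# Hypothetical cost sketch of a ring all-reduce, the collective at the heart of
# data-parallel training. It only illustrates why per-tier link bandwidth
# (on-package, NVLink-style scale-up, rack-level scale-out) dominates the time a
# training step spends communicating; all numbers are illustrative assumptions.

def ring_allreduce_time_ms(msg_bytes: float, num_ranks: int,
                           link_gb_per_s: float, link_latency_us: float) -> float:
    """Classic ring all-reduce: 2*(p-1) steps, each moving msg_bytes/p per link."""
    p = num_ranks
    chunk_bytes = msg_bytes / p
    steps = 2 * (p - 1)
    bytes_per_us = link_gb_per_s * 1e9 / 1e6        # GB/s -> bytes per microsecond
    time_us = steps * (link_latency_us + chunk_bytes / bytes_per_us)
    return time_us / 1e3                            # milliseconds

if __name__ == "__main__":
    grad_bytes = 2 * 7e9   # ~7B parameters in fp16 (assumed example workload)
    for tier, bw in [("scale-up link (~900 GB/s)", 900.0),
                     ("scale-out NIC (~50 GB/s)", 50.0)]:
        t = ring_allreduce_time_ms(grad_bytes, num_ranks=8,
                                   link_gb_per_s=bw, link_latency_us=2.0)
        print(f"{tier}: {t:.1f} ms per all-reduce")
```

The gap between the two tiers in this toy model is one reason topology-aware collectives and hierarchical fabrics loom so large in the episode.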

Episode Description

Dr. Tushar Krishna is an Associate Professor in the School of Electrical and Computer Engineering at Georgia Tech and holds a Ph.D. from MIT. Tushar's work shapes how the computing community designs modern large-scale distributed AI systems, spanning specialized accelerators, memory hierarchies, and communication fabrics, and drives design-space exploration with pioneering tools like ASTRA-sim, Chakra, and Garnet. A member of the ISCA, MICRO, and HPCA Halls of Fame, he has garnered over 21,000 citations for his research and received the 2025 DAC "Under 40 Innovators Award." He also actively shapes future AI computing standards as Co-director of Georgia Tech's CRNCH and co-chair of the MLCommons Chakra Working Group.

Show Notes
