How SONiC Powers the World's Largest AI Infrastructure

Open Compute Project
May 19, 2026

Why It Matters

By treating the network as a performance‑critical component, SONiC and Microsoft’s innovations enable AI training at massive scale while keeping latency low and efficiency high, setting a new standard for hyperscale data‑center networking.

Key Takeaways

  • AI traffic creates synchronous elephant flows causing micro‑bursts
  • Traditional congestion controls like PFC/ECN fail for AI workloads
  • SONiC enables sub‑millisecond telemetry and fast feedback loops
  • Microsoft’s Fairwater uses BGP, SRv6, packet trimming for scale
  • High‑frequency streaming telemetry on Tomahawk ASICs drives real‑time visibility
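The first takeaway can be illustrated with a small sketch. It compares hash-based path selection with deterministic per-flow assignment for a handful of equal-rate elephant flows. Everything here is an assumption made for illustration: the SHA-256 hash stands in for an ASIC's ECMP hash, the addresses and flow count are invented, and the "pinned" assignment only mimics the role the talk attributes to source routing. It is not Microsoft's implementation.

```python
import hashlib
from collections import Counter

def ecmp_path(flow: tuple, num_paths: int) -> int:
    """Pick a path by hashing the flow's header fields (stand-in for an ASIC hash)."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_paths

def max_flows_per_path(assignments: list) -> int:
    """Return the number of flows on the most heavily loaded path."""
    return max(Counter(assignments).values())

NUM_PATHS = 8
# Hypothetical low-entropy workload: 8 equal-rate elephant flows, one per GPU pair.
flows = [("10.0.0.%d" % i, "10.0.1.%d" % i, 4791) for i in range(NUM_PATHS)]

# Hash-based ECMP: with so few flows, collisions on one path are likely.
hashed = [ecmp_path(f, NUM_PATHS) for f in flows]

# Deterministic spreading: the sender assigns flow i to path i,
# so no two flows share a path.
pinned = [i % NUM_PATHS for i in range(len(flows))]

print("hash-based worst path carries", max_flows_per_path(hashed), "flows")
print("pinned worst path carries", max_flows_per_path(pinned), "flows")
```

With many small flows, hashing tends to even out. With a few large synchronized flows, one collision can leave a path carrying twice its share while another sits idle, which is the motivation the talk gives for deterministic spreading.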

Summary

The presentation introduced Microsoft’s Fairwater AI data center, the world’s largest AI infrastructure built on Broadcom’s Tomahawk 5 ASICs and the open‑source SONiC network operating system. It explained why AI traffic differs fundamentally from traditional workloads: tens of thousands of GPUs communicate in lockstep, generating low‑entropy, elephant‑flow bursts that saturate single paths and demand microsecond‑scale congestion handling.

Key technical insights included the need to abandon conventional congestion mechanisms such as PFC and ECN, which exacerbate head‑of‑line blocking under bursty AI traffic. Instead, Microsoft leveraged SONiC’s extensible stack to implement BGP at 100 Gb granularity, SRv6 source routing for deterministic traffic spreading, and packet trimming to pre‑emptively signal drops and trigger rapid retransmission. These innovations enable sub‑0.1‑second routing convergence and lossless communication across a topology that can host up to 500,000 GPUs.

Notable examples highlighted the multi‑plane topology where each Tomahawk 5 switch fans out to 512 neighbors, supporting 512 BGP sessions per switch and SRv6‑encoded hop‑by‑hop paths. Packet trimming compresses overflow packets, allowing line‑rate egress from up to 18 ports onto a single port without loss. High‑frequency streaming telemetry pushes IPFIX counters from 512 ports at millisecond intervals, giving operators real‑time visibility into congestion events.

The implications are profound: AI workloads now treat the network as a compute substrate, requiring real‑time telemetry, deterministic routing, and lossless mechanisms. SONiC’s open architecture proved capable of scaling to unprecedented GPU densities, offering a blueprint for other hyperscale operators seeking to deploy next‑generation AI training clusters.
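The trimming idea described above can be sketched as a toy queue model. When the data queue is full, the switch keeps only a packet's header and forwards it in a separate queue, so the receiver learns quickly which packet to request again. The class and function names, the 64-byte header size, and the unbounded control queue are all assumptions made for illustration. This does not reflect the SAI or SONiC interfaces, or the Tomahawk implementation.

```python
from collections import deque
from dataclasses import dataclass

HEADER_BYTES = 64  # assumed size of the header kept after trimming

@dataclass
class Packet:
    seq: int
    size: int              # bytes on the wire
    trimmed: bool = False

class TrimmingQueue:
    """Toy egress queue that trims payloads instead of dropping when full."""

    def __init__(self, capacity_bytes: int):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.data = deque()     # full packets
        self.control = deque()  # trimmed headers (unbounded in this toy model)

    def enqueue(self, pkt: Packet) -> None:
        if self.used_bytes + pkt.size <= self.capacity_bytes:
            self.data.append(pkt)
            self.used_bytes += pkt.size
        else:
            # Keep only the header so the receiver learns which packet was cut.
            self.control.append(Packet(pkt.seq, HEADER_BYTES, trimmed=True))

def missing_sequences(q: TrimmingQueue) -> list:
    """Sequence numbers the receiver would ask the sender to retransmit."""
    return [p.seq for p in q.control]

# Five 4096-byte packets arrive at a queue with room for three.
queue = TrimmingQueue(capacity_bytes=3 * 4096)
for seq in range(5):
    queue.enqueue(Packet(seq, 4096))

print(missing_sequences(queue))  # [3, 4]
```

Because a trimmed header is a small fraction of the original packet, many more of them fit through a congested port than full packets would. That is the intuition behind the incast figure quoted in the summary.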

Original Description

Presenter(s):
Guohan Lu, Principal Software Engineer, Microsoft
Mehak Mahajan, Senior Director, Engineering, Broadcom
The rapid rise of AI has driven backend clusters from tens to hundreds of thousands of GPUs, creating unique needs versus traditional data centers: ultra-low latency, high throughput, and proactive fault detection. This demands advanced traffic engineering and telemetry. This talk presents Microsoft's next-generation AI backend network architecture, detailing deployed features such as Segment Routing over IPv6 (SRv6), High-Frequency Streaming Telemetry (HFST), and trimming, along with their implementations and motivations. We'll also assess SAI and SONiC readiness for hyper-scale AI backend network deployment.
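A back-of-the-envelope calculation gives a feel for the telemetry volume implied by the figures in the summary (512 ports sampled at millisecond intervals). The function name, the number of counters per port, and the bytes per counter are assumptions chosen for illustration. The result also excludes IPFIX message and transport overhead.

```python
def telemetry_load(ports: int, interval_ms: int,
                   counters_per_port: int, bytes_per_counter: int):
    """Return (counter samples per second, payload bits per second) for one switch."""
    samples_per_s = ports * counters_per_port * (1000 / interval_ms)
    bits_per_s = samples_per_s * bytes_per_counter * 8
    return samples_per_s, bits_per_s

# 512 ports and a 1 ms interval come from the summary; 4 counters per port
# and 8 bytes per counter are assumed values.
samples, bits = telemetry_load(ports=512, interval_ms=1,
                               counters_per_port=4, bytes_per_counter=8)
print(f"{samples:,.0f} counter samples/s, {bits / 1e6:.1f} Mb/s of counter payload")
```

Under these assumptions a single switch emits about two million counter samples per second, which suggests why export at this rate needs support in the ASIC and the NOS rather than conventional polling.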
