OpenAI Built a Networking Protocol with AMD, Broadcom, Intel, Microsoft, and NVIDIA to Fix AI Supercomputer Bottlenecks

OpenAI Built a Networking Protocol with AMD, Broadcom, Intel, Microsoft, and NVIDIA to Fix AI Supercomputer Bottlenecks

THE DECODER
THE DECODERMay 6, 2026

Why It Matters

MRC removes a critical bottleneck in AI training infrastructure, boosting reliability and lowering operational expenses for large‑scale model development. Its industry‑wide backing signals a shift toward more resilient, cost‑effective supercomputing networks.

Key Takeaways

  • MRC spreads packets across hundreds of paths, cutting congestion
  • Enables sub‑second rerouting around network failures, improving uptime
  • Connects 100,000+ GPUs with two Ethernet tiers, lowering cost
  • Deployed on OpenAI’s GB200 and Microsoft Fairwater supercomputers
  • Opened via Open Compute Project, inviting broader industry adoption

Pulse Analysis

AI model training now hinges on the ability to move petabytes of data between thousands of GPUs without delay. Traditional 800 Gb/s fabrics rely on single‑path routing and hierarchical switch tiers, which can stall for seconds when a link or switch fails, inflating training time and electricity bills. The industry’s rapid scaling—exemplified by OpenAI’s GB200 clusters—has exposed these latency and reliability gaps, prompting a search for a more flexible networking fabric.

Multipath Reliable Connection tackles the problem by fragmenting each data transfer across hundreds of parallel Ethernet paths, effectively creating a mesh that can absorb failures instantly. When a link drops, MRC detects the anomaly and redirects traffic in microseconds, keeping training jobs uninterrupted. By consolidating what would normally require three or four switch tiers into just two, the protocol slashes power draw and hardware spend, while still supporting the massive bandwidth demands of frontier models like ChatGPT and Codex.

The collaborative launch with AMD, Broadcom, Intel, Microsoft and NVIDIA underscores a broader industry consensus that network resilience is as vital as compute power. Publishing the specification through the Open Compute Project invites other cloud providers and data‑center operators to adopt the standard, potentially reshaping the economics of AI supercomputing. As models grow larger and training cycles lengthen, protocols like MRC could become a prerequisite for cost‑effective, uninterrupted AI development, giving early adopters a competitive edge in the race for next‑generation intelligence.

OpenAI built a networking protocol with AMD, Broadcom, Intel, Microsoft, and NVIDIA to fix AI supercomputer bottlenecks

Comments

Want to join the conversation?

Loading comments...