FuriosaAI Partners with Broadcom to Build Next-Generation Inference Platform for the Agentic Era

StorageNewsletter, Jun 8, 2026

Key Takeaways

  • FuriosaAI and Broadcom co-develop third-gen AI inference accelerator
  • Chip uses 2 nm compute die, HBM4/4E, and dedicated I/O die
  • Platform targets agentic AI workloads with a high-bandwidth, low-latency interconnect
  • Software stack auto-compiles PyTorch to silicon, boosting developer velocity

Pulse Analysis

The AI hardware landscape is shifting from raw compute horsepower to efficient data reuse and cross‑server communication. As generative and agentic models grow in size, inference workloads demand not just faster cores but also ultra‑low latency interconnects that can move tokens across racks without bottlenecks. Traditional GPU‑centric designs struggle with these requirements, prompting silicon innovators to rethink architecture at the system level. FuriosaAI’s Tensor Contraction Processor, already proven in production by Samsung SDS and LG AI Research, provides a strong foundation for this new approach.

In the partnership, Broadcom contributes its market‑leading XPU IP, Ethernet scaling technology, and advanced packaging expertise to Furiosa’s next‑generation chip. The design pairs a 2 nm compute die with a separate I/O die, supporting high‑density HBM4/4E memory and a fabric that can link hundreds of chips within a rack. By offloading data movement to a dedicated networking layer, the platform is intended to deliver higher performance‑per‑watt and token density than state‑of‑the‑art GPUs on frontier LLM and agentic tasks. The chiplet‑based approach is also meant to simplify scaling, letting hyperscale operators expand capacity without redesigning the entire board.

Beyond silicon, Furiosa’s software stack differentiates the offering: a general compiler translates high‑level PyTorch code directly to hardware, while a Virtual ISA gives developers granular control without the complexity of writing GPU kernels. This shortens model rollout and reduces engineering overhead, which matters for enterprises racing to monetize new AI services. With sampling expected in early 2028, the platform is positioned as a potential cornerstone of data‑center AI deployments over the next decade, and it could reshape competitive dynamics among chipmakers and cloud providers.
