Nvidia Updates Data Center Roadmap with Rosa CPU and Stacked Feynman GPUs — Optical NVLink, Groq LPUs with NVFP4, and NVLink Also on Deck

Tom's Hardware
Mar 17, 2026

Why It Matters

The roadmap accelerates Nvidia’s AI compute density and reduces CPU‑GPU integration time, reshaping competitive dynamics in hyperscale data centers.

Key Takeaways

  • Rosa CPU powers Feynman GPUs, halves development cycle.
  • Feynman GPUs use die‑stacked HBM, >1 TB memory.
  • NVLink with co‑packaged optics enables optical scaling.
  • Groq LP40 LPU adds NVFP4 support to Nvidia stack.
  • Kyber racks scale to 1,152 GPUs, quadruple performance.

Pulse Analysis

Nvidia’s refreshed roadmap reflects a broader industry shift toward tighter CPU‑GPU integration and higher compute density. By introducing the in‑house Rosa processor, Nvidia aims to cut the traditional four‑year CPU development cadence in half, delivering a processor whose single‑thread performance is tuned for AI workloads. Coupled with the Feynman GPU’s die‑stacked high‑bandwidth memory, the platform targets memory capacities beyond 1 TB per package, a leap that could alleviate data‑movement bottlenecks in training large language models.

The partnership with Groq’s low‑latency LPUs adds a new layer of specialization. LP40, equipped with the NVFP4 data format, will sit alongside Nvidia GPUs, offering inference acceleration that complements the massive parallelism of the Feynman architecture. Meanwhile, NVLink’s transition to co‑packaged optical interconnects promises lower latency and higher bandwidth across racks, making configurations like the 1,152‑GPU Kyber chassis economically viable. This optical scaling reduces power and cabling complexity, a critical factor for hyperscale operators seeking to maximize floor‑space efficiency.

For the data‑center market, these innovations could compress the performance‑per‑dollar curve and pressure rivals to accelerate their own integration strategies. The combination of a custom CPU, stacked GPUs, and optical fabric positions Nvidia to dominate next‑generation AI infrastructure, especially as enterprises migrate from on‑premises clusters to cloud‑native, high‑throughput workloads. Early adopters stand to see significant improvements in training speed and inference latency, while the broader ecosystem may see a ripple effect in standards for memory, interconnect, and accelerator co‑design.
