Intel Delivers Open, Scalable AI Performance in MLPerf Inference v6.0
Key Takeaways
- Four‑GPU Arc Pro B70 system supports 120B‑parameter models
- B70 delivers up to 1.8× inference speed over B60
- Intel Xeon 6 CPUs achieve 1.9× generational performance gain
- Multi‑GPU scaling yields up to 1.18× improvement versus v5.1
- Containerized stack provides ECC, SR‑IOV, telemetry, remote updates
Summary
Intel’s latest MLPerf Inference v6.0 results highlight its Xeon 6 CPUs paired with Arc Pro B70/B65 GPUs delivering open, scalable AI performance across workstation, data‑center, and edge workloads. A four‑GPU B70 configuration offers 128 GB of VRAM and can run 120‑billion‑parameter models, achieving up to 1.8× higher inference speed than the prior B60 and a 1.18× gain over the v5.1 benchmark. Intel emphasizes a containerized software stack that scales from single‑node to multi‑GPU deployments while providing enterprise features such as ECC, SR‑IOV, and remote firmware updates. The company also remains the sole server‑CPU vendor submitting stand‑alone CPU results, underscoring Xeon 6’s 1.9× generational performance uplift.
Pulse Analysis
Intel’s MLPerf Inference v6.0 showcase signals a strategic shift in AI hardware, marrying high‑end GPU acceleration with its own Xeon 6 CPUs. By delivering 128 GB of VRAM across four Arc Pro B70 GPUs, the platform can handle 120‑billion‑parameter large language models, a capability traditionally reserved for premium NVIDIA or AMD solutions. The reported 1.8× performance edge over the B60 and the 1.18× uplift from the previous benchmark version illustrate how Intel’s hardware‑software co‑design is narrowing the gap in raw inference throughput at a more competitive price point.
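To put the memory claim in perspective, the quick sketch below estimates whether 120 billion parameters fit in 128 GB of pooled VRAM. The precision levels and the assumption that weights dominate the footprint are illustrative, not Intel‑published deployment details.

```python
# Back-of-the-envelope VRAM estimate for serving a 120B-parameter model on
# four GPUs with 128 GB of pooled memory. Precision levels are illustrative
# assumptions; only the parameter count and pool size come from the article.

PARAMS = 120e9          # model parameters
POOLED_VRAM_GB = 128    # four Arc Pro B70 cards, per the article

def weights_gb(bytes_per_param: float) -> float:
    """Approximate weight footprint in GB at a given precision."""
    return PARAMS * bytes_per_param / 1e9

for label, bpp in [("FP16", 2.0), ("FP8/INT8", 1.0), ("INT4", 0.5)]:
    w = weights_gb(bpp)
    print(f"{label:9s} weights ≈ {w:5.0f} GB, "
          f"headroom ≈ {POOLED_VRAM_GB - w:6.0f} GB")
```

At FP16 the weights alone overflow the pool, which is why reduced‑precision formats are typically what makes a model of this size feasible on a four‑GPU workstation.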
Beyond raw speed, Intel’s containerized stack introduces enterprise‑grade features that simplify large‑scale deployments. Built‑in ECC memory protection, SR‑IOV virtualization, telemetry, and remote firmware updates reduce operational overhead and align with data‑center reliability standards. Multi‑GPU scaling, enabled by PCIe peer‑to‑peer transfers, expands KV‑cache capacity by up to 1.6×, allowing larger context windows for generative AI workloads without sacrificing latency. This holistic approach addresses the growing demand for privacy‑preserving, on‑premises AI inference, where subscription‑based cloud models are increasingly scrutinized.
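The KV‑cache figure translates roughly into context‑window headroom. The sketch below assumes a layer count, hidden size, and FP16 cache precision typical of a large decoder model; none of these numbers come from Intel’s submission, they simply show how extra pooled memory maps to extra tokens of context.

```python
# Rough KV-cache sizing sketch: extra pooled memory left after the weights
# maps directly to a longer usable context window. Layer count, hidden size,
# and FP16 cache precision are illustrative assumptions, not published specs.

def kv_gb_per_token(n_layers: int = 80, hidden: int = 8192,
                    bytes_per_elem: float = 2.0) -> float:
    """Keys plus values (factor of 2) stored for every layer, per token."""
    return 2 * n_layers * hidden * bytes_per_elem / 1e9

per_token = kv_gb_per_token()
for label, headroom_gb in [("baseline headroom", 5.0),
                           ("1.6x headroom", 8.0)]:
    print(f"{label}: ~{headroom_gb / per_token:,.0f} tokens of KV cache")
```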
For businesses evaluating AI infrastructure, Intel’s combined Xeon‑GPU offering delivers a compelling value proposition. The 1.9× generational performance gain of Xeon 6 CPUs, coupled with AMX and AVX‑512 acceleration, offloads many inference tasks from the GPU, improving overall system efficiency and reducing total cost of ownership. As the only server‑CPU vendor submitting stand‑alone results to MLPerf, Intel reinforces its central role in the AI stack, positioning itself as a viable, open alternative for enterprises seeking performance, scalability, and control over their AI workloads.
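As a rough illustration of the CPU side, the minimal PyTorch sketch below runs a toy model under bf16 autocast on the CPU; on Xeon parts with AMX, PyTorch’s oneDNN backend can dispatch such matmuls to AMX tile instructions. The model and shapes are placeholders, not an MLPerf workload.

```python
# Minimal sketch of CPU-side inference under bf16 autocast. The toy model and
# tensor shapes below are placeholders for illustration only.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(4096, 4096),
    nn.GELU(),
    nn.Linear(4096, 4096),
).eval()
x = torch.randn(8, 4096)

# bf16 autocast on CPU; oneDNN may lower these matmuls to AMX on supported Xeons.
with torch.inference_mode(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = model(x)

print(y.shape, y.dtype)  # torch.Size([8, 4096]) torch.bfloat16
```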