(PR) AMD Instinct MI355X GPUs Surpass 1M Tokens/Sec in MLPerf 6.0
Key Takeaways
- MI355X exceeds 1M tokens/sec on Llama 2 70B
- 3.1× performance uplift vs MI325X
- Multinode scaling maintains >90% efficiency across 11–12 nodes
- Partners reproduced results within 4%, proving ecosystem reproducibility
- ROCm optimizations enable FP4/FP6 inference for large models
Summary
AMD announced that its Instinct MI355X GPUs have broken the 1 million‑tokens‑per‑second barrier in the MLPerf Inference 6.0 benchmark, delivering up to 3.1× higher throughput than the prior MI325X. The GPUs, built on the 3 nm CDNA 4 architecture with FP4/FP6 support and up to 288 GB HBM3E, achieved competitive single‑node performance against NVIDIA’s B200 and B300 while maintaining over 90% efficiency in multinode clusters of up to 12 nodes. A broad partner ecosystem reproduced the results within 4%, confirming the stack’s reproducibility across diverse systems. AMD attributes the gains to ROCm software optimizations and positions the MI355X as a foundation for its upcoming MI400 series and Helios rack‑scale solutions.
Pulse Analysis
The generative‑AI market is increasingly judged by how many tokens a system can serve per second, especially when workloads run across clusters. AMD’s MI355X, built on a 3 nm CDNA 4 die with 185 billion transistors, pushes the envelope by crossing the one‑million‑tokens‑per‑second threshold on Llama 2 70B and GPT‑OSS‑120B. This achievement signals that AMD can now compete on the same throughput metrics that have traditionally favored NVIDIA, while also offering FP4/FP6 precision that reduces compute cost for large language models.
Beyond raw numbers, the MI355X’s success rests on a cohesive hardware‑software stack. ROCm’s low‑level optimizations for FP4/FP6 kernels, combined with efficient GPU‑to‑GPU communication, enable both single‑node competitiveness and predictable scale‑out. The fact that nine independent partners—ranging from Dell to Red Hat—reproduced AMD’s results within a 4% margin underscores the robustness of the ecosystem and reduces the risk for enterprises adopting the platform. Heterogeneous deployments that blend MI300X, MI325X, and MI355X across geographic locations further demonstrate flexibility for phased upgrades.
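The ">90% efficiency" figure above refers to how closely cluster throughput tracks ideal linear scaling. A minimal sketch of that calculation, with hypothetical numbers (the per-node figure and the `scaling_efficiency` helper are illustrative, not AMD's published methodology):

```python
# Illustrative sketch, not AMD's methodology: how scaling efficiency is
# typically derived from measured throughput. All numbers are hypothetical.

def scaling_efficiency(single_node_tps: float, cluster_tps: float, nodes: int) -> float:
    """Ratio of measured cluster throughput to ideal linear scaling."""
    ideal = single_node_tps * nodes
    return cluster_tps / ideal

# Hypothetical example: a 12-node cluster serving 1.0M tokens/sec,
# where a single node alone serves 90,000 tokens/sec.
eff = scaling_efficiency(90_000, 1_000_000, 12)
print(f"{eff:.1%}")  # 92.6% -- above the 90% threshold cited in the article
```

Efficiency below 100% reflects the cost of GPU-to-GPU and node-to-node communication; staying above 90% at 12 nodes is what makes scale-out throughput "predictable" in the sense the article describes.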
Looking ahead, AMD’s annual cadence promises the MI400 series on CDNA 5 and the Helios rack‑scale solution, which could deepen the performance gap and lower total cost of ownership for large‑scale inference. As enterprises move from pilot projects to production‑grade AI services, the ability to sustain high token throughput, maintain efficiency at scale, and quickly bring new models online will be decisive factors. AMD’s recent benchmark results position it as a credible challenger in the inference market, potentially reshaping vendor dynamics and offering customers more choice in building future‑ready AI infrastructure.