GPU Acceleration Achieves 40× Speedup for Selected Basis Diagonalization with Thrust

Quantum Zeitgeist•Jan 28, 2026

Why It Matters

Accelerating the classical diagonalisation step enables larger, faster hybrid quantum‑classical simulations, expanding the practical reach of quantum chemistry and materials research.

Key Takeaways

•GPU‑native SBD yields up to 40× speedup over CPU code
•Supports half‑ and full‑bitstring representations in unified framework
•Matrix‑free design handles 10⁸–10¹⁰ determinants efficiently
•Thrust library provides fine‑grained data‑parallel control
•Scales across multiple GPU nodes, enabling chemical‑scale SQD
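
The determinant counts quoted above make the case for a matrix‑free design with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes double‑precision (8‑byte) values and a dense matrix as a worst‑case upper bound, since a real Hamiltonian is sparse.

```python
def dense_matrix_bytes(n_dets, bytes_per_value=8):
    """Bytes needed to store a dense n_dets x n_dets matrix (worst case)."""
    return n_dets * n_dets * bytes_per_value


def vector_bytes(n_dets, bytes_per_value=8):
    """Bytes needed to store one coefficient vector of length n_dets."""
    return n_dets * bytes_per_value


if __name__ == "__main__":
    for n in (10**8, 10**10):
        print(n, dense_matrix_bytes(n), vector_bytes(n))
```

For 10⁸ determinants a dense matrix would need 8 × 10¹⁶ bytes (80 PB), while a single vector needs 0.8 GB. At 10¹⁰ determinants one vector alone is 80 GB, which suggests why distributing the vectors across several GPU nodes matters.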

Pulse Analysis

The Sample‑based Quantum Diagonalisation (SQD) framework underpins many hybrid quantum‑classical algorithms, but its classical diagonalisation step, Selected Basis Diagonalisation (SBD), has long limited scalability. As quantum processors grow, the number of determinants in the reduced basis can reach 10⁸ to 10¹⁰, so the Hamiltonian must be applied matrix‑free, and even that is memory‑intensive. Traditional CPU‑bound implementations struggle to keep pace, and the diagonalisation step comes to dominate overall runtime. Overcoming this bottleneck is essential for extending quantum chemistry simulations to chemically relevant system sizes and for maintaining competitive time‑to‑solution in emerging quantum‑accelerated workflows.

The research team leveraged NVIDIA’s Thrust library to rewrite SBD as a fully GPU‑resident kernel suite. By flattening configuration data structures, restructuring excitation evaluation, and exploiting fine‑grained data‑parallel primitives, the implementation avoids explicit Hamiltonian matrix construction and minimizes host‑device transfers. Benchmarks on the Miyabi‑G GH200 cluster show per‑node speedups of 35–39× over optimized CPU code, peaking at 40×, and the gains hold when scaling from one to sixteen nodes. The approach supports both half‑bitstring and full‑bitstring representations, delivering a portable, high‑performance backend that can adapt to future GPU architectures.

These gains translate directly into faster quantum‑chemical calculations, allowing SQD to tackle problems such as the Fe₄S₄ cluster within current GPU memory limits. The matrix‑free, GPU‑native strategy also opens the door for similar accelerations in other quantum‑classical algorithms, including variational quantum eigensolvers that rely on iterative diagonalisation. As HPC centers increasingly deploy GPU‑centric systems, the Thrust‑based SBD provides a scalable foundation for next‑generation quantum simulation pipelines, shortening development cycles and expanding the feasible problem space for researchers worldwide.
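
To make "matrix‑free" concrete, here is a minimal sketch of the idea, not the authors' implementation. It assumes a toy one‑body Hamiltonian with symmetric integrals h[p][q], determinants stored as sorted integer bitmasks (a full‑bitstring style encoding), and hypothetical function names. The outer loop has one independent task per determinant, which is the kind of pattern a data‑parallel library could map onto GPU threads, and the sorted‑array lookup stands in for a binary search on the device.

```python
from bisect import bisect_left


def excitation_sign(det, p, q):
    """Fermionic sign of moving an electron from orbital q to p (p != q)."""
    lo, hi = (p, q) if p < q else (q, p)
    # Mask selecting the orbitals strictly between lo and hi.
    between = ((1 << hi) - 1) & ~((1 << (lo + 1)) - 1)
    return -1 if bin(det & between).count("1") % 2 else 1


def apply_hamiltonian(dets, h, c):
    """Return sigma = H c without ever storing the matrix H.

    dets: sorted list of determinants as integer bitmasks (the selected basis)
    h:    symmetric one-body integrals (toy stand-in for a full Hamiltonian)
    c:    coefficient vector with the same length as dets
    """
    n_orb = len(h)
    sigma = [0.0] * len(dets)
    for i, det in enumerate(dets):      # one independent task per determinant
        acc = 0.0
        for q in range(n_orb):
            if not (det >> q) & 1:
                continue                # orbital q must be occupied
            acc += h[q][q] * c[i]       # diagonal contribution
            for p in range(n_orb):
                if (det >> p) & 1 or h[p][q] == 0.0:
                    continue            # orbital p must be empty
                new_det = det ^ (1 << q) ^ (1 << p)
                j = bisect_left(dets, new_det)
                # Connected determinants outside the selected basis are skipped.
                if j < len(dets) and dets[j] == new_det:
                    acc += excitation_sign(det, p, q) * h[p][q] * c[j]
        sigma[i] = acc
    return sigma
```

Each matrix element is recomputed from the bit patterns whenever it is needed, so memory use scales with the number of determinants rather than with the number of nonzero matrix elements. A realistic quantum‑chemistry Hamiltonian would also need two‑body terms, which this sketch omits.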