Digital Design & Comp. Arch: L20b: GPU Programming (Spring 2026)
Why It Matters
Understanding GPU programming and tensor‑core optimization directly translates into faster AI model training and lower infrastructure costs, giving businesses a competitive edge in data‑intensive markets.
Key Takeaways
- GPUs evolved from graphics to dominant general‑purpose accelerators.
- CUDA’s bulk‑synchronous model organizes threads into blocks and warps.
- Tensor cores enable mixed‑precision matrix multiplication for deep learning.
- Memory hierarchy (registers, L1/L2, global) drives performance optimization.
- Emerging research explores tensor cores for sparse and non‑ML workloads.
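The block-and-warp organization in the takeaways can be illustrated with a small CPU emulation. This is a minimal sketch in plain Python, not GPU code and not taken from the lecture: the names `thread_coords` and `launch_vector_add` are hypothetical, the block size of 128 is an arbitrary choice, and the warp width of 32 is the value used on NVIDIA GPUs.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def thread_coords(block_idx, thread_idx, block_dim):
    """Return (global index, warp id within the block, lane within the warp)."""
    global_idx = block_idx * block_dim + thread_idx  # CUDA-style 1-D index
    return global_idx, thread_idx // WARP_SIZE, thread_idx % WARP_SIZE


def launch_vector_add(a, b, threads_per_block=128):
    """Emulate a 1-D kernel launch computing out[i] = a[i] + b[i].

    Hypothetical helper: the loops run sequentially on the CPU, whereas a
    GPU would execute the threads of each warp in lockstep.
    """
    n = len(a)
    # Round up so every element is covered by some thread.
    num_blocks = (n + threads_per_block - 1) // threads_per_block
    out = [0.0] * n
    active_warps = set()
    for block_idx in range(num_blocks):
        for thread_idx in range(threads_per_block):
            i, warp_id, _lane = thread_coords(
                block_idx, thread_idx, threads_per_block
            )
            if i < n:  # bounds guard: the last block may be partly idle
                out[i] = a[i] + b[i]
                active_warps.add((block_idx, warp_id))
    return out, num_blocks, len(active_warps)
```

For 300 elements and 128 threads per block, this sketch launches three blocks, and only part of the last block does useful work, which is why the bounds guard is needed.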
Summary
The lecture introduces GPU programming as a cornerstone of modern high‑performance computing, shifting focus from traditional graphics rendering to general‑purpose acceleration. It outlines the CUDA and OpenCL ecosystems, emphasizing the bulk‑synchronous parallel model that structures code into thread blocks, warps, and SIMD lanes. Key technical insights include the evolution from Nvidia’s early Tesla architecture, with 240 stream processors, to the Volta V100, with 5,120 processors and dedicated tensor cores. The speaker explains SIMT execution, the memory hierarchy (registers, L1/L2 caches, global DRAM), and how fine‑grained multithreading and warp scheduling affect throughput. Illustrative examples compare a 2009 GTX 285 to the 2017 V100, highlighting a thirty‑fold increase in peak throughput and memory bandwidth approaching 900 GB/s. Tensor cores perform mixed‑precision 4×4 matrix‑multiply‑accumulate operations, which speeds up deep‑learning training because convolutions can be mapped to matrix multiplications. The discussion underscores that mastering GPU memory management and exploiting tensor cores are essential for AI developers and enterprises seeking cost‑effective scaling. It also hints at future directions, such as using tensor cores for sparse workloads and combining GPUs with other accelerators, such as systolic arrays, to cover a broader range of computations.
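The mixed‑precision 4×4 matrix‑multiply‑accumulate described above can be sketched as D = A×B + C. The following is a CPU emulation in plain Python, not the hardware's actual datapath: it assumes FP16 inputs with FP32 accumulation (one common tensor‑core mode), the function names `to_fp16`, `to_fp32`, and `mma_4x4` are hypothetical, and real hardware may round and order the additions differently.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def mma_4x4(a, b, c):
    """Emulate D = A*B + C on 4x4 matrices given as lists of lists.

    Assumption: A and B are rounded to FP16, C and the running sum are
    kept in FP32. The rounding after every addition is a simplification.
    """
    n = 4
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = to_fp32(c[i][j])
            for k in range(n):
                # The product of two FP16 values is exact in a Python float.
                prod = to_fp16(a[i][k]) * to_fp16(b[k][j])
                acc = to_fp32(acc + prod)
            d[i][j] = acc
    return d
```

The point of the sketch is the precision split: inputs lose precision when rounded to FP16 (for example, 0.1 is not representable exactly), while the wider accumulator limits how much error builds up across the sum.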