Digital Design & Comp. Arch: L20b: GPU Programming (Spring 2026)

May 8, 2026

Onur Mutlu Lectures

Why It Matters

Understanding GPU programming and tensor-core optimization can translate into faster AI model training and lower infrastructure costs, which matters to any business running data-intensive workloads.

Key Takeaways

• GPUs evolved from graphics to dominant general-purpose accelerators.
• CUDA’s bulk-synchronous model organizes threads into blocks and warps.
• Tensor cores enable mixed-precision matrix multiplication for deep learning.
• Memory hierarchy (registers, L1/L2, global) drives performance optimization.
• Emerging research explores tensor cores for sparse and non-ML workloads.
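
The block-and-warp organization in the takeaways can be sketched on the CPU. The Python emulation below is an illustration only and is not taken from the lecture: the vector-addition workload, the function name `launch_vector_add`, and the parameter `block_dim` are assumptions chosen for the example. It shows how each logical thread derives a global index from its block index and thread index, and how consecutive threads in a block fall into warps of 32.

```python
WARP_SIZE = 32  # warp width on Nvidia GPUs


def launch_vector_add(a, b, block_dim):
    """Emulate a 1D CUDA-style launch with one logical thread per element.

    Hypothetical helper: the loops run sequentially here, whereas a GPU
    would execute the threads of each warp in lockstep (SIMT).
    """
    n = len(a)
    grid_dim = (n + block_dim - 1) // block_dim  # ceil(n / block_dim) blocks
    c = [0.0] * n
    active_warps = set()
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            # Same formula as blockIdx.x * blockDim.x + threadIdx.x in CUDA.
            i = block_idx * block_dim + thread_idx
            if i < n:  # bounds check: the last block may be partly idle
                c[i] = a[i] + b[i]
                active_warps.add((block_idx, thread_idx // WARP_SIZE))
    return c, grid_dim, len(active_warps)


if __name__ == "__main__":
    a = [float(i) for i in range(100)]
    b = [1.0] * 100
    c, grid_dim, warps = launch_vector_add(a, b, block_dim=64)
    print(grid_dim, warps)  # prints "2 4"
```

With 100 elements and 64 threads per block, two blocks are launched; the second block has only 36 in-range threads, so its second warp is mostly idle. That kind of partial occupancy is one reason block size is a tuning parameter.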

Summary

The lecture presents GPU programming as a cornerstone of modern high-performance computing and traces the GPU's shift from graphics rendering to general-purpose acceleration. It outlines the CUDA and OpenCL ecosystems, with emphasis on the bulk-synchronous parallel model, which structures code into thread blocks, warps, and SIMD lanes. The lecturers explain SIMT execution, the memory hierarchy (registers, L1/L2 caches, global DRAM), and how fine-grained multithreading and warp scheduling affect throughput.

On the hardware side, the lecture follows Nvidia's progression from the early Tesla architecture, with 240 stream processors, to the Volta V100, with 5,120 stream processors and dedicated tensor cores. A comparison of the 2009 GTX 285 with the 2017 V100 shows a thirty-fold increase in peak throughput and memory bandwidth approaching 900 GB/s. Tensor cores perform mixed-precision 4×4 matrix-multiply-accumulate operations, which speeds up deep-learning training because convolutions can be mapped to matrix multiplications.

The discussion stresses that managing GPU memory well and exploiting tensor cores are essential for AI developers and enterprises seeking cost-effective scaling. It also points to future directions, such as using tensor cores for sparse workloads and combining GPUs with other accelerators, such as systolic arrays, to cover a broader range of workloads.
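
The mixed-precision 4×4 matrix-multiply-accumulate mentioned in the summary computes D = A × B + C. The sketch below emulates it in plain Python under stated assumptions: inputs A and B are rounded to FP16, the accumulator is kept in FP32, and rounding is emulated with the standard `struct` module. The helper names `to_fp16`, `to_fp32`, and `mma_4x4` are hypothetical, and the result is not claimed to be bit-identical to what real tensor-core hardware produces, since accumulation order and rounding in hardware may differ.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def mma_4x4(a, b, c):
    """Return D = A x B + C for 4x4 matrices given as lists of rows.

    Assumed precision scheme: A and B are FP16, C and D are FP32.
    """
    a16 = [[to_fp16(v) for v in row] for row in a]
    b16 = [[to_fp16(v) for v in row] for row in b]
    d = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            acc = to_fp32(c[i][j])  # start from the FP32 accumulator input
            for k in range(4):
                # Multiply the FP16 operands, then round the running sum
                # back to FP32 after each accumulation step.
                acc = to_fp32(acc + a16[i][k] * b16[k][j])
            d[i][j] = acc
    return d


if __name__ == "__main__":
    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    b = [[float(4 * i + j) for j in range(4)] for i in range(4)]
    c = [[0.5] * 4 for _ in range(4)]
    for row in mma_4x4(identity, b, c):
        print(row)
```

Keeping the accumulator in FP32 limits the rounding error that would build up if every partial sum were stored in FP16, which is the main motivation for the mixed-precision design.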

Original Description

Digital Design and Computer Architecture, ETH Zürich, Spring 2026 (https://safari.ethz.ch/ddca/spring2026/)

Lecture 20b: GPU Programming

Lecturers: Dr. Juan Gómez Luna and Prof. Onur Mutlu

Date: 8 May 2026

Slides (pptx): https://safari.ethz.ch/ddca/spring2026/lib/exe/fetch.php?media=onur-ddca-2026-lecture20b-gpu-programming-afterlecture.pptx

Slides (pdf): https://safari.ethz.ch/ddca/spring2026/lib/exe/fetch.php?media=onur-ddca-2026-lecture20b-gpu-programming-afterlecture.pdf
