Memory System Design for AI/ML & ML/AI for Memory System Design - SRC AIHW Annual Review - 23.07.24

Onur Mutlu Lectures
Mar 18, 2026

Why It Matters

By slashing data‑movement energy, the new PIM designs enable faster, greener AI inference and training, directly impacting the cost and scalability of future AI hardware deployments.

Key Takeaways

  • Data movement dominates energy use in large AI workloads
  • Processing‑in‑memory (PIM) aims to cut off‑chip traffic
  • MIMDRAM introduces fine‑grain DRAM access and low‑cost interconnects
  • LLVM passes automate SIMD extraction for DRAM‑based kernels
  • Energy efficiency gains reach up to 6.8× versus GPUs

Summary

The SRC AIHW annual review highlighted a critical challenge in modern AI/ML systems: data movement consumes the majority of system energy, especially in large‑scale models running on edge TPUs where over 90% of power is spent on off‑chip interconnects. The task force’s mission is to redesign memory systems that are data‑centric, data‑aware, and capable of handling massive workloads in both AI and genomics, leveraging a tight hardware‑software co‑design loop.
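The scale of the data‑movement problem is easy to see with a back‑of‑envelope energy model. The per‑operation figures below are illustrative assumptions (in the spirit of commonly cited ~45 nm estimates), not numbers from the talk:

```python
# Rough per-operation energy costs (assumed, illustrative values):
PJ_ALU_OP = 1.0         # ~1 pJ for a 32-bit integer add on-chip
PJ_DRAM_ACCESS = 640.0  # ~640 pJ to move a 32-bit word over the off-chip DRAM interface

def energy_breakdown(num_ops, bytes_moved):
    """Return (compute_pJ, movement_pJ, movement_fraction) for a kernel."""
    compute = num_ops * PJ_ALU_OP
    movement = (bytes_moved / 4) * PJ_DRAM_ACCESS  # count 4-byte words
    return compute, movement, movement / (compute + movement)

# A memory-bound kernel: one add per 32-bit word streamed from DRAM.
compute, movement, frac = energy_breakdown(num_ops=1_000_000,
                                           bytes_moved=4_000_000)
print(f"data movement share: {frac:.1%}")
```

Under these assumptions, data movement accounts for well over 90% of total energy for a streaming kernel, which is consistent with the edge‑TPU observation above and motivates computing inside memory rather than shuttling data out.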

Key progress this year centers on processing‑in‑memory (PIM) strategies, notably the MIMDRAM (Multiple‑Instruction Multiple‑Data DRAM) framework presented at HPCA 2024. MIMDRAM refines DRAM access granularity, adds lightweight inter‑mat communication, and supplies compiler and OS support to map high‑level kernels onto DRAM instructions. By segmenting word lines and enabling fine‑grain operations, the approach mitigates under‑utilization, improves SIMD utilization, and supports multi‑programming across DRAM mats.
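The execution model can be pictured with a toy functional sketch: instead of every mat executing one wide SIMD instruction, each mat receives its own instruction stream and operates on its own data segment. The mat sizes, operation set, and function names here are illustrative assumptions, not the actual MIMDRAM ISA:

```python
import operator

# Hypothetical per-mat operation set for illustration only.
OPS = {"add": operator.add, "mul": operator.mul}

def run_mats(programs):
    """programs: list of (op_name, vec_a, vec_b), one entry per DRAM mat.
    Each mat runs its own program on its own segment (MIMD-style),
    rather than all mats executing a single wide SIMD instruction."""
    results = []
    for op_name, a, b in programs:
        op = OPS[op_name]
        results.append([op(x, y) for x, y in zip(a, b)])
    return results

out = run_mats([
    ("add", [1, 2, 3], [10, 20, 30]),  # mat 0: element-wise add
    ("mul", [4, 5], [6, 7]),           # mat 1: an independent, smaller multiply
])
print(out)  # [[11, 22, 33], [24, 35]]
```

The point of the sketch is the shape of the parallelism: differently sized kernels from different programs can occupy different mats concurrently, which is what mitigates the under‑utilization of a single very wide SIMD array.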

The team demonstrated substantial performance and energy benefits across benchmarks, reporting up to 6.8× energy improvement over GPUs and 14× over prior SIMD‑based PIM systems. Compiler integration via three new LLVM passes automates vectorization, scheduling, and code generation, reducing programmer effort. Open‑source releases of architectural models and simulation tools further accelerate community adoption.
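The kind of transformation those compiler passes automate can be illustrated with a minimal sketch. The function names are hypothetical; the idea is that a scalar loop with no cross‑iteration dependences is rewritten as one bulk, row‑wide operation that maps onto DRAM instructions:

```python
def scalar_kernel(a, b):
    # Source form: a scalar loop the pass proves has independent iterations.
    out = [0] * len(a)
    for i in range(len(a)):
        out[i] = a[i] + b[i]
    return out

def vectorized_kernel(a, b):
    # Conceptual output of vectorization: one bulk element-wise operation
    # (in hardware, a row-wide in-DRAM instruction) instead of N scalar adds.
    return [x + y for x, y in zip(a, b)]

result = vectorized_kernel([1, 2, 3], [4, 5, 6])
print(result)  # [5, 7, 9]
```

Scheduling and code‑generation passes would then decide which mats execute each bulk operation and emit the corresponding DRAM command sequence; the sketch covers only the vectorization step.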

These advances suggest a shift toward memory‑centric AI architectures, where smarter memory subsystems alleviate bandwidth bottlenecks and lower power budgets. For industry, the work promises more sustainable, high‑performance AI accelerators and opens pathways for collaborations with Intel, AMD, IBM, and Qualcomm.

Original Description

Title: Memory System Design for AI/ML & ML/AI for Memory System Design
Presenter: Professor Onur Mutlu (https://people.inf.ethz.ch/omutlu/)
SRC AIHW Annual Review
Date: July 23, 2024
Slides (pptx):
Slides (pdf):
Recommended Reading:
====================
A Modern Primer on Processing in Memory
Intelligent Architectures for Intelligent Computing Systems
RowHammer: A Retrospective
Fundamentally Understanding and Solving RowHammer
RECOMMENDED LECTURE VIDEOS & PLAYLISTS:
========================================
Computer Architecture Fall 2021 Lectures Playlist:
Digital Design and Computer Architecture Spring 2021 Livestream Lectures Playlist:
Featured Lectures:
Interview with Professor Onur Mutlu:
The Story of RowHammer Lecture:
Accelerating Genome Analysis Lecture:
Memory-Centric Computing Systems Tutorial at IEDM 2021:
Intelligent Architectures for Intelligent Machines Lecture:
Computer Architecture Fall 2020 Lectures Playlist:
Digital Design and Computer Architecture Spring 2020 Lectures Playlist:
Public Lectures by Onur Mutlu, Playlist:
Computer Architecture at Carnegie Mellon Spring 2015 Lectures Playlist:
Rethinking Memory System Design Lecture @stanfordonline :
