Memory System Design for AI/ML & ML/AI for Memory System Design - SRC AIHW Annual Review - 23.07.24
Why It Matters
By slashing data‑movement energy, the new PIM designs enable faster, greener AI inference and training, directly impacting the cost and scalability of future AI hardware deployments.
Key Takeaways
- Data movement dominates energy use in large AI workloads
- Processing‑in‑memory (PIM) aims to cut off‑chip traffic
- MIDM introduces fine‑grain DRAM access and low‑cost interconnects
- LLVM passes automate SIMD extraction for DRAM‑based kernels
- Energy efficiency gains reach up to 6.8× versus GPUs
Summary
The SRC AIHW annual review highlighted a critical challenge in modern AI/ML systems: data movement consumes the majority of system energy, especially in large‑scale models running on edge TPUs, where over 90% of power is spent on off‑chip interconnects. The task force's mission is to design memory systems that are data‑centric, data‑aware, and capable of handling massive workloads in both AI and genomics, through a tight hardware‑software co‑design loop.
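The dominance of data movement follows from simple arithmetic: off‑chip accesses cost orders of magnitude more energy per bit than on‑chip arithmetic. The back‑of‑envelope sketch below illustrates the effect; the per‑bit and per‑MAC energy figures are illustrative assumptions, not measurements from this work.

```python
# Back-of-envelope estimate of data-movement vs. compute energy,
# illustrating why off-chip traffic dominates. The energy constants
# below are illustrative assumptions, not measured values.

DRAM_ACCESS_PJ_PER_BIT = 20.0   # assumed off-chip DRAM access energy
MAC_PJ = 0.2                    # assumed energy of one on-chip MAC op

def energy_breakdown(bytes_moved: int, mac_ops: int) -> dict:
    """Return compute vs. data-movement energy (joules) and movement share."""
    movement_j = bytes_moved * 8 * DRAM_ACCESS_PJ_PER_BIT * 1e-12
    compute_j = mac_ops * MAC_PJ * 1e-12
    total = movement_j + compute_j
    return {
        "movement_j": movement_j,
        "compute_j": compute_j,
        "movement_share": movement_j / total,
    }

# A hypothetical layer moving 100 MB of weights/activations for 50M MACs:
stats = energy_breakdown(bytes_moved=100_000_000, mac_ops=50_000_000)
print(f"data movement: {stats['movement_share']:.1%} of layer energy")
```

Under these assumed constants, data movement accounts for well over 90% of the layer's energy, which is the regime PIM targets.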
Key progress this year centers on processing‑in‑memory (PIM) strategies, notably the MIDM (Multiple‑Instruction Multiple‑Data in DRAM) framework presented at HPCA. MIDM refines DRAM granularity, adds lightweight inter‑bank communication, and supplies compiler and OS support to map high‑level kernels onto DRAM instructions. By segmenting word lines and enabling fine‑grain operations, the approach mitigates under‑utilization, improves SIMD utilization, and supports multi‑programming across DRAM mats.
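The execution model described above can be sketched in miniature: each DRAM mat acts as a short SIMD array operating on its local rows, and multi‑programming means different mats run independent instruction streams. The mat count, lane width, and toy instruction set here are illustrative assumptions, not the MIDM ISA.

```python
# Toy model of fine-grain, multi-program SIMD execution across DRAM mats.
# Lane width and the two-op "ISA" are illustrative assumptions; real
# DRAM-based PIM operates on full rows with very different timing.

from dataclasses import dataclass, field

@dataclass
class Mat:
    """One DRAM mat: a narrow SIMD array with a few local rows."""
    lanes: int = 8
    rows: dict = field(default_factory=dict)

    def execute(self, op: str, dst: str, a: str, b: str) -> None:
        # Element-wise ops stay inside the mat: no off-chip traffic.
        fn = {"add": lambda x, y: x + y, "mul": lambda x, y: x * y}[op]
        self.rows[dst] = [fn(x, y) for x, y in zip(self.rows[a], self.rows[b])]

def run(mats, programs):
    """Multi-programming: each mat executes its own instruction stream."""
    for mat, program in zip(mats, programs):
        for instr in program:
            mat.execute(*instr)

mats = [Mat(), Mat()]
mats[0].rows = {"a": [1] * 8, "b": [2] * 8}
mats[1].rows = {"a": [3] * 8, "b": [4] * 8}
# Two independent kernels mapped onto two mats at once:
run(mats, [[("add", "c", "a", "b")], [("mul", "c", "a", "b")]])
print(mats[0].rows["c"], mats[1].rows["c"])
```

The point of the sketch is the mapping, not the arithmetic: because each mat has its own stream, short kernels no longer leave most of the array idle, which is the under‑utilization problem the word‑line segmentation addresses.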
The team demonstrated substantial performance and energy benefits across benchmarks, reporting up to 6.8× energy improvement over GPUs and 14× over prior SIMD‑based PIM systems. Compiler integration via three new LLVM passes automates vectorization, scheduling, and code generation, reducing programmer effort. Open‑source releases of architectural models and simulation tools further accelerate community adoption.
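The division of labor among the three passes can be sketched as a toy pipeline: group independent scalar operations into SIMD ops, place those ops onto mats, then emit instructions. The pass names, IR, and scheduling policy below are hypothetical stand‑ins for the paper's LLVM passes, shown only to make the flow concrete.

```python
# Sketch of a three-stage pipeline (vectorize -> schedule -> codegen)
# over a toy IR. All names and the round-robin policy are hypothetical;
# the real work is implemented as LLVM passes over actual IR.

def vectorize(scalar_ops, width):
    """Group independent scalar ops of the same kind into SIMD ops."""
    simd = []
    for i in range(0, len(scalar_ops), width):
        group = scalar_ops[i:i + width]
        assert len({op[0] for op in group}) == 1, "mixed ops in one group"
        simd.append((group[0][0], [op[1] for op in group]))
    return simd

def schedule(simd_ops, num_mats):
    """Assign SIMD ops to DRAM mats round-robin."""
    return [(i % num_mats, op) for i, op in enumerate(simd_ops)]

def codegen(scheduled):
    """Emit one textual DRAM instruction per scheduled SIMD op."""
    return [f"mat{m}: {kind} {operands}" for m, (kind, operands) in scheduled]

ops = [("add", k) for k in range(8)]          # 8 independent scalar adds
isa = codegen(schedule(vectorize(ops, width=4), num_mats=2))
for line in isa:
    print(line)
```

Automating these three steps is what removes the programmer burden: the kernel author writes ordinary scalar code and the pipeline discovers the SIMD structure and the mat placement.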
These advances suggest a shift toward memory‑centric AI architectures, where smarter memory subsystems alleviate bandwidth bottlenecks and lower power budgets. For industry, the work promises more sustainable, high‑performance AI accelerators and opens pathways for collaborations with Intel, AMD, IBM, and Qualcomm.