Understanding & Designing Modern Storage Systems - M5: Processing Inside NAND Flash Memory

Onur Mutlu Lectures
Mar 26, 2026

Why It Matters

FlashCosmos cuts data‑movement costs and improves energy efficiency, allowing data‑center operators to run large‑scale analytics directly in storage without sacrificing reliability.

Key Takeaways

  • Multi-word-line sensing (MWS) enables bulk bitwise operations in a single read.
  • FlashCosmos improves performance and energy efficiency over prior in-flash processing (IFP) techniques.
  • Enhanced SLC programming increases voltage margin for reliable computation.
  • Evaluated on 160 real 3D NAND chips across three workloads.
  • Reduces data movement bottlenecks from storage to compute units.

Summary

The video introduces FlashCosmos, an in-flash processing technique that performs bulk bitwise operations directly inside NAND flash memory. Published at MICRO 2022, the work targets the growing data-movement bottleneck that hampers databases, graph analytics, cryptography, and other data-intensive workloads.

Conventional systems move data from storage to CPUs or GPUs and are limited by the roughly 8 GB/s external bandwidth of PCIe Gen4. Near-data and in-storage processing avoid that link but still suffer from internal channel limits (~9.6 Gb/s) and from sensing operands serially, one read per operand. FlashCosmos replaces the serial reads with multi-word-line sensing (MWS), which activates several word lines simultaneously so that a bulk AND or OR across the operands completes in a single read. An enhanced SLC programming mode widens the voltage margin between the erased and programmed states, making the computed results more reliable.
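The difference between serial sensing and multi-word-line sensing can be illustrated with a small functional model. This is only a sketch under stated assumptions, not the authors' implementation: it assumes a cell storing 1 conducts when sensed, that cells selected within one NAND string sit in series (so the bit line sees an AND), and that strings from different blocks share a bit line in parallel (so the bit line sees an OR). The function names are hypothetical.

```python
from typing import List, Tuple

# One page = one bit per bit line. Assumption: 1 means the cell conducts
# when sensed, 0 means it blocks current.
Page = List[int]


def serial_and(pages: List[Page]) -> Tuple[Page, int]:
    """Baseline: sense each operand page separately and combine step by step.

    Returns the result page and the number of sensing operations used.
    """
    result = list(pages[0])
    senses = 1
    for page in pages[1:]:
        result = [a & b for a, b in zip(result, page)]
        senses += 1
    return result, senses


def mws_and(pages: List[Page]) -> Tuple[Page, int]:
    """MWS model, cells in series: the bit line conducts only if every
    selected cell conducts, so one sensing operation yields the AND."""
    return [int(all(bits)) for bits in zip(*pages)], 1


def mws_or(pages: List[Page]) -> Tuple[Page, int]:
    """MWS model, strings in parallel: the bit line conducts if any
    selected cell conducts, so one sensing operation yields the OR."""
    return [int(any(bits)) for bits in zip(*pages)], 1


if __name__ == "__main__":
    operands = [[1, 1, 0, 1], [1, 0, 0, 1], [1, 1, 1, 1]]
    print(serial_and(operands))
    print(mws_and(operands))
    print(mws_or(operands))
```

In this model the serial baseline needs one sensing operation per operand, while the MWS variants always need exactly one, which is the property the summary attributes to FlashCosmos.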

The authors demonstrate the concept on 160 real 3D NAND chips and run system-level simulations, using a state-of-the-art SSD simulator, on three real-world workloads. Results show speedups of up to 2-3× and comparable energy savings over the best prior in-flash processing designs, while the larger voltage margin keeps raw bit-error rates low.
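Why a wider voltage margin lowers the raw bit-error rate can be seen in a toy model. The sketch below is not the paper's characterization: it assumes two equally likely states whose threshold voltages are Gaussian with the same standard deviation, and a read reference voltage placed midway between them. All voltages and the function names are hypothetical.

```python
import math


def q_function(x: float) -> float:
    """Tail probability of a standard normal distribution, P(Z > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def raw_bit_error_rate(mean_erased: float, mean_programmed: float,
                       sigma: float) -> float:
    """Probability that a cell is read on the wrong side of a reference
    voltage placed midway between the two state means.

    Assumes equal-probability states with identical Gaussian spread.
    """
    half_margin = (mean_programmed - mean_erased) / 2
    return q_function(half_margin / sigma)


if __name__ == "__main__":
    # Hypothetical numbers: same spread, two different state separations.
    narrow = raw_bit_error_rate(mean_erased=0.0, mean_programmed=2.0, sigma=0.5)
    wide = raw_bit_error_rate(mean_erased=0.0, mean_programmed=4.0, sigma=0.5)
    print(f"narrow margin: {narrow:.3e}")
    print(f"wide margin:   {wide:.3e}")
```

In this model, moving the programmed state further from the erased state reduces the overlap of the two distributions, so the error probability falls rapidly as the margin grows.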

By eliminating repeated reads and reducing data transfers, FlashCosmos promises to reshape storage‑centric architectures, enabling faster, greener processing for workloads that exceed DRAM capacity. Its reliability improvements also make in‑flash compute viable for production environments, potentially accelerating the adoption of computational storage solutions.
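The data-transfer saving can be estimated with back-of-the-envelope arithmetic. This sketch assumes only the ~8 GB/s external bandwidth quoted above and that the result of a bulk bitwise operation is the size of one operand. It ignores sensing latency, internal channel limits, and protocol overhead, and the function names are hypothetical.

```python
EXTERNAL_BYTES_PER_S = 8e9  # ~8 GB/s external link, as quoted in the summary


def host_transfer_seconds(num_operands: int, operand_bytes: float,
                          link_bytes_per_s: float = EXTERNAL_BYTES_PER_S) -> float:
    """Host-side processing: every operand crosses the external link."""
    return num_operands * operand_bytes / link_bytes_per_s


def in_flash_transfer_seconds(operand_bytes: float,
                              link_bytes_per_s: float = EXTERNAL_BYTES_PER_S) -> float:
    """In-flash processing: only the result, the size of one operand,
    crosses the external link."""
    return operand_bytes / link_bytes_per_s


if __name__ == "__main__":
    operands, size = 4, 8e9  # hypothetical: four 8 GB bit vectors
    print(host_transfer_seconds(operands, size))   # time to move all operands
    print(in_flash_transfer_seconds(size))         # time to move the result only
```

Under these assumptions the transfer time over the external link shrinks by a factor equal to the number of operands, independent of operand size.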

Original Description

Project and Seminars Course: Understanding and Designing Modern Storage Systems, ETH Zürich, Spring 2026
Lecture 5: Processing Inside NAND Flash Memory
Lecturers: Rakesh Nadig and Dr. Mohammad Sadrosadati
Date: March 27, 2026
Recommended Reading:
====================
A Modern Primer on Processing in Memory
Memory-Centric Computing: Solving Computing's Memory Problem
Memory-Centric Computing: Recent Advances in Processing-in-DRAM
Intelligent Architectures for Intelligent Computing Systems
RowHammer: A Retrospective
Fundamentally Understanding and Solving RowHammer
Accelerating Genome Analysis via Algorithm-Architecture Co-Design
From Molecules to Genomic Variations: Accelerating Genome Analysis via Intelligent Algorithms and Architectures
RECOMMENDED LECTURE VIDEOS & PLAYLISTS:
========================================
Digital Design and Computer Architecture Spring 2025 Livestream Lectures Playlist:
Fundamentals of Computer Architecture Fall 2025 Livestream Lectures Playlist:
Seminar in Computer Architecture Spring 2025 Livestream Lectures Playlist:
Computer Architecture Fall 2024 Lectures Playlist:
Interview with Professor Onur Mutlu:
TCuARCH meets Prof. Onur Mutlu
Arch. Mentoring Workshop @ISCA'21 - Doing Impactful Research
The Story of RowHammer Lecture:
Accelerating Genome Analysis Lecture:
Memory-Centric Computing Systems Tutorial at IEDM 2021:
Intelligent Architectures for Intelligent Machines Lecture:
Featured Lectures:
