Digital Design & Computer Architecture D10: Problem-Solving Session 10 (Spring 2026)

•May 11, 2026

Onur Mutlu Lectures

Why It Matters

Understanding VLIW’s dependency‑driven bundling and systolic arrays’ regular‑pattern constraints helps engineers design compilers and workloads that fully exploit parallel hardware, directly impacting performance and energy efficiency.

Key Takeaways

•VLIW relies on the compiler to bundle independent instructions for parallel execution.
•Ideal IPC equals bundle width, but stalls reduce actual throughput.
•Systolic arrays excel with regular, repetitive data‑flow patterns.
•Scheduling exercise shows poor utilization due to instruction dependencies.
•Execution cycles scale linearly with loop count, highlighting bottlenecks.

Summary

The session opened with a dual focus on VLIW (very long instruction word) architectures and systolic‑array designs, providing a quick theoretical refresher before diving into hands‑on exercises. The instructor emphasized that VLIW performance hinges on the compiler’s ability to group independent instructions into bundles, which the hardware then executes in lockstep without runtime dependency checks. The ideal instructions‑per‑cycle (IPC) figure equals the bundle width (two for a two‑wide bundle), but a stall in any one instruction forces the entire bundle to wait, which lowers the achieved IPC.
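The effect of lockstep execution on IPC can be illustrated with a small model. This is a minimal sketch, not material from the session: the function name `effective_ipc`, the per‑operation latencies, and the assumption that bundles issue strictly one after another (no pipelining) are all illustrative.

```python
def effective_ipc(bundles):
    """Estimate IPC for a lockstep VLIW under a simplified model.

    `bundles` is a list of bundles; each bundle is a list of per-operation
    latencies in cycles (hypothetical values). A bundle completes only when
    its slowest operation completes, so one stalled operation delays every
    other slot in the same bundle.
    """
    total_ops = sum(len(bundle) for bundle in bundles)
    total_cycles = sum(max(bundle) for bundle in bundles)
    return total_ops / total_cycles


# Four two-wide bundles with no stalls: IPC equals the bundle width.
ideal = effective_ipc([[1, 1], [1, 1], [1, 1], [1, 1]])

# Same program, but one operation in the second bundle takes 3 cycles.
stalled = effective_ipc([[1, 1], [3, 1], [1, 1], [1, 1]])

print(f"ideal IPC:   {ideal:.2f}")
print(f"stalled IPC: {stalled:.2f}")
```

Under these assumptions a single three‑cycle operation drops the two‑wide machine from 8 operations in 4 cycles to 8 operations in 6 cycles.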

Key insights included the contrast between VLIW’s static scheduling and traditional out‑of‑order execution, as well as the core principles of systolic arrays: multiple processing elements transform data streams in a regular, weight‑driven pattern. The lecturer illustrated how systolic designs thrive on uniform compute and memory access patterns, yet become inefficient when faced with irregular code structures. The example of a multiply‑accumulate pipeline highlighted the need for tightly coupled data flow to achieve high throughput.
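The regular data flow of a systolic array can be sketched with a cycle‑by‑cycle simulation. The version below assumes an output‑stationary grid of multiply‑accumulate elements computing a matrix product, with rows of A streaming rightward and columns of B streaming downward. That dataflow, and the name `systolic_matmul`, are assumptions chosen for illustration and are not necessarily the design shown in the session.

```python
def systolic_matmul(A, B):
    """Simulate C = A x B on an n-by-p grid of multiply-accumulate PEs.

    Assumed dataflow (output-stationary): PE(i, j) keeps its own partial
    sum, receives an A value from its left neighbour and a B value from
    the neighbour above, and forwards both on the next cycle. Inputs are
    skewed so that A[i][k] and B[k][j] meet in PE(i, j) at cycle i + j + k.
    """
    n, m, p = len(A), len(B), len(B[0])
    acc = [[0] * p for _ in range(n)]    # partial sums held in each PE
    a_reg = [[0] * p for _ in range(n)]  # A values currently in each PE
    b_reg = [[0] * p for _ in range(n)]  # B values currently in each PE

    for t in range(n + m + p - 2):
        new_a = [[0] * p for _ in range(n)]
        new_b = [[0] * p for _ in range(n)]
        for i in range(n):
            for j in range(p):
                if j == 0:  # left edge: inject row i of A, delayed by i cycles
                    k = t - i
                    new_a[i][j] = A[i][k] if 0 <= k < m else 0
                else:       # interior: take the value from the left neighbour
                    new_a[i][j] = a_reg[i][j - 1]
                if i == 0:  # top edge: inject column j of B, delayed by j cycles
                    k = t - j
                    new_b[i][j] = B[k][j] if 0 <= k < m else 0
                else:       # interior: take the value from the neighbour above
                    new_b[i][j] = b_reg[i - 1][j]
        a_reg, b_reg = new_a, new_b
        # Every PE performs one multiply-accumulate per cycle.
        for i in range(n):
            for j in range(p):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Every element performs the same operation each cycle and talks only to its neighbours, which is why the scheme suits uniform workloads such as matrix multiplication and fits poorly when control flow or memory access is irregular.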

During the exercise, students mapped a seven‑instruction loop (three loads, an add, a multiply, a store, and a branch) onto a VLIW processor with three load units and one unit each for store, add, multiply, and branch. The optimal schedule packed the three independent loads into a single VLIW bundle, followed by separate multiply, add, store, and branch bundles, each padded with no‑ops. The resulting useful‑operation‑to‑bundle ratio was 7/5 (1.4 operations per bundle), and total execution cycles grew linearly with the loop counter n, exposing the under‑utilization caused by instruction dependencies.
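The schedule described above can be reproduced with a small greedy scheduler. This is a sketch under stated assumptions: the operation names, the exact dependency chain (the multiply consumes two loads, the add consumes the multiply and the third load, then store, then branch), and single‑cycle latencies are hypothetical, chosen only so that the result matches the five‑bundle schedule in the summary.

```python
from collections import Counter

# Functional units per bundle, as listed in the exercise.
UNITS = {"load": 3, "store": 1, "add": 1, "mul": 1, "branch": 1}

# (name, unit kind, dependencies). The dependency chain is an assumption.
OPS = [
    ("ld1", "load", []),
    ("ld2", "load", []),
    ("ld3", "load", []),
    ("mul", "mul", ["ld1", "ld2"]),
    ("add", "add", ["mul", "ld3"]),
    ("st", "store", ["add"]),
    ("br", "branch", ["st"]),
]


def schedule(ops, units):
    """Greedily pack ops into bundles, assuming single-cycle latency.

    An op may issue only if all of its dependencies issued in an earlier
    bundle and a unit of the right kind is still free in this bundle.
    """
    issued = set()
    pending = list(ops)
    bundles = []
    while pending:
        free = Counter(units)
        bundle = []
        for name, kind, deps in pending:
            if all(d in issued for d in deps) and free[kind] > 0:
                free[kind] -= 1
                bundle.append(name)
        issued.update(bundle)  # results become visible to the next bundle
        pending = [op for op in pending if op[0] not in issued]
        bundles.append(bundle)
    return bundles


bundles = schedule(OPS, UNITS)
slots_per_bundle = sum(UNITS.values())
useful_ops = sum(len(b) for b in bundles)

for cycle, bundle in enumerate(bundles):
    print(cycle, bundle)
print("ops per bundle:", useful_ops / len(bundles))
print("slot utilization:", useful_ops / (slots_per_bundle * len(bundles)))
```

With these assumptions the scheduler produces five bundles for seven operations, and only 7 of the 35 available slots do useful work. If iterations are not overlapped (no unrolling or software pipelining, which is an assumption here), n iterations take about 5n cycles, consistent with the linear growth noted above.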

The discussion underscored that VLIW efficiency is a compiler‑driven problem: without careful dependency analysis, hardware resources remain idle. Likewise, systolic arrays demand algorithmic regularity, limiting their applicability. These lessons inform hardware designers and software engineers about the trade‑offs between static scheduling simplicity and dynamic execution flexibility, guiding future processor and accelerator architectures.

Original Description

Digital Design and Computer Architecture, ETH Zürich, Spring 2026 (https://safari.ethz.ch/ddca/spring2026/)

D10: Problem-Solving Session 10

Lecturer: Prof. Onur Mutlu

Date: 11 May 2026

