P&S Arch. & Algo. For Health & Life Sciences - L6: Overview of Genomic Workflows (II) (Spr 2026)

Onur Mutlu Lectures
Apr 28, 2026

Why It Matters

Efficient read‑mapping algorithms cut sequencing analysis time and cost, enabling faster, more affordable genomic insights critical for clinical and research breakthroughs.

Key Takeaways

  • Read mapping transforms fragmented reads into a reconstructed genome.
  • Short reads are highly accurate but limited in length; long reads span larger regions but carry higher error rates.
  • Indexing reference genomes with k‑mer seeds enables fast alignment.
  • Minimizer and spaced‑seed techniques balance memory use and sensitivity.
  • Advanced seed algorithms improve fuzzy matching without excessive storage.

Summary

The sixth lecture of the P&S Architecture & Algorithms for Health & Life Sciences series dives into genomic workflow analysis, concentrating on the read‑mapping stage that stitches sequenced fragments into a complete genome. It revisits earlier concepts—why genomics matters, base‑calling, and data digitization—before moving into the computational challenges of aligning millions of short and long reads to a reference.

The presenter explains the fundamental trade‑off between short reads, which are highly accurate but limited in length, and long reads, which span larger regions yet carry higher error rates. Efficient mapping relies on indexing the reference genome with k‑mer seeds, enabling rapid lookup rather than exhaustive sliding‑window searches. Various seed‑selection strategies—full k‑mer tables, minimizers, spaced‑seeds, linked k‑mers, and quasi‑seeds—are compared for their impact on memory footprint, sensitivity, and flexibility.
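As a rough illustration of the indexing idea described above, the following is a minimal Python sketch, not the lecture's implementation. It builds a full k‑mer table for a toy reference, then lets each k‑mer seed of a read vote for the reference position it implies. The function names (build_kmer_index, candidate_locations), the toy sequences, and the choice k = 4 are all assumptions made for the example; real mappers use much longer seeds and compact hash tables.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_locations(read, index, k):
    """Let each seed hit vote for the read start position it implies.

    Returns (start_position, vote_count) pairs, best-supported first.
    """
    votes = defaultdict(int)
    for offset in range(len(read) - k + 1):
        seed = read[offset:offset + k]
        # index.get avoids creating empty entries for unseen seeds.
        for ref_pos in index.get(seed, ()):
            votes[ref_pos - offset] += 1
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


# Toy data (hypothetical): the read is an exact copy of reference[9:17].
reference = "ACGTTGCATGCCGATAGCTAGGCTA"
read = "GCCGATAG"

index = build_kmer_index(reference, k=4)
hits = candidate_locations(read, index, k=4)
best_start, support = hits[0]
print(best_start, support)
```

Each read costs a handful of table lookups instead of a comparison against every reference position, which is the point of indexing. The price is memory, since every k‑mer position is stored; the other seed‑selection strategies in the lecture aim to reduce that cost.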

Illustrative analogies liken the process to solving a puzzle with or without a picture, highlighting how minimizer selection (keeping only the k‑mer with the smallest hash value in each window) reduces storage while preserving most matches. The lecture cites the BLEND paper on fuzzy seed matching as an example of research that achieves high sensitivity without full k‑mer indexing, and demonstrates how spaced‑seed designs can still match similar sequences despite mismatches.
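The two seed‑reduction ideas mentioned here can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the method of any particular tool. The hash function (CRC32 from the standard library), the window size w, the mask "110111", and all function names are choices made for the example only.

```python
import zlib


def kmer_hash(kmer):
    """Deterministic stand-in hash (assumption: real tools use other hashes)."""
    return zlib.crc32(kmer.encode("ascii"))


def minimizers(seq, k, w):
    """Return the (position, k-mer) pairs chosen as minimizers.

    In every window of w consecutive k-mers, keep the one with the
    smallest hash; ties are broken by the leftmost position.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        best = min(range(start, start + w),
                   key=lambda i: (kmer_hash(kmers[i]), i))
        selected.add((best, kmers[best]))
    return selected


def spaced_seed(kmer, mask):
    """Keep only the bases under '1' positions of the mask ('0' = don't care)."""
    return "".join(base for base, keep in zip(kmer, mask) if keep == "1")


# Toy data (hypothetical).
reference = "ACGTTGCATGCCGATAGCTAGGCTA"
ref_minimizers = minimizers(reference, k=4, w=5)
total_kmers = len(reference) - 4 + 1
print(len(ref_minimizers), "of", total_kmers, "k-mers kept")

# Two 6-mers that differ only at the masked third base give the same seed.
seed_a = spaced_seed("ACGTAC", "110111")
seed_b = spaced_seed("ACTTAC", "110111")
print(seed_a, seed_b)
```

Because two sequences that share a window of w consecutive k‑mers pick the same minimizer from it, a sufficiently long exact match between read and reference is still found even though only a subset of k‑mers is stored. The spaced seed makes the complementary trade: it tolerates a mismatch at a masked position, at the cost of matching on fewer bases.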

These algorithmic advances directly affect the scalability of genomic pipelines in health and life‑science applications. By optimizing the balance between speed, memory, and alignment accuracy, organizations can lower computational costs, accelerate diagnostic sequencing, and support larger population‑scale studies, ultimately advancing personalized medicine initiatives.

Original Description

Project & Seminar (P&S), ETH Zürich, Spring 2026
Lecture 6: Overview of Genomic Workflows (II)
Lecturer: Nika Mansouri Ghiasi
Date: April 29, 2026