P&S: Arch. & Algo. For Health & Life Sciences- L3: Storage Centric (Meta)Genomics I (Spr 2026)
Why It Matters
Embedding genomics filters in storage cuts data‑movement costs and accelerates analysis, enabling faster, cheaper insights for precision medicine and public‑health applications.
Key Takeaways
- Storage-centric filters reduce data movement in genomics pipelines.
- GenStore identifies exact-match and non-match reads inside SSDs.
- Read-size k-mers and sorted indexes enable sequential access.
- In-storage filtering significantly improves performance and energy efficiency while lowering cost.
- The approach adapts to varied read lengths and genetic variation.
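The exact-match idea behind these takeaways can be sketched in a few lines of Python. This is a minimal illustration, not GenStore's implementation. It makes these simplifying assumptions:

- all reads have the same length;
- only the forward strand is considered;
- the index fits in memory.

The function names `read_size_kmers` and `exact_match_filter` are hypothetical. The sketch shows two effects:

- With read-size k-mers, each read needs only one index lookup.
- Sorting both the reads and the index turns those lookups into a single sequential merge pass.

```python
from typing import List, Tuple


def read_size_kmers(reference: str, read_len: int) -> List[Tuple[str, int]]:
    """Hypothetical index: every reference substring whose length equals the
    read length, paired with its start position and sorted lexicographically."""
    entries = [(reference[i:i + read_len], i)
               for i in range(len(reference) - read_len + 1)]
    entries.sort()
    return entries


def exact_match_filter(
    reads: List[str], index: List[Tuple[str, int]]
) -> Tuple[List[Tuple[str, int]], List[str]]:
    """Merge-join sorted reads against the sorted index in one pass.

    Returns (exactly matching reads with one reference position each,
    reads that still need full alignment).
    Assumes every read has the index's k-mer length and ignores the
    reverse-complement strand.
    """
    matched: List[Tuple[str, int]] = []
    remaining: List[str] = []
    j = 0
    for read in sorted(reads):
        # The index cursor only moves forward, so the index is read
        # sequentially instead of being probed at random positions.
        while j < len(index) and index[j][0] < read:
            j += 1
        if j < len(index) and index[j][0] == read:
            # One lookup suffices: the whole read is a single k-mer.
            matched.append((read, index[j][1]))
        else:
            remaining.append(read)
    return matched, remaining


reference = "ACGTACGGTCA"
index = read_size_kmers(reference, read_len=4)
matched, remaining = exact_match_filter(["GTCA", "TTTT", "ACGG"], index)
print(matched)    # [('ACGG', 4), ('GTCA', 7)]
print(remaining)  # ['TTTT']
```

In this toy run, two of the three reads are resolved by the filter, and only `TTTT` would be forwarded for full alignment.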
Summary
The lecture introduces storage-centric architectures for genomics and metagenomics. It focuses on how embedding filtering logic directly inside storage devices can relieve the large data-movement and data-preparation bottlenecks that dominate current pipelines. Systems like GenStore move simple, low-cost operations into SSDs, such as exact-match detection and non-match elimination. The goal is to send only the reads that truly require expensive alignment to downstream CPUs or accelerators.

Key insights include:

- Read-size k-mers (k-mers as long as the read itself) collapse multiple index lookups per read into a single lookup.
- Sorting both the k-mer index and the read table turns random accesses into sequential scans.

Experiments show that an ideal in-storage filter can dramatically cut both computation and I/O overhead. They also show that hardware accelerators shift the bottleneck from computation to I/O, which underscores the value of storage-side processing.

The presenter highlights two concrete filters:

- GenStore-EM filters out exact-matching reads.
- GenStore-NM discards reads that have no viable alignment.

Each read needs only a single index lookup and simple comparison logic, so the design achieves high throughput with minimal DRAM and flash resources. Case studies on real human and microbial datasets show performance gains and energy savings without significant hardware cost.

Overall, embedding genomics-specific filters in storage promises faster turnaround for precision medicine, outbreak monitoring, and agricultural research. It also reduces operational expenses and power consumption. These factors are critical as sequencing data volumes surge past 100 TB and continue to grow.
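A similarly hedged sketch can illustrate non-match filtering in the spirit of GenStore-NM. The rule below is a simplified seed-counting heuristic assumed for illustration. It is not necessarily the exact filter presented in the lecture. A read is discarded when fewer than `min_hits` of its non-overlapping k-mers occur anywhere in the reference. The names `reference_kmers` and `non_match_filter`, the seed length `k`, and the `min_hits` threshold are all hypothetical.

```python
from typing import List, Set, Tuple


def reference_kmers(reference: str, k: int) -> Set[str]:
    """Hypothetical k-mer set of the reference (forward strand only)."""
    return {reference[i:i + k] for i in range(len(reference) - k + 1)}


def non_match_filter(
    reads: List[str], ref_kmers: Set[str], k: int, min_hits: int
) -> Tuple[List[str], List[str]]:
    """Split reads into (kept for alignment, discarded as non-matching).

    Assumed heuristic: a read is discarded when fewer than `min_hits`
    of its non-overlapping k-mers occur in the reference.
    """
    kept: List[str] = []
    discarded: List[str] = []
    for read in reads:
        # Non-overlapping seeds: start positions 0, k, 2k, ...
        seeds = [read[i:i + k] for i in range(0, len(read) - k + 1, k)]
        hits = sum(1 for seed in seeds if seed in ref_kmers)
        if hits >= min_hits:
            kept.append(read)
        else:
            discarded.append(read)
    return kept, discarded


ref_set = reference_kmers("ACGTACGGTCA", k=4)
kept, discarded = non_match_filter(
    ["ACGTACGG", "TTTTGGGG", "ACGTTTTT"], ref_set, k=4, min_hits=1
)
print(kept)       # ['ACGTACGG', 'ACGTTTTT']
print(discarded)  # ['TTTTGGGG']
```

For brevity, the sketch uses an in-memory hash set. An in-storage design would instead keep a sorted k-mer table in flash and stream it sequentially alongside sorted reads, avoiding random accesses as described above.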