Seminar in Comp. Arch. - S3: Conduit (Spring 2026)
Why It Matters
Conduit makes SSD‑based near‑data processing accessible without code changes. In the reported evaluation it delivers an average 1.8× speedup and 46% energy reduction, gains that could shift compute workloads closer to storage.
Key Takeaways
- Conduit offers programmer‑transparent NDP across SSD cores, DRAM, and flash.
- Two‑step workflow: compile‑time vectorization, then runtime cost‑based offloading.
- Evaluations show an average 1.8× speedup and 46% energy reduction versus the best prior offloading policy.
- Combining ISP and IFP yields up to 40% additional performance gain for hybrid workloads.
- Offloading decisions weigh four latency factors: data movement, compute, data dependence, and queuing.
Summary
The seminar introduced Conduit, a programmer‑transparent near‑data processing (NDP) framework that leverages the heterogeneous compute resources inside modern SSDs—embedded cores, DRAM, and flash chips. By abstracting offloading decisions away from developers, Conduit aims to overcome the adoption barriers of prior SSD‑based NDP techniques, which required manual code partitioning and were limited to narrow workloads.
Conduit operates in two stages. At compile time, a custom pass auto‑vectorizes loops, converting scalar code into wide vector operations that match the SSD’s internal parallelism, and extracts lightweight metadata for fast scheduling. At runtime, the SSD controller evaluates a cost function that accounts for data‑movement latency, expected compute latency, data‑dependence delay, and resource queuing, then dispatches each vector instruction to the most suitable resource: in‑storage processing (ISP) on the embedded cores, DRAM‑based processing‑in‑memory (PIM), or in‑flash processing (IFP). Simulations using MQSim across six workloads showed an average 1.8× speedup and 46% energy savings versus the best prior offloading policy.
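The runtime decision can be pictured as a minimum‑cost selection over the available resources. The sketch below is only an illustration, not Conduit's actual implementation: the `CostEstimate` and `pick_resource` names, the purely additive form of the cost, and every latency number are assumptions made for the example. The summary states only which four factors the cost function considers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostEstimate:
    """Hypothetical latency estimates (microseconds) for one vector instruction."""
    movement: float    # moving operands to the resource
    compute: float     # expected execution time on the resource
    dependence: float  # waiting on results of earlier instructions
    queuing: float     # waiting for the resource to become free

    def total(self) -> float:
        # Assumption: the four factors simply add up; the real weighting
        # is not given in the summary.
        return self.movement + self.compute + self.dependence + self.queuing


def pick_resource(estimates: dict[str, CostEstimate]) -> str:
    """Return the name of the resource with the lowest estimated total latency."""
    return min(estimates, key=lambda name: estimates[name].total())


# Made-up numbers for a single vector instruction whose operands live in flash.
estimates = {
    "ISP": CostEstimate(movement=40.0, compute=10.0, dependence=0.0, queuing=5.0),
    "DRAM_PIM": CostEstimate(movement=30.0, compute=4.0, dependence=0.0, queuing=12.0),
    "IFP": CostEstimate(movement=0.0, compute=25.0, dependence=0.0, queuing=8.0),
}
chosen = pick_resource(estimates)
print(chosen)
```

With these invented numbers the in‑flash option wins because it avoids data movement entirely, even though its compute estimate is the slowest of the three. Raising its queuing estimate far enough would tip the choice toward another resource.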
The presenter highlighted that a naïve combination of resources can hurt performance due to inter‑resource data movement, but a judicious mix—specifically pairing ISP for compute‑intensive kernels with IFP for I/O‑bound phases—delivered up to 40% additional speedup on hybrid workloads. The framework also supports dynamic ISA translation, converting vector instructions to ARM for ISP, SIMD‑RAM extensions for DRAM, or flash‑specific primitives for IFP.
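The effect of inter‑resource data movement can be illustrated with a toy model. Everything in it is assumed for the sake of the example: the phase names, the per‑resource compute times, and the fixed transfer penalty are invented, and only two of the resources (ISP and IFP) are modeled.

```python
from itertools import product

RESOURCES = ("ISP", "IFP")
PHASES = ("scan", "filter", "project", "aggregate")
# Hypothetical per-phase compute latency (ms) on each resource.
COMPUTE = {
    "scan":      {"ISP": 9.0, "IFP": 3.0},
    "filter":    {"ISP": 4.0, "IFP": 5.0},
    "project":   {"ISP": 5.0, "IFP": 3.0},
    "aggregate": {"ISP": 2.0, "IFP": 8.0},
}
TRANSFER = 4.0  # assumed latency (ms) each time data moves between resources


def latency(plan):
    """Total latency of a plan: compute time plus one transfer per resource switch."""
    compute = sum(COMPUTE[p][r] for p, r in zip(PHASES, plan))
    switches = sum(a != b for a, b in zip(plan, plan[1:]))
    return compute + TRANSFER * switches


# Naive mix: pick the fastest resource for each phase in isolation.
greedy = tuple(min(RESOURCES, key=lambda r: COMPUTE[p][r]) for p in PHASES)
# Movement-aware mix: search every assignment (fine for a toy with 2^4 plans).
best = min(product(RESOURCES, repeat=len(PHASES)), key=latency)
# Best plan that stays on a single resource throughout.
single = min(latency((r,) * len(PHASES)) for r in RESOURCES)

print(greedy, latency(greedy))
print(best, latency(best))
print(single)
```

In this toy setup the per‑phase greedy mix (24 ms) is slower than running everything in flash (19 ms), while the movement‑aware plan (17 ms) beats both by switching resources only once.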
By automating fine‑grained offloading and exploiting all SSD compute tiers, Conduit promises to make NDP a practical acceleration layer for data‑intensive applications, potentially reshaping storage‑centric system design and reducing reliance on host CPUs for bandwidth‑heavy tasks.