Fourth Data Prefetching Championship: Part I

SIGARCH Blog (ACM) • April 27, 2026

Key Takeaways

•VIP adds cross-page prefetches at L1, boosting AI workload performance
•SPPAM merges AMPM and SPP, improving L2 prefetch accuracy and throttling
•Emender introduces confidence sorting, cuckoo filter, and fairness throttling to curb over-prefetching
•sBerti adds a smart stride engine that prefetches across page boundaries, cutting cold-start misses

Pulse Analysis

The fourth Data Prefetching Championship (DPC-4) ran alongside HPCA 2026, drawing teams from academia and industry to push the limits of hardware prefetching under strict storage budgets (32 KB L1D, 128 KB L2, 256 KB LLC). Two keynotes framed the contest: Leeor Peled of Huawei urged researchers to pursue genuinely novel concepts, such as semantic or neural-network-driven prefetchers, while Google's Akanksha J. highlighted the mismatch between traditional hard-wired heuristics and the heterogeneous, multi-tenant workloads of modern datacenters.

The stage was thus set for inventive designs that balance aggressiveness with fairness. Among the winning entries, the Virtual Inter-Page (VIP) prefetcher demonstrated that a lightweight L1-level stride engine can safely issue cross-page requests to the L2 cache, delivering noticeable gains for AI workloads that stream large structures across 4 KB page boundaries. The SPPAM prefetcher fused two established ideas, the tolerance of AMPM (Access Map Pattern Matching) to out-of-order accesses and the speculative, signature-based lookahead of SPP (Signature Path Prefetcher), into a unified L2 mechanism that adapts its confidence thresholds and throttles traffic according to DRAM bandwidth, improving results on both SPEC and cloud benchmarks. Emender and sBerti tackled the over-prefetching problem that plagues the high-performing VBerti + Pythia baseline.
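To make the cross-page idea concrete, here is a minimal Python sketch of a stride prefetcher that predicts on virtual addresses, so a confident stride can run past the end of a 4 KB page instead of stopping there (physical-address prefetchers usually stop because the next virtual page need not be physically adjacent). This is not VIP's actual design: the PC-indexed table, the 64-byte line size, the confidence threshold and degree, and the rule of tagging cross-page targets for the L2 are all illustrative assumptions, and address translation is left out entirely.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # 4 KB pages, as in the text
LINE_SIZE = 64    # assumed cache-line size (hypothetical parameter)


@dataclass
class StrideEntry:
    last_addr: int
    stride: int = 0
    confidence: int = 0


class CrossPageStridePrefetcher:
    """Hypothetical PC-indexed stride prefetcher operating on virtual addresses.

    Targets inside the current page are tagged "L1"; targets that cross into
    the next page are tagged "L2" (an assumed policy loosely modeled on the
    VIP description) rather than being dropped at the page boundary.
    """

    def __init__(self, degree: int = 2, conf_threshold: int = 2, max_conf: int = 3):
        self.table: dict[int, StrideEntry] = {}
        self.degree = degree
        self.conf_threshold = conf_threshold
        self.max_conf = max_conf

    def access(self, pc: int, vaddr: int) -> list[tuple[int, str]]:
        entry = self.table.get(pc)
        if entry is None:
            # First access from this PC: just start tracking it.
            self.table[pc] = StrideEntry(last_addr=vaddr)
            return []
        stride = vaddr - entry.last_addr
        if stride != 0 and stride == entry.stride:
            entry.confidence = min(entry.confidence + 1, self.max_conf)
        else:
            # New or broken stride: retrain from scratch.
            entry.stride = stride
            entry.confidence = 0
        entry.last_addr = vaddr
        if entry.confidence < self.conf_threshold:
            return []
        requests = []
        page = vaddr // PAGE_SIZE
        for k in range(1, self.degree + 1):
            target = vaddr + k * entry.stride
            level = "L1" if target // PAGE_SIZE == page else "L2"
            requests.append((target // LINE_SIZE * LINE_SIZE, level))
        return requests
```

Feeding one PC the addresses 0x1D80, 0x1E00, 0x1E80, 0x1F00 trains a 128-byte stride; on the fourth access the sketch returns `[(0x1F80, "L1"), (0x2000, "L2")]`, with the second request crossing into the next 4 KB page.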

Emender's confidence-sorted pending buffer, cuckoo filter for duplicate elimination, and fairness-aware throttling cut useless traffic and free prefetch-queue space, especially in multi-core runs. sBerti introduced a smart stride engine that operates in the virtual address space, which lets it prefetch across page boundaries, and adjusts its look-ahead dynamically for AI/ML traces. Together these designs point toward more adaptive, workload-aware prefetching, a trend that could reshape memory subsystems in cloud servers and edge devices alike.
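Emender's components are only named here, so the following is a generic Python sketch rather than Emender's implementation: a small pending buffer kept sorted by confidence, paired with a textbook cuckoo filter (partial-key cuckoo hashing) that drops candidates already queued or issued. The capacity, filter geometry, CRC32-based hashing, and class names are assumptions made for illustration, and fairness-aware throttling is not modeled.

```python
import bisect
import random
import zlib


class CuckooFilter:
    """Minimal cuckoo filter: each item becomes a short fingerprint stored in
    one of two buckets; the alternate bucket is i ^ h(fingerprint), so it can
    be recomputed from the fingerprint alone when entries are relocated."""

    def __init__(self, num_buckets=64, bucket_size=4, fp_bits=16, max_kicks=64, seed=0):
        assert num_buckets & (num_buckets - 1) == 0, "num_buckets must be a power of two"
        assert 1 <= fp_bits <= 16
        self.buckets = [[] for _ in range(num_buckets)]
        self.mask = num_buckets - 1
        self.bucket_size = bucket_size
        self.fp_mask = (1 << fp_bits) - 1
        self.max_kicks = max_kicks
        self.rng = random.Random(seed)

    def _alt(self, fp):
        # Bucket offset derived only from the fingerprint.
        return zlib.crc32(fp.to_bytes(2, "little")) & self.mask

    def _locate(self, item):
        data = item.to_bytes(8, "little")
        fp = zlib.crc32(data, 0x5A5A5A5A) & self.fp_mask  # salted hash for the fingerprint
        i1 = zlib.crc32(data) & self.mask
        return fp, i1, i1 ^ self._alt(fp)

    def contains(self, item):
        fp, i1, i2 = self._locate(item)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def insert(self, item):
        fp, i1, i2 = self._locate(item)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets full: swap with a random resident and move it to its alternate bucket.
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            j = self.rng.randrange(self.bucket_size)
            fp, self.buckets[i][j] = self.buckets[i][j], fp
            i ^= self._alt(fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Gave up: the last evicted fingerprint is lost, so at worst a duplicate slips through.
        return False

    def delete(self, item):
        fp, i1, i2 = self._locate(item)
        for i in (i1, i2):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False


class PendingPrefetchBuffer:
    """Hypothetical confidence-sorted pending buffer with duplicate filtering."""

    def __init__(self, capacity=16):
        self.capacity = capacity
        self.entries = []  # sorted by (-confidence, seq): best candidate first
        self.seq = 0       # tie-breaker: older candidates win at equal confidence
        self.seen = CuckooFilter()  # issued lines stay here; a real design would age it

    def push(self, line_addr, confidence):
        """Queue a candidate; return False if it looks like a duplicate."""
        if self.seen.contains(line_addr):
            return False  # rare false positives only cost one prefetch
        bisect.insort(self.entries, (-confidence, self.seq, line_addr))
        self.seq += 1
        self.seen.insert(line_addr)
        if len(self.entries) > self.capacity:
            # Buffer full: discard the lowest-confidence candidate and forget it.
            _, _, dropped = self.entries.pop()
            self.seen.delete(dropped)
        return True

    def issue(self):
        """Pop the highest-confidence candidate, or None if the buffer is empty."""
        if not self.entries:
            return None
        return self.entries.pop(0)[2]
```

With a capacity of 2, pushing 0x1000 (confidence 3) and 0x2000 (confidence 7) queues both, a second push of 0x1000 is rejected as a duplicate, and a low-confidence 0x3000 is evicted on overflow, so `issue()` returns 0x2000 and then 0x1000. Sorting by confidence means that when queue slots are scarce, the least promising candidates are the ones discarded.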
