What question did this study set out to answer?

The research aims to improve hardware prefetching for irregular, data-dependent memory access (DDMA) patterns by addressing the limitations of existing pattern-detection methods.

April 5, 2026 · Open Access

Thoth: Uncovering Data-Dependent Memory Access Patterns via Annotation-Directed Load Sampling

Key Points

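Developed Thoth, a hardware prefetcher that operates on explicit producer-consumer load pairs rather than on dependency chains.
Used register-level dependency tracking to detect producer-consumer load pairs.
Introduced annotation-directed load sampling, which samples only matched (annotated) producer-consumer load instances, to robustly uncover data-dependent memory access (DDMA) patterns, including multi-level range relations.
Used precise load annotation, based on reorder identifiers, to keep annotations correct across pipeline flushes.
Achieved a 51.1% speedup over a no-prefetching baseline on a suite of DDMA-intensive benchmarks.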
Outperformed two state-of-the-art DDMA prefetchers by 14.7% and 8.2%, respectively.
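To make register-level dependency tracking concrete, here is a minimal software sketch over a toy instruction trace. It is an illustration of the general idea, not Thoth's hardware design: the `Inst` record, the two-opcode ISA, and `find_load_pairs` are hypothetical, and the sketch ignores out-of-order execution and pipeline flushes. Each register remembers which load PC its value came from; when a later load uses such a register as its address, the (producer, consumer) PC pair is recorded.

```python
from dataclasses import dataclass


@dataclass
class Inst:
    """One instruction in a hypothetical toy trace."""
    pc: int
    op: str            # "load" or "alu" (toy two-opcode ISA, an assumption)
    dst: str           # destination register name
    srcs: tuple = ()   # source registers; for a load, srcs[0] is the address register


def find_load_pairs(trace):
    """Return the set of (producer_pc, consumer_pc) load pairs in `trace`.

    Each register is tagged with the PC of the load whose value reached it,
    either directly or through ALU instructions. A load whose address
    register carries a tag is a consumer of that producer load.
    """
    producer_of = {}   # register name -> PC of the producer load
    pairs = set()
    for inst in trace:
        if inst.op == "load":
            addr_reg = inst.srcs[0] if inst.srcs else None
            if addr_reg in producer_of:
                pairs.add((producer_of[addr_reg], inst.pc))
            # The loaded value starts a new tag on the destination register.
            producer_of[inst.dst] = inst.pc
        else:
            # An ALU result inherits the tag of its first tagged source;
            # an untagged result clears any stale tag on dst.
            tag = next((producer_of[s] for s in inst.srcs if s in producer_of), None)
            if tag is None:
                producer_of.pop(inst.dst, None)
            else:
                producer_of[inst.dst] = tag
    return pairs


# Indirect access x[col_idx[j]]: r0 = &col_idx[j], r5 = &x[0].
trace = [
    Inst(0x10, "load", "r1", ("r0",)),        # r1 = col_idx[j]      (producer)
    Inst(0x14, "alu", "r2", ("r1", "r5")),    # r2 = &x[r1]
    Inst(0x18, "load", "r3", ("r2",)),        # r3 = x[col_idx[j]]   (consumer)
]
print(sorted(find_load_pairs(trace)))  # prints [(16, 24)]
```

The ALU step between the two loads (scaling the index and adding the base of `x`) only forwards the tag, so the pair is still found even though the consumer's address is not the producer's raw value.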

Abstract

Sparse data structures are ubiquitous in graph analytics, machine learning, and high-performance computing. Algorithms operating on these structures typically exhibit highly irregular data-dependent memory access (DDMA) patterns, leading to frequent cache misses and degraded memory performance. Prior work on hardware prefetching to mitigate DDMA-induced misses falls into two categories: address-based methods that sample correlated sequences of load data and addresses from cache miss streams, and instruction-based methods that record instruction-level dependency chains. Although both learn single relations effectively, they struggle with multi-level range relations prevalent in DDMA-intensive workloads, leaving substantial prefetching opportunities unexploited. In address-based schemes, misses from deeper-level consumers are often miscorrelated with the producer. Moreover, out-of-order execution and the range relations themselves perturb sampling, yielding mismatched load instances. In instruction-based schemes, chain-structured representations and suboptimal learning strategies prevent the construction of complete dependency chains for these relations. To overcome these limitations, we present Thoth, a hardware prefetcher that operates at the granularity of explicit producer-consumer load pairs rather than constructing dependency chains. Thoth detects such pairs via register-level dependency tracking. It adopts an annotation-directed load sampling strategy that annotates matched producer-consumer load instances and samples only those annotated instances, thereby robustly uncovering DDMA patterns—including multi-level range relations—while avoiding mismatches. To maintain annotation correctness across pipeline flushes, Thoth employs precise load annotation, which leverages reorder identifiers to resume or terminate annotation precisely. 
On a suite of DDMA-intensive benchmarks, Thoth delivers a 51.1% speedup over a no-prefetching baseline and outperforms two state-of-the-art DDMA prefetchers by 14.7% and 8.2%, respectively.
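The sampling idea described in the abstract can be approximated in software as follows. This is a minimal sketch, assuming an in-order trace and a linear relation `addr = base + scale * value` between a producer's loaded value and its consumer's address (typical of indirect indexing such as `x[col_idx[j]]`). Each annotation carries the producer instance's own value, so a consumer address is only paired with the value that actually produced it. `Op`, `annotated_samples`, and `infer_linear` are hypothetical names; the sketch does not model multi-level range relations, reorder identifiers, or pipeline flushes.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Op(NamedTuple):
    """One instruction in a hypothetical toy trace."""
    pc: int
    kind: str                    # "load" or "alu" (toy ISA, an assumption)
    dst: str
    srcs: tuple                  # for a load, srcs[0] is the address register
    addr: Optional[int] = None   # loads only: effective address
    value: Optional[int] = None  # loads only: value returned


def annotated_samples(trace):
    """Map (producer_pc, consumer_pc) -> [(producer_value, consumer_addr), ...].

    Each producer load instance annotates its destination register with its
    own PC and loaded value; ALU ops forward the annotation. Only consumer
    loads reached by an annotation are sampled.
    """
    note = {}                    # register -> (producer_pc, producer_value)
    samples = defaultdict(list)
    for op in trace:
        if op.kind == "load":
            tag = note.get(op.srcs[0]) if op.srcs else None
            if tag is not None:
                prod_pc, prod_val = tag
                samples[(prod_pc, op.pc)].append((prod_val, op.addr))
            note[op.dst] = (op.pc, op.value)
        else:
            tag = next((note[s] for s in op.srcs if s in note), None)
            if tag is None:
                note.pop(op.dst, None)
            else:
                note[op.dst] = tag
    return dict(samples)


def infer_linear(samples):
    """Return (base, scale) with addr == base + scale * value for every
    sample, or None if no single linear relation fits (an assumed form)."""
    distinct = {}
    for v, a in samples:
        distinct.setdefault(v, a)
    if len(distinct) < 2:
        return None
    (v0, a0), (v1, a1) = list(distinct.items())[:2]
    if (a1 - a0) % (v1 - v0):
        return None
    scale = (a1 - a0) // (v1 - v0)
    base = a0 - scale * v0
    if all(a == base + scale * v for v, a in samples):
        return base, scale
    return None


# Three iterations of x[col_idx[j]] with x at 0x1000 and 8-byte elements.
trace = []
for j, idx in enumerate([3, 7, 2]):
    trace.append(Op(0x10, "load", "r1", ("r0",), addr=0x500 + 4 * j, value=idx))
    trace.append(Op(0x14, "alu", "r2", ("r1", "r5")))
    trace.append(Op(0x18, "load", "r3", ("r2",), addr=0x1000 + 8 * idx, value=0))
pairs = annotated_samples(trace)
print({k: infer_linear(v) for k, v in pairs.items()})  # prints {(16, 24): (4096, 8)}
```

Here the pairing comes from the annotation rather than from the order in which misses happen to arrive, which is the property the abstract credits with avoiding mismatched load instances.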

Cite This Study

Jiang et al. studied this question.

synapsesocial.com/papers/69d1fdbfa79560c99a0a3f72
https://doi.org/10.1145/3806835
