The widening gap between processor speed and memory access latency has made data locality a critical bottleneck in high-performance computing. In Non-Uniform Memory Access (NUMA) and distributed-memory systems, remote accesses incur penalties far greater than local ones, degrading the efficiency of scientific and data-intensive workloads. This paper introduces CacheAware, a compiler–runtime framework for data-locality-aware scheduling. CacheAware uses compiler analysis to annotate tasks with their memory access footprints and combines this static information with runtime monitoring of cache miss patterns to guide scheduling and dynamic task migration. Unlike existing NUMA balancing or runtime tasking systems, CacheAware integrates both proactive and reactive strategies to minimize cache thrashing and remote memory fetches. Experimental evaluation on scientific benchmarks demonstrates reductions of up to 30% in cache misses and execution-time improvements of over 20% compared to Linux AutoNUMA, NUMA-aware schedulers, and task-based runtimes. These results indicate that CacheAware provides a practical and scalable approach for enhancing data locality and accelerating workloads on modern distributed-memory systems.
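To make the combination of proactive and reactive strategies concrete, the following is a minimal Python sketch, not the CacheAware implementation. It assumes a hypothetical per-task footprint annotation (bytes expected on each NUMA node) and hypothetical runtime counters (remote accesses per node and an observed miss rate); the names `Task`, `place`, `maybe_migrate`, and the 0.2 threshold are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Task:
    name: str
    # Hypothetical compiler annotation: bytes the task is expected
    # to touch on each NUMA node (node id -> bytes).
    footprint: Dict[int, int]
    node: Optional[int] = None  # node the task is currently assigned to


def place(task: Task) -> int:
    """Proactive step: assign the task to the node holding most of its data."""
    task.node = max(task.footprint, key=task.footprint.get)
    return task.node


def maybe_migrate(task: Task,
                  remote_accesses: Dict[int, int],
                  miss_rate: float,
                  threshold: float = 0.2) -> bool:
    """Reactive step: if the observed miss rate exceeds the (assumed)
    threshold, move the task to the node it accesses most often."""
    if miss_rate <= threshold or not remote_accesses:
        return False
    target = max(remote_accesses, key=remote_accesses.get)
    if target == task.node:
        return False
    task.node = target
    return True


if __name__ == "__main__":
    t = Task("stencil", footprint={0: 4096, 1: 65536})
    print(place(t))                                   # static placement
    print(maybe_migrate(t, {0: 900, 1: 100}, 0.35))   # runtime correction
    print(t.node)
```

In this sketch the static footprint decides the initial placement, and the runtime counters only override it when the measured behaviour diverges from the prediction. A real system would also have to weigh the cost of the migration itself, which is omitted here.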
Haifa A. Alanazi
Northern Border University
Abdulaziz G. Alanazi
Northern Border University
Nasser Albalawi
Northern Border University
Alanazi et al. studied this question.
synapsesocial.com/papers/69b25b1996eeacc4fcec97f2 — DOI: https://doi.org/10.3390/computers15030181