May 1, 2012

NUMA Aware Iterative Stencil Computations on Many-Core Systems

Abstract

Temporal blocking in iterative stencil computations makes it possible to exceed the performance bound that peak system bandwidth imposes on a single stencil computation. However, the effectiveness of temporal blocking depends strongly on the tiling scheme, which must balance the conflicting goals of spatio-temporal data locality, regular memory access patterns, parallelization into many independent tasks, and data-to-core affinity for NUMA-aware data distribution. Despite the prevalence of cache-coherent non-uniform memory access (ccNUMA) in today's many-core systems, this last aspect has been largely ignored in the development of temporal blocking algorithms. Building upon previous cache-aware [1] and cache-oblivious [2] schemes, this paper develops their NUMA-aware variants and explains why treating data-to-core affinity as an equally important goal also necessitates new tiling and parallelization strategies. Results are presented for an 8-socket dual-core system and a 4-socket oct-core system and compared against an optimized naive scheme, various peak performance characteristics, and related schemes from the literature.
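To make the idea of temporal blocking concrete, the following is a minimal sketch, not the tiling scheme developed in the paper. It assumes a 1D three-point averaging stencil with fixed end values, and it uses simple overlapped tiles with redundant halo computation. The function names `naive_sweeps` and `temporally_blocked` and the parameters `steps` and `tile` are hypothetical, chosen only for illustration. NUMA placement and parallel execution are deliberately left out.

```python
def naive_sweeps(u, steps):
    """Apply a 3-point averaging stencil `steps` times, sweeping the whole array each time."""
    cur = list(u)
    for _ in range(steps):
        nxt = cur[:]  # the two end points are treated as fixed boundary values
        for i in range(1, len(cur) - 1):
            nxt[i] = (cur[i - 1] + cur[i] + cur[i + 1]) / 3.0
        cur = nxt
    return cur


def temporally_blocked(u, steps, tile):
    """Same result as naive_sweeps, but each tile advances `steps` time steps
    before the next tile is touched (illustrative overlapped tiling)."""
    n = len(u)
    out = list(u)
    for start in range(0, n, tile):
        end = min(start + tile, n)
        # Load the tile plus a halo of `steps` cells per side, clipped to the domain.
        lo = max(0, start - steps)
        hi = min(n, end + steps)
        # Advance the small local block through all time steps while it stays hot.
        local = naive_sweeps(u[lo:hi], steps)
        # Cells closer than `steps` to an artificial (non-domain) edge are stale,
        # so only the tile interior [start, end) is written back.
        out[start:end] = local[start - lo:end - lo]
    return out


if __name__ == "__main__":
    grid = [float((i * 7) % 11) for i in range(50)]
    same = temporally_blocked(grid, steps=4, tile=8) == naive_sweeps(grid, steps=4)
    print("blocked result matches naive result:", same)
```

In this sketch each tile reads its input once and performs several time steps on it, which is the source of the bandwidth savings the abstract refers to. The price is redundant work in the halos, and the choice of tile shape is exactly the kind of trade-off the abstract says a tiling scheme must balance.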

Cite This Study

Shaheen et al. (2012) studied this question.

synapsesocial.com/papers/6a1c1595bc71fb1015a93a1c https://doi.org/10.1109/ipdps.2012.50
