March 3, 2026 | Open Access

Microarchitectural comparison, in-core modeling, and memory hierarchy analysis of state-of-the-art CPUs: Grace, Sapphire Rapids, and Genoa

Key Points

The Grace Superchip demonstrates a next-to-optimal implementation of write-allocate evasion, which reduces the memory traffic caused by write misses.
In-core performance models were created for the OSACA tool, and their prediction accuracy was compared with that of llvm-mca.
An analysis of the cache hierarchy in view of the Execution-Cache-Memory (ECM) model reveals overlapping cache hierarchies on AMD Genoa and NVIDIA Grace, in contrast to Intel Sapphire Rapids.
Findings may influence future CPU design choices and computational efficiency in high-performance environments.
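Without WA evasion, a store that misses in the cache first reads the target cache line from memory (write allocate) and later writes it back, so each store stream costs twice its size in memory traffic. The Python sketch below is a minimal traffic model under that assumption. The function name `bytes_per_iteration` and the `wa_fraction` parameter, which stands in for how completely a CPU evades write allocates, are hypothetical and do not encode measured values for Grace, Sapphire Rapids, or Genoa.

```python
def bytes_per_iteration(load_streams: int, store_streams: int,
                        wa_fraction: float = 1.0, elem_bytes: int = 8) -> float:
    """Estimate main-memory bytes moved per iteration of a streaming loop.

    load_streams:  arrays that are only read (e.g. b and c in a[i] = b[i] + s*c[i])
    store_streams: arrays that are written
    wa_fraction:   hypothetical share of stored cache lines that still trigger a
                   write-allocate read (1.0 = no evasion, 0.0 = perfect evasion
                   or explicit non-temporal stores)
    elem_bytes:    element size in bytes, 8 for double precision
    """
    if not 0.0 <= wa_fraction <= 1.0:
        raise ValueError("wa_fraction must lie in [0, 1]")
    reads = load_streams * elem_bytes                      # regular load traffic
    writes = store_streams * elem_bytes                    # write-back of stored data
    wa_reads = wa_fraction * store_streams * elem_bytes    # line read before the store
    return reads + writes + wa_reads


if __name__ == "__main__":
    # STREAM-like triad a[i] = b[i] + s * c[i]: two load streams, one store stream.
    for frac in (1.0, 0.0):
        print(f"triad, wa_fraction={frac}: {bytes_per_iteration(2, 1, frac)} B/iter")
    # Store-only kernel a[i] = s: a full write allocate doubles the traffic.
    print(f"store-only with WA: {bytes_per_iteration(0, 1)} B/iter")
    print(f"store-only without WA: {bytes_per_iteration(0, 1, 0.0)} B/iter")
```

Setting `wa_fraction` to 0.0 models both perfect hardware evasion and non-temporal stores; intermediate values loosely mimic evasion that only works in some scenarios.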

Abstract

Three big semiconductor companies in HPC are currently competing in the race for the best CPU: AMD, Intel, and NVIDIA. Their state-of-the-art CPU designs differ significantly across the entire range from instruction execution to cache behavior and main memory bandwidth. In this work, we analyze the performance of CPUs based on the Zen 4 (AMD Genoa), Golden Cove (Intel Sapphire Rapids), and Neoverse V2 (NVIDIA Grace) microarchitectures. We create accurate in-core performance models for use with the Open Source Architecture Code Analyzer (OSACA) tool and compare its prediction accuracy with that of llvm-mca. Apart from the tooling, this comparison reveals interesting differences in in-core design points, but also some commonalities. Beyond the single core, we extend our comparison by measuring data-transfer behavior through the memory hierarchy using a variety of microbenchmarks. We thoroughly investigate the “write-allocate (WA) evasion” feature, which can automatically reduce the memory traffic caused by write misses. We show that the Grace Superchip has a next-to-optimal implementation of WA evasion, while the Sapphire Rapids CPU can avoid write allocates completely only in specific scenarios. The only way to eliminate WAs on AMD Genoa is the explicit use of non-temporal stores. Finally, we study the cache hierarchy of the CPUs in view of the Execution-Cache-Memory (ECM) performance model, revealing overlapping cache hierarchies on Genoa and Grace in contrast to Sapphire Rapids.
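In the ECM model, the runtime of a loop for data residing in a given memory level combines the in-core time, split into an overlapping part T_OL and a non-overlapping part T_nOL, with the data-transfer times between adjacent levels. For a non-overlapping hierarchy the classic composition is T_ECM = max(T_OL, T_nOL + T_L1L2 + T_L2L3 + T_L3Mem). The Python sketch below contrasts this serial composition with a fully overlapping limiting case, in which the slowest single contribution dominates. The full-overlap rule, the function name `ecm_predictions`, and all cycle counts are illustrative assumptions; the models for Genoa and Grace in this work may use partial rather than full overlap.

```python
from typing import List, Sequence


def ecm_predictions(t_ol: float, t_nol: float, transfers: Sequence[float],
                    overlapping: bool = False) -> List[float]:
    """Predict cycles per unit of work for data in L1, L2, ..., main memory.

    t_ol:        in-core cycles that overlap with data transfers (e.g. arithmetic)
    t_nol:       in-core cycles that do not overlap (e.g. L1 loads/stores)
    transfers:   inter-level transfer cycles, e.g. [T_L1L2, T_L2L3, T_L3Mem]
    overlapping: False -> serial ECM composition (transfers add up)
                 True  -> hypothetical fully overlapping hierarchy
    Element 0 is the L1 prediction; element k is the prediction for data
    that must come from level k + 1.
    """
    predictions = []
    for level in range(len(transfers) + 1):
        active = list(transfers[:level])  # transfers needed to reach this level
        if overlapping:
            t = max([t_ol, t_nol, *active])
        else:
            t = max(t_ol, t_nol + sum(active))
        predictions.append(t)
    return predictions


if __name__ == "__main__":
    # Made-up cycle counts, for illustration only.
    serial = ecm_predictions(2, 4, [3, 5, 10])
    overlap = ecm_predictions(2, 4, [3, 5, 10], overlapping=True)
    for name, s, o in zip(["L1", "L2", "L3", "Mem"], serial, overlap):
        print(f"{name}: serial {s} cy, fully overlapping {o} cy")
```

With serial composition the predicted time grows with every additional level, while in the overlapping case it only grows when a single transfer exceeds all other contributions, which is why the two kinds of hierarchy show different scaling across the memory levels.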
