In recent years, HPC systems have become increasingly complex and heterogeneous, making application development and optimization challenging. In this respect, intuitive performance models such as the Cache-aware Roofline Model (CARM) offer effective guidance by exposing the bottlenecks that prevent an application from reaching the system’s maximum performance. However, existing CARM-enabled tools are either vendor-specific (Intel Advisor), insufficiently developed (AMD), or nonexistent (ARM, RISC-V). This talk focuses on the CARM Tool, which was developed to address this gap by extending CARM support to all major CPU architectures and ISAs, i.e., x86 (Intel, AMD), ARM, and RISC-V. The tool includes automatically generated assembly microbenchmarks tailored to cover the full performance spectrum of modern CPUs (from scalar to all supported vector ISA extensions), for both the computational units and all levels of the memory hierarchy. It also provides application profiling capabilities within the scope of the CARM, so that the model’s insights can guide application optimization.
José P. Morgado (Tue,) studied this question.
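
As a rough illustration of the kind of guidance the CARM gives, the sketch below computes the attainable performance under each memory-level roof as the minimum of the peak compute throughput and the product of arithmetic intensity and that level's bandwidth. It is not taken from the CARM Tool: the function names, the memory-level labels, and all peak and bandwidth figures are hypothetical placeholders, whereas the tool obtains such values from its microbenchmarks.

```python
# Minimal roofline sketch. All names and numbers are illustrative assumptions,
# not measurements and not part of the CARM Tool's interface.

def carm_attainable(ai, peak_gflops, bandwidth_gbs):
    """Attainable GFLOP/s for arithmetic intensity `ai` (FLOP/byte)
    under a single roof with the given bandwidth (GB/s)."""
    return min(peak_gflops, ai * bandwidth_gbs)


def carm_roofs(ai, peak_gflops, bandwidths_gbs):
    """Attainable GFLOP/s under every memory-level roof."""
    return {
        level: carm_attainable(ai, peak_gflops, bw)
        for level, bw in bandwidths_gbs.items()
    }


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity (FLOP/byte) above which a roof is compute-bound."""
    return peak_gflops / bandwidth_gbs


# Hypothetical machine: peak compute and per-level bandwidths.
PEAK_GFLOPS = 100.0
BANDWIDTHS_GBS = {"L1": 400.0, "L2": 200.0, "L3": 100.0, "DRAM": 25.0}

if __name__ == "__main__":
    # A kernel performing 0.5 FLOP per byte moved, as seen by the core.
    for level, perf in carm_roofs(0.5, PEAK_GFLOPS, BANDWIDTHS_GBS).items():
        print(f"{level}: {perf} GFLOP/s")
```

With these placeholder figures, a kernel at 0.5 FLOP/byte is compute-bound when its data is served from L1 or L2 but bandwidth-bound under the L3 and DRAM roofs, which is the kind of bottleneck distinction the model is meant to reveal.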