November 27, 2002

Bounds-based loop performance analysis: application to validation and tuning

Abstract

We consider the floating-point microarchitecture support in high-end RISC superscalar processors. We propose a simple yet effective bounds model to deduce the "best-case" loop performance limits for these processors. We compare these bounds with simulation-based (and, where available, hardware-based) performance measurements for actual compiler-generated code sequences. From this study, we identify loop tuning opportunities to bridge the gap between "best-case" and "actual" performance in a post-silicon setting. Some results of this analysis point to fundamental hardware performance bugs that may be removed through feasible microarchitectural changes. More frequently, the analysis is useful for suggesting compiler enhancements. The analysis methods described have been used in actual high-end processor development projects within our company. We report our experimental results in the context of a set of application-based loop test cases designed to stress various resource limits in the core (infinite-cache) microarchitecture.
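
The abstract does not spell out the bounds model itself, so the following is only a minimal sketch of what a resource-based "best-case" bound could look like, not the authors' formulation. It assumes that the steady-state cycles per loop iteration cannot fall below the most heavily used resource's demand divided by its per-cycle capacity, nor below the latency of any loop-carried dependence chain. The function name, the resource names, the machine widths, and the measured value are all hypothetical.

```python
def cycles_per_iteration_bound(op_counts, units_per_cycle, dependence_latency=0.0):
    """Return a lower bound on steady-state cycles per loop iteration.

    op_counts:          operations per iteration that use each resource
    units_per_cycle:    operations each resource can accept per cycle
    dependence_latency: cycles along the longest loop-carried dependence
                        chain (0.0 if the iterations are independent)
    All three inputs are illustrative assumptions, not taken from the paper.
    """
    # Each resource alone forces at least (demand / capacity) cycles.
    resource_bounds = {
        name: op_counts[name] / units_per_cycle[name] for name in op_counts
    }
    # The loop can run no faster than its tightest constraint.
    bound = max(list(resource_bounds.values()) + [dependence_latency])
    return bound, resource_bounds


# Hypothetical loop body: 2 loads, 1 store, 1 fused multiply-add, 1 branch.
ops = {"load_store": 3, "fp": 1, "dispatch": 5}
# Hypothetical core: 2 load/store units, 2 FP units, 4-wide dispatch.
machine = {"load_store": 2, "fp": 2, "dispatch": 4}

bound, per_resource = cycles_per_iteration_bound(ops, machine)

# Hypothetical measurement from a simulator or hardware counters.
measured = 2.0
gap = measured / bound  # a ratio above 1.0 marks a tuning opportunity
```

In this sketch the load/store units set the bound, so a measured value well above it would point toward either a compiler scheduling issue or a hardware limit that the simple model does not capture.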
