We consider the floating-point microarchitecture support in high-end RISC superscalar processors. We propose a simple yet effective bounds model to deduce the "best-case" loop performance limits for these processors. We compare these bounds to simulation-based (and, where available, hardware-based) performance measurements for actual compiler-generated code sequences. From this study, we identify loop tuning opportunities to bridge the gap between "best-case" and "actual" performance in a post-silicon setting. Some results of this analysis point to fundamental hardware performance bugs that may be removed through feasible microarchitectural changes. More frequently, the analysis is useful for suggesting compiler enhancements. The analysis methods described have been used in actual high-end processor development projects within our company. We report our experimental results in the context of a set of application-based loop test cases, designed to stress various resource limits in the core (infinite cache) microarchitecture.
Bose et al. studied this question.