Key points are not available for this paper at this time.
The performance tradeoff between hardware complexity and clock speed is studied. First, a generic superscalar pipeline is de-fined. Then the specific areas of register renaming, instruction win-dow wakeup and selection logic, and operand bypassing are ana-lyzed. Each is modeled and Spice simulated for feature sizes of 0:8m, 0:35m, and 0:18m. Performance results and trends are expressed in terms of issue width and window size. Our analysis in-dicates that window wakeup and selection logic as well as operand bypass logic are likely to be the most critical in the future. A microarchitecture that simplifies wakeup and selection logic is proposed and discussed. This implementation puts chains of de-pendent instructions into queues, and issues instructions from mul-tiple queues in parallel. Simulation shows little slowdown as com-pared with a completely flexible issue window when performance is measured in clock cycles. Furthermore, because only instructions at queue heads need to be awakened and selected, issue logic is simpli-fied and the clock cycle is faster – consequently overall performance is improved. By grouping dependent instructions together, the pro-posed microarchitecture will help minimize performance degrada-tion due to slow bypasses in future wide-issue machines. 1
Palacharla et al. (Thu,) studied this question.