This paper proposes a novel design for a low-latency, high-throughput fused floating-point unit (FPU) that handles division (DIV) and square-root (SQRT) operations based on the Goldschmidt algorithm. Traditional FPUs in commercial processors suffer from long latency, low throughput, and substantial hardware cost due to the complexity of DIV and SQRT. In our design, we employ an innovative error analysis method to reduce multiplier bitwidths. Moreover, we carefully integrate DIV and SQRT to improve resource reuse. Additionally, the pipeline structure ensures multi-precision support and high throughput. We conduct 100 trillion random tests to validate our design, demonstrating its compliance with the IEEE 754 single-precision (SP) and double-precision (DP) standards. Results show that our design not only outperforms existing FPUs but also achieves significant resource reuse between DIV and SQRT operations.
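For readers unfamiliar with the Goldschmidt algorithm named in the abstract, the following is a minimal software sketch of the textbook iterations for division and square root. It is not the paper's hardware design: the function names, the iteration counts, the use of `math.frexp` for operand scaling, and the constant seed in the square-root routine are all assumptions made for illustration (hardware implementations typically start from a lookup-table estimate and use fixed-point multipliers of carefully chosen width).

```python
import math


def goldschmidt_div(n: float, d: float, iterations: int = 6) -> float:
    """Approximate n / d with the textbook Goldschmidt iteration (illustrative)."""
    if d == 0.0:
        raise ZeroDivisionError("division by zero")
    sign = -1.0 if (n < 0.0) != (d < 0.0) else 1.0
    n, d = abs(n), abs(d)
    # Scale so the denominator lies in [0.5, 1); apply the same scaling
    # to the numerator so the quotient is unchanged.
    den, exp = math.frexp(d)      # d = den * 2**exp
    num = math.ldexp(n, -exp)
    for _ in range(iterations):
        f = 2.0 - den             # correction factor
        num *= f                  # numerator converges to the quotient
        den *= f                  # denominator converges to 1
    return sign * num


def goldschmidt_sqrt(s: float, iterations: int = 8) -> float:
    """Approximate sqrt(s) with the textbook Goldschmidt iteration (illustrative)."""
    if s < 0.0:
        raise ValueError("square root of a negative number")
    if s == 0.0:
        return 0.0
    # Scale so that s = m * 2**exp with exp even and m in [0.5, 2).
    m, exp = math.frexp(s)
    if exp % 2:
        m *= 2.0
        exp -= 1
    # Assumed seed: y0 = 1 as a crude estimate of 1/sqrt(m).
    g = m                         # g converges to sqrt(m)
    h = 0.5                       # h converges to 1 / (2 * sqrt(m))
    for _ in range(iterations):
        r = 0.5 - g * h           # residual, converges to 0
        g += g * r
        h += h * r
    return math.ldexp(g, exp // 2)
```

Both loops consist only of multiplications and a subtraction from a constant, which is the kind of shared structure a fused DIV/SQRT unit could exploit. How the paper's design actually shares multipliers is not described in the abstract.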
Dai et al. studied this question.