August 8, 2024 | Open Access

FP16 Acceleration in Structured Multigrid Preconditioner for Real-World Applications

Abstract

Half-precision hardware support is now almost ubiquitous. In contrast to its active use in AI, half precision is less commonly employed in scientific and engineering computing. The potential to accelerate scientific computing applications with half precision prompted this study. Focusing on solving sparse linear systems in scientific computing, we explore how to use FP16 in multigrid preconditioners. Based on observations of sparse matrix formats, numerical features of scientific applications, and the performance characteristics of multigrid, this study formulates four guidelines for FP16 utilization in multigrid. The proposed algorithm demonstrates how to avoid FP16 overflow through scaling. A setup-then-scale strategy prevents FP16's limited accuracy and narrow range from interfering with multigrid's numerical properties. Another strategy, recover-and-rescale on the fly, reduces the memory footprint of hotspot kernels. The extra precision-conversion overhead in mixed-precision kernels is addressed by transforming storage formats and using SIMD implementations. Two ablation experiments validate the effectiveness of our algorithm and parallel kernel implementation on ARM and x86 architectures. We further evaluate three idealized and five real-world problems to demonstrate the advantage of utilizing FP16 in a multigrid preconditioner. The average speedups are approximately 2.75x for the preconditioner and 1.95x for the end-to-end workflow.
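
To make the overflow-avoidance idea concrete, here is a minimal sketch of scaling a matrix so that its entries fit in FP16 and then undoing the scaling during a matrix-vector product. It is an illustration under stated assumptions, not the authors' algorithm: the symmetric diagonal scaling Q = diag(sqrt(|a_ii|)), the dense list-of-lists storage, and the helper names `to_fp16`, `scale_to_fp16`, and `matvec_rescaled` are all hypothetical choices made for this example. The abstract does not specify the scaling the paper uses.

```python
import math
import struct


def to_fp16(x):
    """Round a float to the nearest IEEE 754 half-precision value.

    struct's "e" format raises OverflowError when x is too large for FP16.
    """
    return struct.unpack("e", struct.pack("e", x))[0]


def scale_to_fp16(a):
    """Return (a16, q) where a16[i][j] = fp16(a[i][j] / (q[i] * q[j])).

    Assumption for this sketch: q[i] = sqrt(|a[i][i]|), so the scaled
    diagonal is 1 and, for diagonally dominant matrices, every scaled
    entry has magnitude at most 1.
    """
    n = len(a)
    q = [math.sqrt(abs(a[i][i])) for i in range(n)]
    a16 = [[to_fp16(a[i][j] / (q[i] * q[j])) for j in range(n)] for i in range(n)]
    return a16, q


def matvec_rescaled(a16, q, x):
    """Compute y = A x from the scaled FP16 copy: y = Q (A16 (Q x)).

    The FP16 values are widened to Python floats (double precision)
    before the arithmetic, so only storage is in half precision.
    """
    n = len(x)
    qx = [q[j] * x[j] for j in range(n)]
    return [q[i] * sum(a16[i][j] * qx[j] for j in range(n)) for i in range(n)]


# A small tridiagonal matrix whose entries exceed the FP16 maximum (65504).
a = [
    [4.0e6, -1.0e6, 0.0],
    [-1.0e6, 4.0e6, -1.0e6],
    [0.0, -1.0e6, 4.0e6],
]

a16, q = scale_to_fp16(a)
y = matvec_rescaled(a16, q, [1.0, 1.0, 1.0])
```

Casting `4.0e6` directly with `to_fp16` would overflow, whereas the scaled entries (1 on the diagonal, -0.25 off it) are representable. Keeping the scaling vector in higher precision and applying it around the FP16 data is one way to limit the narrow FP16 range to storage only.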

Cite This Study

Zong et al. (2024) studied this question.

synapsesocial.com/papers/68e5cfffb6db6435875664f1 https://doi.org/10.1145/3673038.3673040
