Abstract We have optimized the parallel threshold ILU algorithm (ParILUT) for GPUs. The optimizations are for three building blocks: candidate search and ILU residual computation, adding and removing elements, and threshold selection. Firstly, we fuse candidate search and ILU residual computation by modifying the ParILUT algorithm and extending the register-aware SpGEMM algorithm to calculate it. At the same time, we developed a GPU bin search algorithm to make the register-aware SpGEMM algorithm perform better in ParILUT. Secondly, we adopt a warp-row-parallel approach to add elements to new L and U and remove elements from candidates instead of the thread-row-parallel approach. And used the efficient GPU instructions to locate the positions of elements. Thirdly, we proposed a balanced classification tree in the threshold selection to balance the buckets’ data, when a large number of elements with the same value. Finally,we experimented with the performance of each optimization and the whole ParILUT. And verified the correctness of the optimized ParILUT. The result indicates that the optimized ParILUT average speedup is 4.03 times over the original version, and the speedup increases with the amount of fill-in.
Yang et al. (Fri,) studied this question.
Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context: