Computational Fluid Dynamics (CFD) on unstructured meshes is widely used to simulate complex flow problems. Geometric Multigrid (GMG) is an essential method for accelerating CFD simulations. However, achieving high efficiency and scalability for unstructured geometric multigrid CFD is very challenging because of data conflicts or data dependencies at the shared-memory level and communication bottlenecks at the distributed-memory level. Traditional hybrid parallelization schemes, which typically employ domain decomposition for MPI at the distributed-memory level and mesh coloring for OpenMP at the shared-memory level, fail to scale on modern HPC architectures. This paper proposes an efficient and scalable hybrid parallelization scheme for unstructured geometric multigrid CFD. First, we extend our previous work, the Task Dependency Tree (TDT) approach [33], to expose shared-memory parallelism in unstructured mesh computations while respecting both data conflicts and data dependencies. We adapt TDT to handle multiple GMG levels with complex mesh boundaries. TDT can be implemented using a task-based programming model such as OpenMP tasks, which offers a unique opportunity for fine-grained computation-communication overlap in hybrid parallelization. Therefore, at the distributed-memory level we introduce the one-sided, asynchronous, multithreaded Partitioned Global Address Space (PGAS) model and develop a PGAS+TDT hybrid scheme. We propose an adaptive scheduling strategy to maximize the overlap of communication and computation tasks in our hybrid scheme. Our work was implemented and evaluated in production-level unstructured CFD software on both x86 and ARM multi-core architectures. On a single compute node, TDT outperforms prior shared-memory approaches, delivering a speedup of up to 5.2×. In large-scale tests, our PGAS+TDT hybrid scheme improves performance by up to 2.0× over the engineer-tuned MPI-only version, with a strong scalability of about 70% when scaling to 128 compute nodes.
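To make the shared-memory data-conflict problem concrete, here is a minimal Python sketch of the traditional mesh-coloring baseline the abstract mentions. It is not the paper's TDT method. It assumes an edge-based loop in which each edge updates both of its end vertices, so two edges that share a vertex must not run concurrently. The function names `color_edges` and `group_by_color` and the tiny two-triangle mesh are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def color_edges(edges):
    """Greedy coloring: give each edge the smallest color that no
    previously colored edge sharing one of its vertices already uses."""
    used = defaultdict(set)  # vertex -> colors taken by incident edges
    colors = []
    for a, b in edges:
        c = 0
        while c in used[a] or c in used[b]:
            c += 1
        colors.append(c)
        used[a].add(c)
        used[b].add(c)
    return colors


def group_by_color(edges, colors):
    """Collect edges into per-color groups, ordered by color index."""
    groups = defaultdict(list)
    for edge, c in zip(edges, colors):
        groups[c].append(edge)
    return [groups[c] for c in sorted(groups)]


# Hypothetical mesh: two triangles (0,1,2) and (1,2,3) sharing edge (1,2).
edges = [(0, 1), (1, 2), (0, 2), (1, 3), (2, 3)]
colors = color_edges(edges)
groups = group_by_color(edges, colors)

for index, group in enumerate(groups):
    # Edges within one group touch disjoint vertices, so a parallel loop
    # over the group could update vertex data without write conflicts.
    print(index, group)
```

Under this scheme the groups are processed one after another, with a barrier between colors. That synchronization, together with the scattered memory access that coloring tends to produce, is one commonly cited reason such schemes scale poorly. A task-based formulation would instead express the conflicts as dependencies between tasks.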
Jianbin Fang
National University of Defense Technology
Qingsong Wang
National University of Defense Technology
Yonggang Che
National University of Defense Technology
ACM Transactions on Architecture and Code Optimization
Fang et al. studied this question.
synapsesocial.com/papers/692e3d846c9b3ab28c187341 — DOI: https://doi.org/10.1145/3776752