The development of hypersonic vehicles places severe demands on the efficiency of computational fluid dynamics (CFD) simulation, particularly on unstructured meshes, where traditional central processing unit (CPU) architectures lack scalability and graphics processing unit (GPU) implementations still require further optimization. This paper constructs a multi-level parallel acceleration framework for a three-dimensional unstructured solver targeting heterogeneous architectures. Profiling reveals that memory access is the intrinsic performance constraint on kernel functions whose parallel work is not independent. We improve memory efficiency along several dimensions, including data layout reconstruction, mesh reordering, and kernel fusion. A decoupled reordering strategy that partitions the domain into inner, halo, and padding regions is designed to overlap multi-GPU computation with communication while preserving data locality. Because these optimizations are general, the framework is readily ported to other heterogeneous platforms such as deep computing units (DCUs). Tests demonstrate speedups of approximately 1600x on GPU and over 500x on DCU compared with CPU implementations, enabling efficient simulations at the hundred-million-cell scale with excellent scalability and cross-platform capability. The proposed framework offers a reusable paradigm for optimizing high-performance unstructured CFD software and improves the efficiency of hypersonic aerodynamic assessment.
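The idea of a decoupled reordering can be illustrated with a small sketch. The abstract does not define the regions or the reordering algorithm, so everything below is an assumption for illustration only: "inner" is taken to mean owned cells with no ghost neighbour (computable while communication is in flight), "halo" means owned cells touching at least one ghost cell (computed after the exchange), and "padding" means the ghost copies of remote cells that the exchange fills. Reverse Cuthill-McKee (RCM) is swapped in as a stand-in locality-preserving ordering, and the function names `rcm_order` and `decoupled_reorder` are hypothetical.

```python
from collections import deque


def rcm_order(cells, adj):
    """Reverse Cuthill-McKee ordering of `cells`, following only edges inside the subset."""
    subset = set(cells)

    def key(c):
        # Low-degree cells first; ties broken by cell id for determinism
        return (len(adj[c]), c)

    visited, order = set(), []
    for start in sorted(subset, key=key):  # one BFS per connected component
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            c = queue.popleft()
            order.append(c)
            nbrs = [n for n in adj[c] if n in subset and n not in visited]
            for n in sorted(nbrs, key=key):
                visited.add(n)
                queue.append(n)
    return order[::-1]


def decoupled_reorder(num_owned, ghost_cells, adj):
    """Order cells as [inner | halo | padding] (hypothetical region definitions).

    Owned cells are 0..num_owned-1; ghost_cells are local copies of remote cells.
    Returns (order, offsets): order[new_id] = old_id, offsets = region boundaries.
    """
    ghosts = set(ghost_cells)
    halo = [c for c in range(num_owned) if any(n in ghosts for n in adj[c])]
    halo_set = set(halo)
    inner = [c for c in range(num_owned) if c not in halo_set]
    # Each region is reordered independently, so locality is improved
    # without ever moving a cell across a region boundary
    order = rcm_order(inner, adj) + rcm_order(halo, adj) + sorted(ghosts)
    offsets = (0, len(inner), len(inner) + len(halo), len(order))
    return order, offsets


if __name__ == "__main__":
    # 1-D chain of 6 owned cells (0..5) with ghost cells 6 and 7 at either end
    adj = {0: [1, 6], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 7],
           6: [0], 7: [5]}
    order, offsets = decoupled_reorder(6, [6, 7], adj)
    print(order, offsets)  # [4, 3, 2, 1, 5, 0, 6, 7] (0, 4, 6, 8)
```

Under these assumptions, a GPU solver could launch the kernel over the contiguous inner range `[offsets[0], offsets[1])` on one stream while the halo exchange fills the padding range on another, then process `[offsets[1], offsets[2])` once the exchange completes. Keeping each region contiguous is what allows this overlap without giving up the locality gained by reordering.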
Zhang et al. studied this question.