Abstract The landscape of GPU-based scientific computing platforms is changing rapidly, making a strong case for fully native GPU-based industrial CFD solvers. One of the key strengths of native GPU-based CFD solvers is their ability to accelerate the solution of large transient problems. In turbomachinery, certain classes of problems necessitate a full-wheel transient analysis, such as inlet distortion, rotating stall, and hot streak migration. Solving these problems with current CPU-based hardware can be time consuming and, in general, very resource intensive. Furthermore, the goal of compressing design cycle times requires fast turnaround for both steady-state and unsteady CFD analyses, for example, when computing an entire compressor map. This paper introduces a new native multi-GPU solver framework designed to fully explore the potential of rapidly evolving GPU hardware. The solver has been developed with the goal of achieving the highest possible parallel scalability and speed on modern GPU high-performance computing clusters. It is based on a flexible architecture that allows it to be compiled and used efficiently on a variety of GPU/CPU hardware platforms. In addition to fast execution of CFD solver code, GPUs typically provide electrical power savings of around a factor of four while reducing hardware costs by a factor of seven, based on a comparison of a 6 x NVIDIA® V100 GPU server with a CPU cluster with 1024 Intel® Xeon® Gold 6242 cores. This paper illustrates the use of the solver for two selected full-wheel transient turbomachinery simulations: (1) a 10 MW wind turbine running at a fixed speed, and (2) a high-pressure, multistage axial compressor. Comparisons are made with a CPU-based parallel solver whose accuracy has been well validated for a wide range of turbomachinery and related flows. These comparisons include evaluating result parity between solvers, accuracy of predicted performance against available data, effective speedup, and scalability of the numerical solutions.
Patil et al. (Mon,) studied this question.