Large-scale atmospheric dispersion modeling for emergency response to nuclear accidents requires high computational efficiency and numerical reliability. To address these demands, a GPU-oriented Lagrangian particle dispersion model was developed within the FLEXPART framework. The core transport processes (advection, turbulent diffusion, convective mixing, and dry and wet deposition) were restructured for parallel execution on the GPU. Fast arithmetic operators and multi-level parallelization strategies further improved overall computational performance substantially while preserving physical accuracy. In addition, an MPI-based parallel tool for decoupling and preprocessing meteorological data was developed to alleviate data-handling bottlenecks. Multi-GPU execution and a load-balancing strategy enable efficient scaling in heterogeneous computing environments. The accuracy and acceleration of the GPU program were evaluated using the first release of the European Tracer Experiment (ETEX-I) as a benchmark. The results show that, while maintaining comparable accuracy (relative errors on the order of 10⁻²), the program achieves an overall speedup of approximately 40.45× on a single-GPU platform, rising to about 52.05× in high-performance application scenarios where the meteorological background fields can be reused. Moreover, multi-GPU experiments show favorable parallel scalability across configurations of one to four GPUs and confirm that the proposed load-balancing strategy improves computational efficiency in heterogeneous GPU environments.
Li et al. (Mon,) studied this question.
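The abstract does not specify how the load-balancing strategy works. As a purely illustrative sketch, one common approach for heterogeneous GPUs is to split the particle ensemble in proportion to each device's measured throughput. The function name `partition_particles` and the throughput figures below are hypothetical and are not taken from the model described above.

```python
def partition_particles(n_particles, throughputs):
    """Split n_particles across devices in proportion to their throughput.

    Hypothetical helper for illustration only; the actual load-balancing
    strategy of the model is not described in the abstract.
    `throughputs` holds one positive number per GPU (e.g. particles
    advanced per second in a short calibration run).
    """
    if n_particles < 0:
        raise ValueError("n_particles must be non-negative")
    if not throughputs or any(t <= 0 for t in throughputs):
        raise ValueError("throughputs must be a non-empty list of positive numbers")

    total = sum(throughputs)
    # Ideal (fractional) share of particles for each device.
    ideal = [n_particles * t / total for t in throughputs]
    # Start from the integer part of each share.
    counts = [int(x) for x in ideal]
    # Hand the remaining particles to the devices with the largest
    # fractional remainders so that the counts sum to n_particles.
    leftover = n_particles - sum(counts)
    order = sorted(range(len(ideal)),
                   key=lambda i: ideal[i] - counts[i],
                   reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    # Two slower GPUs and one GPU assumed to be twice as fast.
    print(partition_particles(1000, [1.0, 1.0, 2.0]))
```

With a static split like this, faster devices receive proportionally more particles, so all devices would finish a time step at roughly the same time if the throughput estimates are accurate. A dynamic scheme that re-measures throughput during the run is another possibility.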