Earth system models (ESMs) are important tools for projecting climate change, yet they continue to exhibit persistent systematic errors arising from the representation of subgrid-scale processes, most notably atmospheric convection, a key driver of large-scale circulations such as the Hadley and Walker cells and of weather phenomena such as thunderstorms. These errors contribute considerably to uncertainties in climate projections. Traditional convection parameterizations rely on physical assumptions and empirically derived relationships that fail to capture the full complexity of convective processes. This thesis contributes to showing that machine learning (ML) can offer breakthroughs by leveraging high-fidelity simulations to learn data-driven parameterizations that better represent subgrid-scale dynamics. However, translating high offline ML performance into stable, physically consistent, and transferable online implementations in ESMs has proven challenging, often because of issues related to causality, scale separation, distributional shifts, and process separation. This dissertation addresses these challenges through two complementary studies that advance the development, interpretation, and integration of ML-based convection parameterizations into the ICOsahedral Nonhydrostatic (ICON) model. The first study develops and benchmarks a suite of ML models, including deep learning and tree-based methods, trained on filtered and coarse-grained convective fluxes derived from storm-resolving ICON simulations over the tropical Atlantic. A filtering method that isolates convective contributions from other physical processes is intended to ensure that the ML models learn to represent deep convection. Offline, a U-Net architecture outperforms the other models but exhibits non-causal dependencies on precipitating tracers, as revealed by an explainable artificial intelligence (AI) analysis using SHapley Additive exPlanations (SHAP).
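Coarse-grained flux targets of this kind are commonly defined through a Reynolds-type decomposition. You average a high-resolution field onto a coarse cell and subtract the product of the coarse means. What remains is the flux the coarse grid cannot resolve: mean(w*phi) - mean(w)*mean(phi), for vertical velocity w and a tracer phi. The minimal sketch below assumes a 1-D strip of fine cells and plain block averaging. The function names are hypothetical, and the sketch does not reproduce the additional filter the thesis uses to separate convection from other physical processes.

```python
from statistics import fmean


def block_mean(values, factor):
    """Average consecutive blocks of `factor` fine-grid values onto the coarse grid."""
    if len(values) % factor != 0:
        raise ValueError("fine grid length must be a multiple of the coarse-graining factor")
    return [fmean(values[i:i + factor]) for i in range(0, len(values), factor)]


def subgrid_flux(w, phi, factor):
    """Subgrid vertical flux per coarse cell: mean(w*phi) - mean(w)*mean(phi).

    w: fine-grid vertical velocity; phi: fine-grid tracer (e.g. specific humidity).
    Hypothetical helper for illustration; plain block averaging is assumed.
    """
    if len(w) != len(phi):
        raise ValueError("w and phi must be on the same fine grid")
    total = block_mean([wi * pi for wi, pi in zip(w, phi)], factor)  # resolved + subgrid flux
    w_bar = block_mean(w, factor)      # coarse-grid vertical velocity
    phi_bar = block_mean(phi, factor)  # coarse-grid tracer
    return [t - wb * pb for t, wb, pb in zip(total, w_bar, phi_bar)]
```

In this example, a single moist updraft cell among four fine cells gives a positive (upward) subgrid moisture flux of about 0.00375 from `subgrid_flux([2.0, 0.0, 0.0, 0.0], [0.02, 0.01, 0.01, 0.01], 4)`. Horizontally uniform fields give zero, because the coarse grid already resolves them.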
Ablating these inputs yields a more physically interpretable and causally sound parameterization with improved online stability: it sustains 180-day integrations in ICON while reducing biases in precipitation extremes compared with conventional schemes. However, a significant smoothing bias in the column water vapor distribution and biases in the mean temperature persist. Building on these insights, the second study presents a proof of concept for cross-model transferability and long-term integrability. A bidirectional long short-term memory (LSTM) model trained on the global ClimSim dataset, derived from superparameterized Energy Exascale Earth System Model-Multiscale Modeling Framework (E3SM-MMF) simulations, is transferred to the ICON-A atmosphere model. Several innovations help ensure physical consistency, accuracy, and robustness: removal of radiative tendencies to isolate the convective signal, physics-informed and vertical consistency losses, confidence-guided mixing with a conventional scheme, and additive input noise during training to improve extrapolation and stability. This hybrid AI-physics approach enables the stable multi-decadal (20-year) integration of an ML-based convection parameterization in ICON. Evaluated against observations, the resulting simulations show improved precipitation statistics relative to the reference ICON configuration, including a reduced root mean square error in the zonal mean, a better spatial distribution, and improved extremes, as well as a more accurate spatial distribution of near-surface temperature. Furthermore, the developed scheme exhibits physically interpretable regime behavior across column water vapor and stability metrics.
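Two of the robustness measures listed above are small enough to show in a few lines. The sketch assumes that a per-column confidence score in [0, 1] is already available. How the thesis derives that score is not described here. The sketch blends the ML and conventional tendencies linearly with that score. The training-time augmentation adds zero-mean Gaussian noise to the inputs. The function names, the linear blend, and the Gaussian noise model are illustrative assumptions, not the thesis implementation.

```python
import random


def mix_tendencies(ml_tendency, conv_tendency, confidence):
    """Blend ML and conventional tendencies level by level (hypothetical helper).

    `confidence` is an assumed per-column scalar: 1 trusts the ML scheme fully,
    0 falls back entirely to the conventional scheme.
    """
    if len(ml_tendency) != len(conv_tendency):
        raise ValueError("tendency profiles must have the same number of levels")
    c = min(max(confidence, 0.0), 1.0)  # clip to a valid mixing weight
    return [c * m + (1.0 - c) * k for m, k in zip(ml_tendency, conv_tendency)]


def add_input_noise(inputs, sigma, rng):
    """Training-time augmentation: add zero-mean Gaussian noise with std `sigma` to each input."""
    return [x + rng.gauss(0.0, sigma) for x in inputs]
```

With a confidence of 0.25, the blend leans toward the conventional scheme: `mix_tendencies([1.0, 2.0], [3.0, 4.0], 0.25)` returns `[2.5, 3.5]`. Passing a seeded `random.Random` instance to `add_input_noise` keeps the augmentation reproducible across training runs.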
However, the training dataset itself has biases; for example, its zonal mean precipitation does not match the observed climatology. In addition, the developed framework does not strictly enforce conservation laws, and doing so would require a refined training dataset. Moreover, the scheme is trained and evaluated at a relatively coarse horizontal resolution of ∼160 km; implementing it in the version of the hybrid ICON model currently under development, which has a horizontal resolution of ∼80 km, may require vertical interpolation and tuning. Together, these studies demonstrate that ML-based parameterizations can become stable, interpretable, and transferable components of next-generation climate models. They highlight the critical importance of physical consistency, learning causal relationships, and robust training practices for achieving reliable long-term simulations, and they show that stable multi-decadal hybrid simulations are achievable, paving the way toward more accurate and trustworthy climate projections.
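One way to measure how far a scheme departs from water conservation is a column budget check. The check rests on three assumptions:

- q is total water.
- The convection scheme has no surface evaporation term.
- Each level is described by its pressure thickness dp in Pa.

Under these assumptions, the mass-weighted column integral of dq/dt (the sum of dq/dt * dp / g over levels) should equal minus the surface precipitation rate P in kg m^-2 s^-1. The residual below is therefore zero for a conserving scheme. This is a minimal sketch with hypothetical function names, not the diagnostic used in the thesis.

```python
G = 9.80665  # standard gravity, m s^-2


def column_integral(tendency, dp):
    """Mass-weighted vertical integral: sum_k tendency_k * dp_k / g.

    tendency: per-level tendency, e.g. total water in kg kg^-1 s^-1
    dp: per-level pressure thickness in Pa (positive)
    Returns kg m^-2 s^-1 for a water tendency.
    """
    if len(tendency) != len(dp):
        raise ValueError("tendency and dp must have the same number of levels")
    return sum(t * d for t, d in zip(tendency, dp)) / G


def moisture_residual(dqdt, dp, precip):
    """Column water budget residual of a convection scheme, in kg m^-2 s^-1.

    Assumes the scheme only redistributes water vertically and removes it as
    surface precipitation `precip` (kg m^-2 s^-1), so column drying equals
    precipitation and the residual is zero when water is conserved.
    """
    return column_integral(dqdt, dp) + precip
```

For example, `moisture_residual([-1e-7, -1e-7], [5000.0, 5000.0], 1e-3 / G)` is zero up to rounding. Passing `precip=0.0` instead leaves a negative residual, meaning water is removed from the column without reaching the surface.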
H. Heuer studied this question.