Understanding hydroxyl radical (HO·) reactivity with organic pollutants is crucial for optimizing advanced oxidation processes in water purification. Machine learning (ML) models have been developed to predict HO· reactivity, but they often behave as black boxes because of the complex interplay of chemical factors. Herein, we constructed an interpretable ML framework to unveil the intrinsic molecular factors governing HO· reactivity with antibiotic contaminants. A comprehensive set of DFT-derived constitutional, quantum-chemical, and Abraham descriptors was first employed to characterize the intricate structural and electronic nature of antibiotics. An attention-driven feature interaction method then engineered feature representations to generate an optimal subset comprising both initial and new features. SHapley Additive exPlanations (SHAP) analysis quantified the contribution of individual features to the model output, revealing key molecular properties such as volume-regulated electronic migration ability (VVIP) and electronic attraction ability mediated by the number of halogen atoms (#XME). Finally, a causal inference ML model was developed to identify cause-and-effect relationships between target variables and intrinsic properties, even within small data sets. The optimized random forest model demonstrated high accuracy in predicting HO· reactivity, with experimental validation showing relative errors below 6%. This work establishes an applicable and robust causal discovery framework that enables more rational design of water purification strategies.
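The attribution idea behind SHAP can be illustrated with a minimal sketch that computes exact Shapley values by enumerating feature coalitions. This is not the model or code used in this work: the function `shapley_values`, the `toy_model`, its coefficients, and the three feature names are hypothetical, chosen only to show how the contribution of an interaction term (here a product of two descriptors) is split between its constituents. Practical SHAP implementations use faster algorithms for tree ensembles; the brute-force version below scales as 2^n and is only suitable for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value.
    Cost grows as 2^n, so this is only practical for a few features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without_i = [x[j] if j in coalition else baseline[j]
                             for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def toy_model(features):
    """Hypothetical stand-in for a trained regressor (assumption)."""
    volume, ionization, n_halogen = features
    # A product term mimics an engineered interaction feature.
    return 0.5 * volume * ionization - 0.2 * n_halogen


x = [2.0, 3.0, 1.0]          # illustrative descriptor values
baseline = [0.0, 0.0, 0.0]   # illustrative reference point
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

In this toy case the product term's contribution is shared equally between the two interacting descriptors, and the attributions sum to the difference between the prediction and the baseline prediction, which is the additivity property that SHAP relies on.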
Zou et al. (Sun,) studied this question.