Understanding the translocation of organic contaminants in crops is vital for food safety and human health. This study developed machine learning (ML) models to predict root-to-stem translocation factors (TF) and to identify the molecular substructures that influence contaminant mobility. A dataset of 225 measurements covering 120 pesticides, 50 pharmaceuticals, and 55 PFAS across multiple crop species was used to train gradient boosting regression tree (GBRT) and fully connected neural network (FCNN) models. Using extended connectivity fingerprints (ECFP) instead of molecular weight and logKow improved predictive accuracy (R² = 0.68-0.70 vs. 0.43-0.67), indicating that structure-based descriptors better capture complex structure-transport relationships. Mean absolute errors (MAE) were comparable (0.44-0.45 vs. 0.43-0.46), indicating partial redundancy between the physicochemical descriptors and the fingerprints. Permutation feature importance (PFI) analysis identified key substructures affecting TF, including pyrazole rings, tetrasubstituted carbon, quaternary ammonium cations, and carbonyl and ether groups, reflecting the joint effects of hydrophobicity and structural complexity on molecular mobility. Model applicability to mature crops was evaluated using Mahalanobis distance, confirming reliable extrapolation across growth stages. External validation with independent datasets verified consistent predictive accuracy across diverse species and contaminants. The results link molecular structure to environmental fate and provide a quantitative framework for assessing contaminant transport in crops. The developed models support the design of low-mobility agrochemicals, the identification of high-risk pollutants, and improved food safety management.
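To make the permutation feature importance (PFI) step concrete, the following is a minimal sketch rather than the study's implementation. It assumes a toy matrix of binary fingerprint bits in place of real ECFP vectors, and a hypothetical `predict` function that stands in for a trained GBRT or FCNN model. All names, weights, and data here are illustrative assumptions. PFI shuffles one fingerprint bit at a time and records how much the MAE increases; bits whose shuffling degrades the prediction most are ranked as most influential.

```python
import random
from statistics import mean


def mae(predict, X, y):
    """Mean absolute error of `predict` over rows X against targets y."""
    return mean(abs(predict(row) - target) for row, target in zip(X, y))


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Average MAE increase when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mae(predict, X, y)
    scores = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving every other bit untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mae(predict, X_perm, y) - baseline)
        scores.append(mean(increases))
    return scores


# Toy data (assumption): 200 "molecules" described by 6 random binary bits.
rng = random.Random(42)
N_BITS = 6
X = [[rng.randint(0, 1) for _ in range(N_BITS)] for _ in range(200)]


def predict(row):
    # Hypothetical stand-in for a trained model: only bits 0 and 3 matter.
    return 1.5 * row[0] - 1.0 * row[3]


# Synthetic targets: the stand-in model's output plus small Gaussian noise.
y = [predict(row) + rng.gauss(0, 0.05) for row in X]

scores = permutation_importance(predict, X, y)
ranking = sorted(range(N_BITS), key=lambda j: scores[j], reverse=True)
print("bits ranked by importance:", ranking)
```

In this toy setting the two bits used by the stand-in model rank first, and the unused bits score zero. With real fingerprints, each highly ranked bit would still need to be mapped back to the substructure it encodes before it can be interpreted chemically.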
Lang et al. (Sun,) studied this question.