Toxicity assessment plays a crucial role in protecting aquatic ecosystems, because organic pollutants in water bodies directly threaten the diversity of aquatic organisms. Computational methods enable rapid toxicity screening, which supports the regulatory prioritization of hazardous compounds and guides subsequent experimental verification. Although classic machine learning methods have shown potential in toxicity prediction, their limitations (reliance on manual feature engineering, poor generalization across different chemical spaces, and sensitivity to data noise) reduce their reliability in practical applications. Addressing these challenges requires models that automate feature learning, generalize better, and provide mechanistic insight, improving both predictive accuracy and interpretability in aquatic toxicity identification. In this study, we propose an integrated framework for toxicity identification, termed ML-DL-ens, that combines four machine learning algorithms (K-Nearest Neighbors, Support Vector Machines, Extreme Gradient Boosting, and Random Forests) with the AttentiveFP graph neural network model. The weights of the ML-DL-ens model were optimized using a particle swarm optimization (PSO) algorithm to enhance the accuracy of predictions of the aquatic toxicity of organic compounds. The ML-DL-ens model achieved AUC-ROC values of 0.8951, 0.9404, 0.8934, and 0.8871 on the 96 h LC50 set, 40 h IGC50 set, 48 h LC50-DM set, and Combined set, respectively, outperforming every single model and the other ensemble methods evaluated and reaching state-of-the-art performance in toxicity prediction.
In addition, SHAP value analysis and graph-based visualization provide insight into the key molecular substructures that drive the toxicity predictions. Overall, ML-DL-ens is a promising framework for improving the accuracy of aquatic toxicity identification.
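The PSO-based weighting described above can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic probabilities stand in for the outputs of the five base models, the fitness function (AUC-ROC of the weighted average on a validation set) is an assumption, and the names `auc_roc`, `normalize`, and `pso_weights`, along with all hyperparameters, are hypothetical.

```python
import numpy as np

def auc_roc(y_true, scores):
    """AUC-ROC via the rank-sum formulation, using average ranks for ties."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(len(scores), dtype=float)
    i = 0
    while i < len(scores):
        j = i
        while j + 1 < len(scores) and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # average 1-based rank
        i = j + 1
    n_pos = int(y_true.sum())
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

def normalize(w):
    """Map a particle position to non-negative weights that sum to 1."""
    w = np.abs(np.asarray(w, dtype=float))
    total = w.sum()
    return w / total if total > 0 else np.full(len(w), 1.0 / len(w))

def pso_weights(probs, y, n_particles=20, n_iter=50,
                inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Search ensemble weights that maximize validation AUC-ROC (assumed fitness)."""
    rng = np.random.default_rng(seed)
    n_models = probs.shape[1]
    pos = rng.random((n_particles, n_models))
    # Seed the swarm with the single-model solutions (needs n_particles >= n_models),
    # so the result is never worse than the best base model on this validation set.
    pos[:n_models] = np.eye(n_models)
    vel = np.zeros_like(pos)

    def fitness(p):
        return auc_roc(y, probs @ normalize(p))

    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    g = int(pbest_fit.argmax())
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 0.0, 1.0)
        fit = np.array([fitness(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        g = int(pbest_fit.argmax())
        if pbest_fit[g] > gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    return normalize(gbest), gbest_fit

# Synthetic stand-ins for validation-set probabilities from five base models
# (hypothetical data, not results from the study).
rng = np.random.default_rng(42)
y = rng.integers(0, 2, size=300)
noise_levels = [0.6, 0.8, 1.0, 1.2, 0.7]
logits = np.column_stack(
    [(2 * y - 1) + s * rng.standard_normal(300) for s in noise_levels]
)
probs = 1.0 / (1.0 + np.exp(-logits))

weights, ens_auc = pso_weights(probs, y)
single_aucs = [auc_roc(y, probs[:, k]) for k in range(probs.shape[1])]
print("weights:", np.round(weights, 3))
print("best single AUC:", round(max(single_aucs), 4), "ensemble AUC:", round(ens_auc, 4))
```

Because the weights are tuned on the same data used to score them, a real workflow would evaluate the final ensemble on a separate held-out test set.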
Li et al. studied this question.