March 3, 2026 | Open Access

Predicting aquatic toxicity of organic compounds using the ML-DL-ens model: An integrated approach of machine learning and deep learning

Key Points

The ML-DL-ens model combines multiple algorithms to improve predictive accuracy for aquatic toxicity, achieving AUC-ROC values between 0.887 and 0.940 across four data sets.
The integrated model outperformed every single model and the other ensemble methods tested in toxicity prediction accuracy.
Particle swarm optimization (PSO) was used to optimize the weight of each base model in the ensemble, improving prediction accuracy.
SHAP value analysis highlights the molecular substructures that influence toxicity predictions, supporting a better understanding of chemical impacts.

Abstract

Toxicity assessment plays a crucial role in protecting aquatic ecosystems, because organic pollutants in water bodies directly threaten the diversity of aquatic organisms. Computational methods enable rapid toxicity screening, which helps regulators prioritize hazardous compounds and guides subsequent experimental verification. Although classic machine learning methods have shown potential in toxicity prediction, their limitations (reliance on manual feature engineering, poor generalization across different chemical spaces, and sensitivity to data noise) reduce their reliability in practical applications. Models that automate feature learning, generalize better, and provide mechanistic insight are therefore urgently needed to improve both the accuracy and the interpretability of aquatic toxicity identification. In this study, we propose ML-DL-ens, an integrated framework for toxicity identification that combines four machine learning algorithms (K-Nearest Neighbors, Support Vector Machines, Extreme Gradient Boosting, and Random Forests) with the AttentiveFP graph neural network. The weights assigned to the models in ML-DL-ens were optimized with a particle swarm optimization (PSO) algorithm to improve the accuracy of aquatic toxicity predictions for organic compounds. ML-DL-ens performed well across multiple data sets, with AUC-ROC values of 0.8951, 0.9404, 0.8934, and 0.8871 on the 96 h LC50 set, the 40 h IGC50 set, the 48 h LC50-DM set, and the Combined set, respectively. This state-of-the-art performance exceeds that of every single model and of the other ensemble methods evaluated.
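The abstract does not state how the base models' outputs are combined or how PSO is configured. The following is a minimal sketch that assumes a weighted soft vote (a weighted mean of each base model's predicted probability of toxicity), with validation-set AUC-ROC as the fitness that PSO maximizes. The function names (`roc_auc`, `blend`, `pso_weights`), the swarm settings (`inertia`, `c1`, `c2`, swarm size, iteration count), the clamping of weights to [0, 1], and the probabilities are all illustrative assumptions. They are not the authors' implementation or data.

```python
import random
from typing import Callable, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the Mann-Whitney statistic: the chance that a random positive
    (toxic) compound outscores a random negative one, ties counting one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def blend(weights: Sequence[float], probs: Sequence[Sequence[float]]) -> List[float]:
    """Weighted soft vote: normalized weighted mean of the base models' P(toxic)."""
    total = sum(weights) or 1.0  # guard against an all-zero weight vector
    return [sum(w * p[i] for w, p in zip(weights, probs)) / total
            for i in range(len(probs[0]))]


def pso_weights(objective: Callable[[List[float]], float], dim: int,
                n_particles: int = 20, iters: int = 50, inertia: float = 0.7,
                c1: float = 1.5, c2: float = 1.5,
                seed: int = 0) -> Tuple[List[float], float]:
    """Global-best PSO that maximizes objective(weights) over [0, 1]^dim."""
    rng = random.Random(seed)
    # Seed the swarm with each single model (one-hot) plus the uniform blend,
    # then top it up with random weight vectors.
    pos = [[1.0 if d == k else 0.0 for d in range(dim)] for k in range(dim)]
    pos.append([1.0] * dim)
    while len(pos) < n_particles:
        pos.append([rng.random() for _ in range(dim)])
    vel = [[0.0] * dim for _ in pos]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = max(range(len(pos)), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(len(pos)):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])  # pull to own best
                             + c2 * r2 * (gbest[d] - pos[i][d]))    # pull to swarm best
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))  # clamp to [0, 1]
            val = objective(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    total = sum(gbest) or 1.0
    return [w / total for w in gbest], gbest_val


# Toy validation labels (1 = toxic) and each base model's P(toxic); invented numbers.
labels = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
probs = [
    [0.8, 0.6, 0.4, 0.3, 0.7, 0.5, 0.4, 0.2, 0.6, 0.9],  # KNN
    [0.7, 0.4, 0.5, 0.2, 0.6, 0.3, 0.8, 0.4, 0.3, 0.6],  # SVM
    [0.9, 0.7, 0.3, 0.4, 0.5, 0.6, 0.7, 0.1, 0.2, 0.8],  # XGBoost
    [0.6, 0.8, 0.5, 0.3, 0.4, 0.2, 0.6, 0.3, 0.4, 0.7],  # Random Forest
    [0.7, 0.5, 0.2, 0.5, 0.8, 0.4, 0.5, 0.3, 0.3, 0.6],  # AttentiveFP
]
weights, auc = pso_weights(lambda w: roc_auc(labels, blend(w, probs)), dim=len(probs))
print([round(w, 3) for w in weights], round(auc, 4))
```

Because the swarm is seeded with each single model as a one-hot weight vector, the returned blend never scores below the best single model on the data used for fitting. Weights tuned this way should still be checked on a held-out set, since maximizing AUC on the same validation data can overfit.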
In addition, SHAP value analysis and graphical representations provide insight into the key molecular substructures that affect toxicity predictions. Overall, ML-DL-ens is a promising framework for improving prediction accuracy in aquatic toxicity identification.
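SHAP values are Shapley values of a model's output. Each feature's contribution is its marginal effect on the prediction, averaged over all coalitions of the other features. The following minimal sketch computes them exactly by enumerating feature subsets. It assumes that a feature outside a coalition takes a fixed baseline value. The three substructure-indicator bits (A, B, C), the coefficients of `toy_score`, and the name `shapley_values` are hypothetical and exist only for illustration. Exact enumeration needs on the order of 2^n model calls, so tools such as the SHAP library use approximations or model-specific algorithms for real fingerprints with many bits.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(model: Callable[[List[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley value of each feature of x. Features outside a coalition
    are set to their baseline value (an assumption; other conventions exist)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]  # add feature i to the coalition
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def toy_score(bits: List[float]) -> float:
    """Hypothetical toxicity score over three substructure indicator bits
    (A, B, C); coefficients are invented and include an A-B interaction."""
    a, b, c = bits
    return 0.2 + 0.3 * a + 0.25 * b - 0.1 * c + 0.1 * a * b


x = [1.0, 1.0, 1.0]         # molecule contains substructures A, B and C
baseline = [0.0, 0.0, 0.0]  # reference: none of the substructures present
phi = shapley_values(toy_score, x, baseline)
print([round(v, 3) for v in phi])
```

The contributions add up to the difference between the score for `x` and the score for the baseline (0.75 - 0.2 = 0.55 here). The A-B interaction term is split equally between substructures A and B.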
