What does this research mean for the field?

The majority (60.9%) of published machine learning QSAR models for chemical hazard identification are currently non-usable for regulatory purposes, necessitating standardized frameworks to improve their accessibility, verifiability, and regulatory acceptance.

What question did this study set out to answer?

The aim is to evaluate the effectiveness of machine learning approaches in chemical hazard identification and their regulatory applicability.

March 28, 2026 | Open Access

Perspective on applicability of data-driven machine learning computational new approach methodologies for hazard identification in chemicals risk assessment

Key Points

The aim is to evaluate the effectiveness of machine learning approaches in chemical hazard identification and their regulatory applicability.
Conducted a literature review of nearly 2300 articles on machine learning and new approach methodologies in chemical hazard assessment.
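Focused on human health endpoints: specific target organ toxicity (STOT), genotoxicity and carcinogenicity, endocrine disruption, skin sensitization, developmental and reproductive toxicity (DART), and repeated dose or chronic toxicity.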
Categorized the usability of 274 publications with ML-QSAR models into non-usable, potentially usable, and directly usable.
60.9% of the ML-QSAR models identified were non-usable, while 21.9% were potentially usable, and 17.2% were directly usable.
Skin sensitization had the highest coverage by ML-QSAR models, followed by endocrine disruption, genotoxicity, and carcinogenicity.
Tree-based models, particularly random forests, were the most commonly derived ML-QSAR models, followed by artificial neural networks and support vector machines.

Abstract

Machine Learning (ML) and Artificial Intelligence (AI) approaches have the potential to support better-informed decisions in chemical hazard identification while reducing animal testing. Their application in the context of New Approach Methodologies (NAMs) for Hazard Identification in Chemicals Risk Assessment (CRA) is challenging due to limited knowledge, lack of experience, and uncertainty related to the use of these approaches. Therefore, to facilitate the potential acceptance of ML and AI approaches for regulatory use, better standardization, guidelines for transparent reporting, validation, and frameworks are needed to understand their accessibility, verifiability, and usefulness criteria for predictions. An extensive literature review on the availability of ML- and AI-based NAMs for chemical hazard identification was conducted, focusing primarily on human health endpoints: specific target organ toxicity (STOT), genotoxicity and carcinogenicity, endocrine disruption, skin sensitization, developmental and reproductive toxicity (DART), and repeated dose or chronic toxicity. Nearly 2300 scientific articles were reviewed. Among the 274 publications with ML-QSAR models, 60.9% of the models described turned out to be non-usable, 21.9% were potentially usable, and 17.2% were directly usable, i.e., had available software solutions. By endpoint, skin sensitization is best covered by ML-QSAR models, followed by endocrine disruption, genotoxicity, and carcinogenicity. The most frequently derived ML-QSAR models are tree-based models such as random forests and analogues, followed by artificial neural networks and support vector machine models, with other models being used to a lesser extent. The literature analysis led to a framework that helps model users identify potentially suitable models for use in a regulatory context. In addition, the framework could help model developers better understand the expectations of model users in a regulatory context and serve as a reference when publishing their models, ensuring greater transparency, alignment with regulatory needs, and facilitating future acceptance.
