Does integrating demographic features with molecular structures improve the prediction of drug-induced cardiotoxicity in machine learning models?
Integrating demographic factors with molecular structures in current machine learning models yields only modest predictive ability for drug-induced cardiotoxicity, highlighting the need for mechanistic reasoning or advanced modeling.
Drug-induced cardiotoxicity is a significant challenge in drug development, accounting for numerous clinical trial failures and post-market withdrawals. Existing in silico models predict cardiotoxicity from molecular structure but typically overlook demographic factors, despite clinical evidence of differences in adverse responses across sex, age, and weight groups. To address this gap, we developed CARBIDE (CARdiotoxicity Based on Integrated Demographic Evidence), a novel dataset derived from the FDA Adverse Event Reporting System (FAERS), and trained machine learning models that integrate molecular structures with demographic features. Systematic evaluation of 18 dataset variants revealed that design choices, particularly the FAERS entry inclusion criteria, substantially influenced model performance. However, the models achieved only modest discriminative ability (ROC AUC: 58 ± 3%), primarily reflecting baseline cardiotoxicity rates within demographic subgroups rather than meaningful learned chemical-demographic relationships. These findings indicate a need to incorporate mechanistic reasoning or more advanced models capable of capturing these relationships.
Iwan et al. (Sun,) studied this question.
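The abstract's central caveat, that a model may only be reproducing subgroup base rates, can be probed by scoring each test record with nothing but the cardiotoxicity rate of its demographic subgroup and computing the ROC AUC of that baseline. The following is a minimal sketch of such a check and not the authors' pipeline: the function names, the subgroup keys, and the toy records are all hypothetical and are not taken from CARBIDE or FAERS.

```python
from collections import defaultdict


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outranks
    a random negative, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def subgroup_base_rates(groups, labels):
    """Fraction of positive labels within each demographic subgroup."""
    totals = defaultdict(lambda: [0, 0])  # group -> [positives, count]
    for g, y in zip(groups, labels):
        totals[g][0] += y
        totals[g][1] += 1
    return {g: p / n for g, (p, n) in totals.items()}


# Hypothetical toy records (sex, age band) with made-up labels;
# these are illustrative only and carry no empirical meaning.
train_groups = [("F", "65+"), ("F", "65+"), ("M", "18-64"),
                ("M", "18-64"), ("F", "18-64"), ("F", "18-64")]
train_labels = [1, 1, 0, 1, 0, 0]
test_groups = [("F", "65+"), ("M", "18-64"), ("F", "18-64"), ("M", "18-64")]
test_labels = [1, 0, 0, 1]

rates = subgroup_base_rates(train_groups, train_labels)
overall_rate = sum(train_labels) / len(train_labels)

# Score each test record by its subgroup's training base rate,
# falling back to the overall rate for unseen subgroups.
baseline_scores = [rates.get(g, overall_rate) for g in test_groups]
baseline_auc = roc_auc(test_labels, baseline_scores)
print(f"demographic-only baseline ROC AUC: {baseline_auc:.3f}")
```

If a model that also sees molecular structure scores no better than this demographic-only baseline on the same split, that would be consistent with the interpretation that it has learned subgroup base rates rather than chemical-demographic interactions.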