Machine learning (ML) has emerged as a powerful approach for accelerating the design of efficient organic solar cells (OSCs) by correlating molecular features with device performance. However, predictive modeling of power conversion efficiency (PCE) remains challenging because the available experimental datasets are small and heterogeneous. In this study, we propose a robust and data-efficient ensemble learning framework for predicting the PCE of donor-acceptor (D-A) molecular pairs. A dataset of 319 experimentally derived D-A combinations, containing key electronic and molecular descriptors, was used to develop five regression models: Fine Tree (FT), Medium Tree (MT), Coarse Tree (CT), Bagged Tree (BGT), and Boosted Tree (BST). Of these, the BST ensemble performed best, achieving an R² of 88.75%, the lowest MAE of 0.522, and an RMSE of 0.725 on validation, and an R² of 85.26%, the lowest MAE of 0.549, and an RMSE of 0.734 on testing. The proposed framework integrates SMILES-derived molecular fingerprints with ensemble learning to capture complex, nonlinear interactions between donor and acceptor features, enabling reliable estimation of efficiency even with limited data. This work shows that data-driven ensemble approaches can serve as accurate and computationally economical methods for estimating the physical and chemical properties of OSCs. The outcomes are expected to help researchers screen candidates rapidly and develop next-generation OSC materials.

• This study demonstrates that ML ensemble models can accurately predict the PCE of OSCs.
• The BST ensemble model achieved the best accuracy, with an R² of 88.75% for validation and 85.26% for testing.
• The outcomes help researchers accelerate the discovery of OSCs in the laboratory.
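
To make the reported quantities concrete, the following is a minimal Python sketch of least-squares boosting with depth-1 regression trees, scored with R², MAE, and RMSE. It is not the authors' pipeline: the function names (`regression_metrics`, `fit_stump`, `fit_boosted_stumps`, `predict`, `demo`) are hypothetical, random bit vectors stand in for SMILES-derived fingerprints, and the target formula, split sizes, learning rate, and number of rounds are arbitrary assumptions chosen only for illustration. Note that R² is computed here on a 0 to 1 scale, so a reported value of 88.75% corresponds to 0.8875.

```python
import math
import random


def regression_metrics(y_true, y_pred):
    """Return (R^2, MAE, RMSE) for two equal-length sequences of numbers."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(ss_res / n)
    return 1.0 - ss_res / ss_tot, mae, rmse


def fit_stump(X, residuals):
    """Fit a depth-1 tree on binary features: choose the bit whose split
    gives the lowest squared error on the current residuals."""
    overall = sum(residuals) / len(residuals)
    best = (float("inf"), 0, overall, overall)  # fallback: constant prediction
    for j in range(len(X[0])):
        off = [r for row, r in zip(X, residuals) if row[j] == 0]
        on = [r for row, r in zip(X, residuals) if row[j] == 1]
        if not off or not on:
            continue  # bit is constant in this sample, so it cannot split
        m_off, m_on = sum(off) / len(off), sum(on) / len(on)
        sse = sum((r - m_off) ** 2 for r in off)
        sse += sum((r - m_on) ** 2 for r in on)
        if sse < best[0]:
            best = (sse, j, m_off, m_on)
    return best[1:]  # (feature index, value if bit = 0, value if bit = 1)


def fit_boosted_stumps(X, y, n_rounds=100, learning_rate=0.2):
    """Start from the mean, then repeatedly fit a stump to the residuals
    and add a shrunken copy of it to the running prediction."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, pred)]
        j, m_off, m_on = fit_stump(X, residuals)
        stumps.append((j, m_off, m_on))
        pred = [p + learning_rate * (m_on if row[j] else m_off)
                for p, row in zip(pred, X)]
    return base, learning_rate, stumps


def predict(model, X):
    base, lr, stumps = model
    return [base + lr * sum(m_on if row[j] else m_off
                            for j, m_off, m_on in stumps)
            for row in X]


def demo(seed=0):
    """Train on synthetic data (an assumption, not the paper's dataset)
    and return held-out (R^2, MAE, RMSE)."""
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) for _ in range(16)] for _ in range(300)]
    y = [4.0 + 3.0 * r[0] + 2.0 * r[1] - 1.5 * r[2] + rng.gauss(0, 0.3)
         for r in X]
    model = fit_boosted_stumps(X[:240], y[:240])
    return regression_metrics(y[240:], predict(model, X[240:]))


if __name__ == "__main__":
    r2, mae, rmse = demo()
    print(f"R^2 = {r2:.3f}, MAE = {mae:.3f}, RMSE = {rmse:.3f}")
```

Depth-1 trees can only represent additive effects of individual bits; capturing interactions between donor and acceptor features of the kind described above would require deeper base trees. RMSE is never smaller than MAE for the same predictions, which is consistent with the validation and test figures reported here.
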
Kapil Dev Mahato (Wed,) studied this question.