Abstract

Computational design of high-efficiency organic photovoltaics requires clear links between three-dimensional active-layer morphology and device performance. We present a data-driven workflow that first uses coreset selection to distill a large library of simulated morphologies into a small, representative subset, thereby focusing expensive morphology-aware exciton drift–diffusion simulations where they matter most. From these simulations we extract two device performance metrics, short-circuit current density ($J_{\mathrm{SC}}$) and fill factor (FF), and then apply feature-selection strategies to identify a handful of interpretable morphological descriptors that accurately predict both quantities. Sample-size ablations show that model accuracy, as well as the identity and ranking of the selected descriptors, remains stable with as few as 50 samples labeled with device performance. The descriptors most predictive of $J_{\mathrm{SC}}$ differ from those for FF, reflecting distinct morphological bases for these two performance metrics. Moreover, cross-system comparisons (P3HT:PCBM vs. PM6:Y6) reveal shifts in the most influential descriptors, indicating that morphology–performance relationships are material-specific.
Saadati et al. (Mon,) studied this question.
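The two-stage workflow summarized in the abstract (coreset selection, then descriptor ranking on the simulated subset) can be illustrated with a minimal sketch. The abstract does not name the specific coreset algorithm or feature-selection strategies used, so everything here is an assumption made for illustration: k-center greedy (farthest-point) sampling stands in for coreset selection, absolute Pearson correlation stands in for feature selection, and the descriptor matrix and the `j_sc` response are synthetic placeholders for the real morphology library and drift–diffusion outputs.

```python
import numpy as np


def k_center_greedy(X, k, seed=0):
    """Pick k row indices of X by farthest-point (k-center greedy) sampling.

    Assumption: this is one common coreset heuristic, not necessarily
    the method used in the paper.
    """
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(X.shape[0]))]
    # Distance from every point to its nearest already-selected point.
    min_dist = np.linalg.norm(X - X[selected[0]], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))  # point farthest from the current set
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return selected


def rank_descriptors(X, y):
    """Rank descriptor columns by absolute Pearson correlation with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-np.abs(corr)), corr


# Synthetic stand-in: 2000 "morphologies", 8 hypothetical descriptors.
rng = np.random.default_rng(42)
descriptors = rng.normal(size=(2000, 8))

# Distill the library to 50 representatives; only these would be simulated.
coreset = k_center_greedy(descriptors, k=50)
X_sub = descriptors[coreset]

# Placeholder for the expensive simulation: a made-up response driven
# mainly by descriptor 2, weakly by descriptor 5, plus small noise.
j_sc = 3.0 * X_sub[:, 2] - 1.5 * X_sub[:, 5] + 0.1 * rng.normal(size=50)

order, corr = rank_descriptors(X_sub, j_sc)
print("descriptors ranked by |correlation| with J_SC:", order)
```

Repeating the ranking step with an FF-like response in place of `j_sc`, or with different subset sizes `k`, would mimic the per-metric comparison and the sample-size ablation described in the abstract.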