Accurate prediction of the heat of combustion (HoC) is essential for fuel design and chemical safety assessment. In this work, we systematically evaluate four machine learning models, XGBoost, multilayer perceptron (MLP), graph convolutional network (GCN), and neural network convolution (NNConv), for their ability to predict HoC values across a data set of 4516 compounds. Our results show that the descriptor-based approaches (XGBoost and MLP) train faster (≈34–60 s) and reach lower final test errors (2.08 and 2.31 kJ/g, respectively), with MLP achieving the highest coefficient of determination (R² = 0.942). In contrast, the graph-based models (GCN and NNConv) require significantly longer runtimes (≈360–2700 s) but converge more rapidly per epoch, generalize robustly with minimal overfitting, and produce stable error distributions. Residual and density analyses confirm that NNConv yields the most compact clustering around experimental values, reflecting high predictive fidelity. Importantly, when applied to GHS/CLP hazard classification with a 20 kJ/g threshold, all models reliably distinguish between flammable and nonflammable compounds, with model-specific ambiguity zones highlighting borderline cases. An analysis of how prediction errors affect classification under the GHS/CLP flammability criterion shows that NNConv achieves the best overall performance, with minimal false negatives. These results underscore the trade-off between computational efficiency and representational richness in molecular property prediction, while demonstrating that both descriptor- and graph-based models can serve as effective high-throughput screening tools for regulatory applications.
Nnyigide et al. (Sun,) studied this question.
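The threshold-based hazard classification described in the abstract can be illustrated with a minimal sketch. Several things here are assumptions rather than the authors' method: the function names (`classify`, `confusion_counts`, `ambiguity_zone`) are hypothetical, a compound is treated as flammable when its HoC is at or above 20 kJ/g, the ambiguity zone is taken to be the band of predictions lying within one model test error of the threshold, and the HoC values are made up for illustration.

```python
import numpy as np

THRESHOLD_KJ_PER_G = 20.0  # GHS/CLP threshold quoted in the abstract


def classify(hoc, threshold=THRESHOLD_KJ_PER_G):
    """Return True where HoC >= threshold (assumed to mean 'flammable')."""
    return np.asarray(hoc, dtype=float) >= threshold


def confusion_counts(experimental, predicted, threshold=THRESHOLD_KJ_PER_G):
    """Count classification outcomes, taking experimental HoC as ground truth."""
    truth = classify(experimental, threshold)
    guess = classify(predicted, threshold)
    return {
        "tp": int(np.sum(truth & guess)),
        "tn": int(np.sum(~truth & ~guess)),
        "fp": int(np.sum(~truth & guess)),
        # False negative: flammable compound predicted as nonflammable.
        "fn": int(np.sum(truth & ~guess)),
    }


def ambiguity_zone(predicted, model_error, threshold=THRESHOLD_KJ_PER_G):
    """Flag predictions within +/- model_error of the threshold.

    Assumption: the zone width equals the model's test error; the paper
    may define its ambiguity zones differently.
    """
    predicted = np.asarray(predicted, dtype=float)
    return np.abs(predicted - threshold) <= model_error


# Illustrative (made-up) HoC values in kJ/g, not data from the paper.
experimental = [45.2, 18.5, 21.0, 9.8, 19.6, 30.1]
predicted = [44.0, 19.1, 19.2, 11.0, 21.5, 29.0]

counts = confusion_counts(experimental, predicted)
borderline = ambiguity_zone(predicted, model_error=2.08)

print(counts)
print(borderline)
```

False negatives are the costly outcome in a safety setting, since a flammable compound would pass the screen as nonflammable. Flagging borderline predictions for experimental follow-up is one way to limit that risk.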