March 18, 2026 | Open Access

Comparative Evaluation of Deep Learning Graph Neural Networks and Classical Machine Learning Models for Predicting Heats of Combustion as a Key Hazard Indicator under GHS/CLP Standards

Key Points

The study compares deep learning graph neural networks with classical machine learning models for predicting the heat of combustion (HoC).
Four models were evaluated: XGBoost, multilayer perceptron (MLP), graph convolutional network (GCN), and neural network convolution (NNConv).
The data set comprised 4516 compounds.
Training times, test errors, and predictive fidelity were assessed for each model.
The descriptor-based models (XGBoost and MLP) trained faster (≈34–60 s) and reached lower final test errors (2.08 and 2.31 kJ/g, respectively) than the graph-based models (GCN and NNConv).
MLP achieved the highest coefficient of determination (R² = 0.942).
Under the GHS/CLP flammability criterion (20 kJ/g threshold), NNConv gave the best overall classification performance, with minimal false negatives.
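
To make the reported metrics concrete, the following is a minimal sketch of how a test error and a coefficient of determination can be computed from experimental and predicted HoC values. The study does not state here which error metric underlies the reported test errors, so root-mean-square error is an assumption; the function names and the sample values are hypothetical and are not taken from the paper's data set.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the inputs (e.g. kJ/g)."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up illustrative values in kJ/g (NOT data from the study).
hoc_experimental = [10.0, 20.0, 30.0, 40.0]
hoc_predicted = [12.0, 18.0, 33.0, 39.0]

error = rmse(hoc_experimental, hoc_predicted)
r2 = r_squared(hoc_experimental, hoc_predicted)
print(f"RMSE = {error:.3f} kJ/g, R^2 = {r2:.3f}")
```

An R² close to 1 means the predictions account for most of the variance in the experimental values, which is how a figure such as 0.942 is read.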

Abstract

Accurate prediction of the heat of combustion (HoC) is essential for fuel design and chemical safety assessment. In this work, we systematically evaluate four machine learning models, namely XGBoost, multilayer perceptron (MLP), graph convolutional network (GCN), and neural network convolution (NNConv), for their ability to predict HoC values across a data set of 4516 compounds. Our results show that descriptor-based approaches (XGBoost and MLP) demonstrated faster training times (≈34–60 s) and lower final test errors (2.08 and 2.31 kJ/g, respectively), with MLP achieving the highest coefficient of determination (R² = 0.942). In contrast, graph-based models (GCN and NNConv) required significantly longer runtimes (≈360–2700 s) but converged more rapidly per epoch, exhibited robust generalization with minimal overfitting, and produced stable error distributions. Residual and density analyses confirmed that NNConv yielded the most compact clustering around experimental values, reflecting high predictive fidelity. Importantly, when applied to GHS/CLP hazard classification with a 20 kJ/g threshold, all models reliably distinguished between flammable and nonflammable compounds, with model-specific ambiguity zones highlighting borderline cases. An analysis of how prediction errors influence classification under the GHS/CLP flammability criterion shows that NNConv achieved the best overall performance, with minimal false negatives. These results underscore the trade-off between computational efficiency and representational richness in molecular property prediction, while demonstrating that both descriptor- and graph-based models can serve as effective high-throughput screening tools for regulatory applications.
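
The classification step described in the abstract can be illustrated with a short sketch. It assumes that a compound is flagged as flammable when its HoC is at or above the 20 kJ/g threshold, and that a model's ambiguity zone is a symmetric band around the threshold whose half-width is chosen from that model's error; both are assumptions, since the abstract gives neither the comparison operator nor the zone definition. All names and sample values are hypothetical.

```python
THRESHOLD_KJ_PER_G = 20.0  # GHS/CLP threshold quoted in the abstract


def is_flammable(hoc, threshold=THRESHOLD_KJ_PER_G):
    """Assumption: values at or above the threshold count as flammable."""
    return hoc >= threshold


def confusion_counts(y_true, y_pred, threshold=THRESHOLD_KJ_PER_G):
    """Count outcomes, treating 'flammable' as the positive class.

    A false negative is a compound whose experimental HoC is flammable
    but whose predicted HoC falls below the threshold.
    """
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for true_value, predicted_value in zip(y_true, y_pred):
        actual = is_flammable(true_value, threshold)
        predicted = is_flammable(predicted_value, threshold)
        if actual and predicted:
            counts["tp"] += 1
        elif actual and not predicted:
            counts["fn"] += 1
        elif not actual and predicted:
            counts["fp"] += 1
        else:
            counts["tn"] += 1
    return counts


def borderline_indices(y_pred, half_width, threshold=THRESHOLD_KJ_PER_G):
    """Indices of predictions inside the assumed ambiguity zone."""
    return [i for i, p in enumerate(y_pred) if abs(p - threshold) <= half_width]


# Made-up illustrative values in kJ/g (NOT data from the study).
hoc_experimental = [35.0, 21.0, 19.0, 8.0, 22.5]
hoc_predicted = [33.5, 19.2, 20.6, 9.1, 24.0]

counts = confusion_counts(hoc_experimental, hoc_predicted)
borderline = borderline_indices(hoc_predicted, half_width=2.0)
print(counts, borderline)
```

In a safety screening context, false negatives are the costlier error because a flammable compound would pass as nonflammable, which is why predictions inside the ambiguity zone are natural candidates for experimental confirmation.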
