Abstract
This study introduces a robust machine learning framework for predicting hydrochar yield and higher heating value (HHV) from biomass proximate analysis. A curated dataset of 481 samples was assembled, with input variables including fixed carbon, volatile matter, ash content, reaction time, temperature, and water content. Hydrochar yield and HHV served as the target outputs. To improve data quality, Monte Carlo Outlier Detection (MCOD) was used to remove anomalous entries. Thirteen machine learning algorithms, including convolutional neural networks (CNN), linear regression, decision trees, and advanced ensemble methods (CatBoost, LightGBM, XGBoost), were systematically compared. CatBoost performed best, achieving an R² of 0.98 and a mean squared error (MSE) of 0.05 for HHV prediction, and an R² of 0.94 with an MSE of 0.03 for yield estimation. SHAP (SHapley Additive exPlanations) analysis identified ash content as the most influential feature for HHV prediction, while temperature, water content, and fixed carbon were the key drivers of yield. These results support the effectiveness of gradient boosting models, particularly CatBoost, for accurately modeling hydrothermal carbonization outcomes and informing data-driven biomass valorization strategies.
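The abstract does not describe how MCOD or the evaluation metrics were implemented, so the following is only a minimal sketch under stated assumptions. It follows one common formulation of Monte Carlo outlier detection: repeatedly split the data at random, fit a model on the training part, and record each held-out sample's prediction error; samples whose average out-of-sample error is extreme are flagged. A one-feature least-squares line stands in for the authors' models, and the robust z-score cutoff (median/MAD, threshold 3) is an assumed flagging rule. The function names (`fit_line`, `monte_carlo_outliers`, `mse`, `r2`) and the toy data are hypothetical; R² and MSE use their standard definitions.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (one feature; illustration only)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def mse(y_true, y_pred):
    """Mean squared error."""
    return statistics.fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def monte_carlo_outliers(xs, ys, n_iter=500, train_frac=0.7, z_cut=3.0, seed=0):
    """Return indices whose mean out-of-sample residual is extreme over random splits.

    Assumed flagging rule: robust z-score (median / MAD) of each sample's mean residual.
    """
    rng = random.Random(seed)
    n = len(xs)
    k = int(train_frac * n)
    idx = list(range(n))
    residuals = [[] for _ in range(n)]
    for _ in range(n_iter):
        rng.shuffle(idx)
        train, test = idx[:k], idx[k:]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Record the prediction error of every held-out sample in this split.
        for i in test:
            residuals[i].append(ys[i] - (a + b * xs[i]))
    means = [statistics.fmean(r) if r else 0.0 for r in residuals]
    med = statistics.median(means)
    mad = statistics.median([abs(m - med) for m in means])
    if mad == 0:
        return []
    scale = 1.4826 * mad  # MAD rescaled to approximate a standard deviation
    return [i for i, m in enumerate(means) if abs(m - med) / scale > z_cut]


# Hypothetical toy data: a noisy line with one corrupted entry at index 15.
xs = [float(i) for i in range(30)]
ys = [2.0 + 0.5 * x + 0.1 * ((i * 7) % 5 - 2) for i, x in enumerate(xs)]
ys[15] += 10.0

flagged = monte_carlo_outliers(xs, ys)
keep = [i for i in range(len(xs)) if i not in flagged]
a, b = fit_line([xs[i] for i in keep], [ys[i] for i in keep])
preds = [a + b * xs[i] for i in keep]
y_keep = [ys[i] for i in keep]
print(flagged, round(r2(y_keep, preds), 4), round(mse(y_keep, preds), 4))
```

On this toy data the sketch flags only the corrupted sample, after which the refitted line scores an R² close to 1. In the study itself the base model, split ratio, and cutoff may differ; the point here is only that outliers are judged by how badly they are predicted when held out, not by their raw values.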
Guoliang Hou
Ahmad Alkhayyat
Ahmad Almalkawi
Bioresources and Bioprocessing
Saveetha University
Chitkara University
Jain University
Hou et al. studied this question.
www.synapsesocial.com/papers/693231288e51979591dce545 — DOI: https://doi.org/10.1186/s40643-025-00979-1