Sustainable energy systems such as anaerobic digestion (AD) bioreactors exhibit complex nonlinear dynamics, which makes key stability indicators difficult to monitor with conventional laboratory-based methods. This pilot study explores the feasibility of machine learning-based soft sensing for estimating Total Volatile Fatty Acids (TVFA(M)) from routinely measured physicochemical parameters. Using a short-term laboratory dataset from controlled CO2 biomethanisation experiments, several regression models were benchmarked: an attention-based deep learning architecture (TabNet), multi-architecture artificial neural networks (ANNs), gradient-boosting ensembles (CatBoost, XGBoost, LightGBM), and classical kernel-based approaches. Model performance was evaluated under a cross-validated framework to assess predictive capability and consistency across folds within the limited experimental scope. Among the tested models, TabNet achieved competitive performance, with an R² of 0.8551, an RMSE of 0.0090, and an MAE of 0.0067. To support model transparency and interpretability, Explainable Artificial Intelligence (XAI) techniques based on SHapley Additive exPlanations (SHAP) were applied, identifying pCO2 as the dominant contributor to TVFA(M) predictions within the studied operational range. Although restricted to controlled laboratory conditions and a short observation period, the results demonstrate the potential of explainable machine learning models as soft sensors for TVFA(M) estimation and provide a methodological benchmark for future validation on larger and more diverse datasets.
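The cross-validated evaluation and the three reported metrics (R², RMSE, MAE) can be illustrated with a minimal sketch. This is not the study's pipeline: the fold count, the single-feature least-squares model standing in for TabNet and the other regressors, the synthetic data, and the helper names `regression_metrics`, `fit_line`, and `k_fold_scores` are all assumptions made for illustration.

```python
import math
import random


def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, MAE) for paired lists of targets and predictions."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, rmse, mae


def fit_line(x, y):
    """Ordinary least squares on one feature (hypothetical stand-in model)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx


def k_fold_scores(x, y, k=5, seed=0):
    """Shuffle the indices, split them into k folds, score each held-out fold."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        preds = [slope * x[i] + intercept for i in held_out]
        scores.append(regression_metrics([y[i] for i in held_out], preds))
    return scores


# Synthetic data only: a noisy linear relation, not the study's measurements.
rng = random.Random(42)
x = [i / 10 for i in range(50)]
y = [0.02 * xi + 0.1 + rng.gauss(0, 0.005) for xi in x]

scores = k_fold_scores(x, y, k=5)
for fold, (r2, rmse, mae) in enumerate(scores, start=1):
    print(f"fold {fold}: R2={r2:.4f} RMSE={rmse:.4f} MAE={mae:.4f}")
```

Reporting the per-fold values alongside their mean is one way to expose the consistency across folds that the evaluation is meant to assess; with a small dataset, a wide spread between folds is a signal that a single averaged score may be optimistic.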
Amangeldy et al. (Sun,) studied this question.