What question did this study set out to answer?

This research aims to develop a machine learning prediction model for liver cirrhosis using gut microbiome data.

May 8, 2026 | Open Access

Machine learning-driven clinical decision support for liver cirrhosis: a gut microbiome-based web prediction model with explainable AI integration

Key points

Data were retrieved from the PubMed and BioProject databases.
Key genera were identified using LEfSe analysis and further refined with LASSO regression.
Model performance was evaluated through five-fold cross-validation and leave-one-dataset-out (LODO) analysis.
Eleven datasets related to liver cirrhosis were included in the analysis.
The Random Forest model achieved an AUC of 0.875 (95% CI: 0.823–0.905) in five-fold cross-validation.
The model was deployed as an online tool and shows potential for clinical decision support.
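
To make the leave-one-dataset-out (LODO) idea concrete, here is a minimal sketch in plain Python. It is not the authors' pipeline: the cohort identifiers, labels, and scores are made-up toy values, and the helper names `lodo_splits` and `auc_score` are hypothetical. Each fold holds out one whole dataset, so the AUC for that fold reflects performance on a cohort the model never saw during training.

```python
from collections import defaultdict


def auc_score(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def lodo_splits(dataset_ids):
    """Yield (held_out_id, train_indices, test_indices), one fold per dataset."""
    by_dataset = defaultdict(list)
    for index, dataset_id in enumerate(dataset_ids):
        by_dataset[dataset_id].append(index)
    for held_out, test_indices in by_dataset.items():
        train_indices = [i for i, d in enumerate(dataset_ids) if d != held_out]
        yield held_out, train_indices, test_indices


# Hypothetical toy data: three cohorts, labels (1 = LC, 0 = healthy control),
# and invented scores standing in for a model's predicted probabilities.
dataset_ids = ["A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C"]
labels = [1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.7, 0.4, 0.8, 0.3, 0.5, 0.1, 0.3, 0.7, 0.2, 0.6]

for held_out, train_idx, test_idx in lodo_splits(dataset_ids):
    # A real pipeline would fit the model on train_idx at this point;
    # this sketch only scores the held-out cohort with the fixed toy scores.
    fold_auc = auc_score([labels[i] for i in test_idx],
                         [scores[i] for i in test_idx])
    print(held_out, len(train_idx), len(test_idx), fold_auc)
```

Unlike five-fold cross-validation, which mixes samples from all cohorts in every fold, this split keeps cohort-specific effects out of the training data for the held-out fold. That is one plausible reason a LODO estimate can come out lower than a pooled cross-validation estimate.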

Abstract

Background: Liver cirrhosis (LC) is a chronic liver disease with global prevalence. Current diagnostic methods for LC still face limitations in safety and accessibility. We aimed to develop an interpretable machine learning (ML) prediction model for LC using gut microbes and deploy it as a web-based clinical decision support tool.

Methods: Data were retrieved from the PubMed and BioProject databases. Bioinformatics re-analysis and linear discriminant analysis effect size (LEfSe) analysis were conducted to preliminarily identify key genera associated with LC. Further feature selection was performed using Least Absolute Shrinkage and Selection Operator (LASSO) regression. The independent datasets were combined to form an integrated dataset, which was then subjected to five-fold cross-validation and leave-one-dataset-out (LODO) analysis. Model performance was evaluated using metrics such as the area under the receiver operating characteristic curve (AUC), and the optimal model was selected. The decision mechanism of the optimal model was interpreted using SHapley Additive exPlanations (SHAP), and the model was deployed as a web application using the Streamlit framework.

Results: We ultimately included 11 datasets related to LC. The genera Veillonella, Lachnospira, Romboutsia, Akkermansia, Erysipelatoclostridium, Prevotella, UCG.005, and Streptococcus were identified as key predictors distinguishing LC patients from healthy controls. The Random Forest (RF) model demonstrated the best predictive performance (AUC in five-fold cross-validation: 0.875, 95% CI: 0.823–0.905; AUC in LODO analysis: 0.793, 95% CI: 0.702–0.940) and was deployed as an online LC prediction tool.

Conclusion: The interpretable RF model, along with its web-based implementation, has the potential to provide decision support for healthcare professionals and shows promise as a valuable auxiliary tool for LC screening and early clinical intervention.
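
The LASSO step named in the Methods selects features by shrinking some coefficients exactly to zero. The sketch below illustrates that mechanism with cyclic coordinate descent on a squared-error objective. The abstract does not state the loss, penalty value, or software the authors used, so all of those are assumptions here, as are the function names and the toy data (three standardised, mutually orthogonal features and labels coded +1 for LC and -1 for controls).

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero; anything inside [-threshold, threshold] becomes 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(rows, targets, penalty, n_iterations=100):
    """Minimise (1/2n) * sum((y - X w)^2) + penalty * sum(|w|), with no intercept."""
    n_samples = len(rows)
    n_features = len(rows[0])
    weights = [0.0] * n_features
    for _ in range(n_iterations):
        for j in range(n_features):
            rho = 0.0   # mean of x_j times the residual that excludes feature j
            norm = 0.0  # mean of x_j squared
            for row, y in zip(rows, targets):
                partial = sum(weights[k] * row[k]
                              for k in range(n_features) if k != j)
                rho += row[j] * (y - partial)
                norm += row[j] * row[j]
            rho /= n_samples
            norm /= n_samples
            weights[j] = soft_threshold(rho, penalty) / norm if norm > 0 else 0.0
    return weights


# Hypothetical toy data: 8 samples, 3 features (columns), labels in {+1, -1}.
rows = [
    [1, 1, 1], [1, 1, -1], [1, -1, 1], [1, -1, -1],
    [-1, 1, 1], [-1, 1, -1], [-1, -1, 1], [-1, -1, -1],
]
targets = [1, 1, 1, -1, -1, -1, -1, -1]

weights = lasso_coordinate_descent(rows, targets, penalty=0.3)
selected = [j for j, w in enumerate(weights) if w != 0.0]
print(weights, selected)
```

In this toy setup only the first feature is correlated with the label strongly enough to survive the penalty, so the other two coefficients are set to exactly zero and drop out of the selected set. A real analysis would typically choose the penalty by cross-validation instead of fixing it by hand.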
