Abstract

Background: Liver cirrhosis (LC) is a chronic liver disease with global prevalence. Current diagnostic methods for LC still face limitations in safety and accessibility. We aimed to develop an interpretable machine learning (ML) prediction model for LC based on gut microbes and to deploy it as a web-based clinical decision support tool.

Methods: Data were retrieved from the PubMed and BioProject databases. Bioinformatics re-analysis and linear discriminant analysis effect size (LEfSe) analysis were conducted to preliminarily identify key genera associated with LC. Further feature selection was performed using Least Absolute Shrinkage and Selection Operator (LASSO) regression. The independent datasets were combined into an integrated dataset, which was then subjected to five-fold cross-validation and leave-one-dataset-out (LODO) analysis. Model performance was evaluated using metrics such as the area under the receiver operating characteristic curve (AUC), and the optimal model was selected. The decision mechanism of the optimal model was interpreted using SHapley Additive exPlanations (SHAP), and the model was deployed as a web application using the Streamlit framework.

Results: We ultimately included 11 datasets related to LC. The genera Veillonella, Lachnospira, Romboutsia, Akkermansia, Erysipelatoclostridium, Prevotella, UCG.005, and Streptococcus were identified as key predictors distinguishing LC patients from healthy controls. The Random Forest (RF) model demonstrated the best predictive performance (AUC in five-fold cross-validation: 0.875, 95% CI: 0.823–0.905; AUC in LODO analysis: 0.793, 95% CI: 0.702–0.940) and was deployed as an online LC prediction tool.

Conclusion: The interpretable RF model, along with its web-based implementation, has the potential to provide decision support for healthcare professionals and shows promise as an auxiliary tool for LC screening and early clinical intervention.
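The leave-one-dataset-out (LODO) evaluation named in the Methods can be illustrated with a minimal, standard-library Python sketch. It is not the authors' pipeline: the function names (`lodo_splits`, `auc`, `fit_direction`), the toy data, and the single-feature scoring rule standing in for the Random Forest are all hypothetical and chosen only to show how each dataset is held out in turn and scored by AUC.

```python
from collections import defaultdict


def lodo_splits(dataset_ids):
    """Yield (held_out_id, train_idx, test_idx), holding out one dataset at a time."""
    by_dataset = defaultdict(list)
    for i, d in enumerate(dataset_ids):
        by_dataset[d].append(i)
    for held_out in sorted(by_dataset):
        test_idx = by_dataset[held_out]
        train_idx = sorted(
            i for d, idx in by_dataset.items() if d != held_out for i in idx
        )
        yield held_out, train_idx, test_idx


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_direction(features, labels):
    """Hypothetical stand-in model: learn whether the feature is higher in cases (+1) or controls (-1)."""
    pos = [x for x, y in zip(features, labels) if y == 1]
    neg = [x for x, y in zip(features, labels) if y == 0]
    diff = sum(pos) / len(pos) - sum(neg) / len(neg)
    return 1.0 if diff >= 0 else -1.0


# Toy data (assumed): three source datasets, one relative-abundance feature, 1 = LC, 0 = control.
dataset_ids = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
labels = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0]
feature = [0.9, 0.7, 0.2, 0.3, 0.8, 0.4, 0.6, 0.1, 0.5, 0.9, 0.6, 0.2]

lodo_auc = {}
for held_out, train_idx, test_idx in lodo_splits(dataset_ids):
    # Fit only on the remaining datasets, then score the held-out one.
    sign = fit_direction([feature[i] for i in train_idx], [labels[i] for i in train_idx])
    test_scores = [sign * feature[i] for i in test_idx]
    lodo_auc[held_out] = auc([labels[i] for i in test_idx], test_scores)

for name, value in lodo_auc.items():
    print(f"held-out dataset {name}: AUC = {value:.3f}")
```

Unlike five-fold cross-validation on the pooled data, every test sample here comes from a dataset the model never saw during fitting, which is one plausible reason a LODO AUC can come out lower than a pooled cross-validation AUC.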
Liu et al. studied this question.