What question did this study set out to answer?

Can machine learning models trained on demographic and clinical data predict significant liver fibrosis stages?

March 1, 2026

Development and evaluation of machine learning models for predicting significant liver fibrosis stages: A retrospective analysis

Key Points

Aimed to develop machine learning models to predict significant liver fibrosis stages using demographic and clinical data.
Retrospectively analyzed data from 4,193 patients who prospectively underwent transient elastography in the Liver Beware study.
Collected demographic and clinical data prior to transient elastography (TE).
Split data into training (60%), validation (20%), and test (20%) sets.
Evaluated six machine learning algorithms, including XGBoost and random forest.
Assessed performance using accuracy, sensitivity, specificity, precision, and AUC.
XGBoost achieved 72.2% accuracy with 59.7% sensitivity and 73.4% specificity.
Random forest recorded the highest accuracy at 91.1% but had low sensitivity at 1.4%.
Key predictors of fibrosis risk identified were obesity, diabetes, and hypertension.
Machine learning models showed acceptable accuracy and can enhance patient triaging.
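
The 60/20/20 partition listed in the key points can be illustrated with a minimal sketch. It assumes a plain random shuffle; the summary does not say whether the split was stratified by fibrosis status or which random seed was used, so `split_indices`, its parameters, and the seed are hypothetical.

```python
import random


def split_indices(n, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle row indices 0..n-1 and cut them into train/validation/test lists.

    Assumption: a simple random split. The study summary does not state
    whether stratification was used, nor the seed.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains, roughly 20%
    return train, val, test


# 4,193 is the cohort size reported in the summary.
train_idx, val_idx, test_idx = split_indices(4193)
print(len(train_idx), len(val_idx), len(test_idx))  # 2516 839 838
```

With an outcome as uncommon as significant fibrosis appears to be here, a stratified split would keep the positive rate similar across the three sets; that variant is not shown because the summary does not confirm it.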

Abstract

Background: Advanced fibrosis (F2–F4) drives morbidity and mortality in metabolic dysfunction-associated steatotic liver disease (MASLD). Population-wide screening is impractical due to patient volume and health care costs. We hypothesized that machine learning (ML) algorithms trained on routine demographic and clinical data could identify patients at risk of significant fibrosis, reducing reliance on blood draws or transient elastography (TE).

Methods: As part of the Liver Beware study, 4,193 patients prospectively underwent TE. Clinical and demographic data, such as age, BMI, race, diabetes, and hypertension, were collected immediately prior to elastography. Data were split into training (60%), validation (20%), and test (20%) sets. Six ML algorithms were evaluated: logistic regression, logistic regression with SMOTE, XGBoost, random forest, SVM, and ensemble voting classifier. Performance was assessed by accuracy, sensitivity, specificity, precision, and area under the curve (AUC).

Results: XGBoost had the most well-balanced test performance with 72.2% accuracy, 59.7% sensitivity, 73.4% specificity, 17.4% precision, and AUC of 0.72. Random forest had the highest accuracy (91.1%) but low sensitivity (1.4%). XGBoost identified obesity, diabetes, and hypertension as the leading predictors of risk of fibrosis.

Conclusions: ML algorithms based on readily available demographic and clinical data can identify patients at high risk of fibrosis with acceptable accuracy. This scalable approach enables triaging for further testing such as TE, trading marginal AUC reduction for maximal accessibility compared with biomarker-dependent scores (eg, SAFE, Agile 4/3+). Implementation and cost-effectiveness studies are needed to refine referral thresholds and evaluate real-world impact.
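
The reported metrics are linked through the prevalence of significant fibrosis in the test set, which the abstract does not state. The sketch below back-calculates that prevalence from the XGBoost sensitivity, specificity, and precision. It assumes all three figures come from the same test set and ignores rounding error. The function names are illustrative and the resulting prevalence is an inference, not a reported result.

```python
def implied_prevalence(sensitivity, specificity, precision):
    """Solve precision = sens*p / (sens*p + (1 - spec)*(1 - p)) for prevalence p."""
    false_positive_rate = 1.0 - specificity
    numerator = precision * false_positive_rate
    denominator = sensitivity * (1.0 - precision) + numerator
    return numerator / denominator


def accuracy_from_rates(sensitivity, specificity, prevalence):
    """Accuracy is the prevalence-weighted mix of sensitivity and specificity."""
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


# XGBoost test-set figures quoted in the abstract.
prevalence = implied_prevalence(sensitivity=0.597, specificity=0.734, precision=0.174)

# Consistency check against the reported 72.2% accuracy.
xgb_accuracy = accuracy_from_rates(0.597, 0.734, prevalence)

# Accuracy of a trivial model that labels every patient as negative.
all_negative_accuracy = 1.0 - prevalence

print(f"implied prevalence:    {prevalence:.3f}")
print(f"XGBoost accuracy:      {xgb_accuracy:.3f}")
print(f"all-negative accuracy: {all_negative_accuracy:.3f}")
```

Under these assumptions the implied prevalence is a little under 9%, and the recomputed XGBoost accuracy agrees with the reported 72.2%. A model that predicts no fibrosis for every patient would then score about 91% accuracy, which helps explain how random forest can pair 91.1% accuracy with 1.4% sensitivity and why accuracy alone is a poor guide on imbalanced data.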
