Background: Significant fibrosis (F2–F4) drives morbidity and mortality in metabolic dysfunction-associated steatotic liver disease (MASLD). Population-wide screening is impractical because of patient volume and health care costs. We hypothesized that machine learning (ML) algorithms trained on routine demographic and clinical data could identify patients at risk of significant fibrosis, reducing reliance on blood draws or transient elastography (TE).

Methods: As part of the Liver Beware study, 4,193 patients prospectively underwent TE. Clinical and demographic data, such as age, BMI, race, diabetes, and hypertension, were collected immediately prior to elastography. Data were split into training (60%), validation (20%), and test (20%) sets. Six ML algorithms were evaluated: logistic regression, logistic regression with the synthetic minority oversampling technique (SMOTE), XGBoost, random forest, support vector machine (SVM), and an ensemble voting classifier. Performance was assessed by accuracy, sensitivity, specificity, precision, and area under the curve (AUC).

Results: XGBoost had the best-balanced test performance, with 72.2% accuracy, 59.7% sensitivity, 73.4% specificity, 17.4% precision, and an AUC of 0.72. Random forest had the highest accuracy (91.1%) but very low sensitivity (1.4%). XGBoost identified obesity, diabetes, and hypertension as the leading predictors of fibrosis risk.

Conclusions: ML algorithms based on readily available demographic and clinical data can identify patients at high risk of fibrosis with acceptable accuracy. This scalable approach enables triaging for further testing such as TE, trading a marginal reduction in AUC for maximal accessibility compared with biomarker-dependent scores (eg, SAFE, Agile 4/3+). Implementation and cost-effectiveness studies are needed to refine referral thresholds and evaluate real-world impact.
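
The reported XGBoost figures can be cross-checked against one another, because sensitivity, specificity, and precision together determine the prevalence of the positive class. The Python sketch below is not taken from the study. Its function names are hypothetical, and it assumes that all reported metrics were computed on the same test set at a single decision threshold. Because the inputs are rounded percentages, the outputs are approximate.

```python
def implied_prevalence(sensitivity: float, specificity: float, precision: float) -> float:
    """Solve precision = se*p / (se*p + (1 - sp)*(1 - p)) for prevalence p.

    Assumption: all three metrics come from the same test set and threshold.
    """
    false_positive_rate = 1.0 - specificity
    numerator = precision * false_positive_rate
    denominator = sensitivity * (1.0 - precision) + numerator
    return numerator / denominator


def accuracy_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Accuracy is the prevalence-weighted mean of sensitivity and specificity."""
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


if __name__ == "__main__":
    # Rounded XGBoost test-set values as reported in the Results.
    se, sp, ppv = 0.597, 0.734, 0.174

    p = implied_prevalence(se, sp, ppv)
    print(f"implied prevalence:        {p:.3f}")
    print(f"reconstructed accuracy:    {accuracy_from_rates(se, sp, p):.3f}")
    # Accuracy of a trivial classifier that labels every patient negative.
    print(f"all-negative accuracy:     {1.0 - p:.3f}")
```

Under these assumptions, the implied test-set prevalence is roughly 8.6%, and the accuracy reconstructed from it is about 72.2%, which agrees with the reported value. At that prevalence, a classifier that labeled every patient negative would reach about 91.4% accuracy, which may explain how random forest attained 91.1% accuracy with only 1.4% sensitivity.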
Tjandra et al. studied this question.