March 3, 2026 | Open Access

Using Large Language Models to Predict Advanced Liver Fibrosis in Metabolic Dysfunction-Associated Steatotic Liver Disease (MASLD): A Proof-of-Concept Analysis

Key Points

GPT-4 predicted advanced liver fibrosis (≥F3) with an AUROC of 0.91 (95% CI: 0.86-0.96), indicating strong discrimination.
Sensitivity and specificity exceeded those of traditional risk scores such as the Fibrosis-4 Index (FIB-4).
The analysis used structured clinical variables from NHANES 2017-2020 for 162 individuals with MASLD.
Findings highlight the potential of GPT-based models as scalable, interpretable tools for clinical application.
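
For context on the comparator named above, the FIB-4 index is conventionally computed as (age × AST) / (platelet count × √ALT), with age in years, AST and ALT in U/L, and platelets in 10^9/L. The sketch below illustrates that standard formula only. The function name and the example values are hypothetical and are not taken from the study.

```python
import math


def fib4(age_years: float, ast_u_l: float, alt_u_l: float,
         platelets_10e9_l: float) -> float:
    """Standard FIB-4 formula: (age x AST) / (platelets x sqrt(ALT))."""
    if alt_u_l <= 0 or platelets_10e9_l <= 0:
        raise ValueError("ALT and platelet count must be positive")
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))


# Hypothetical patient, invented for illustration (not study data).
example_score = fib4(age_years=60, ast_u_l=40, alt_u_l=25, platelets_10e9_l=200)
print(round(example_score, 2))
```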

Abstract

Background: Metabolic dysfunction-associated steatotic liver disease (MASLD) is a prevalent condition linked to type 2 diabetes and other metabolic risk factors. Timely detection of advanced fibrosis (≥F3) in MASLD patients is critical for effective clinical management. Traditional risk scores, such as the Fibrosis-4 Index (FIB-4) and NAFLD Fibrosis Score (NFS), have limitations, prompting the exploration of machine learning models for improved risk prediction.

Objectives: This proof-of-concept study evaluates the feasibility of using large language models (LLMs), specifically GPT-4 and GPT-3.5 (OpenAI, Inc., San Francisco, United States), to predict advanced liver fibrosis in individuals with MASLD using only structured clinical variables from the National Health and Nutrition Examination Survey (NHANES).

Methods: We used NHANES 2017-2020 data, including 162 participants with MASLD. GPT-4 and GPT-3.5 were accessed via application programming interface (API) to predict fibrosis risk using variables such as age, BMI, aspartate aminotransferase (AST), alanine aminotransferase (ALT), platelet count, and HbA1c. Performance was evaluated using sensitivity, specificity, area under the receiver operating characteristic curve (AUROC), and Brier score, with model thresholds set at 40.5% for GPT-4 and 45% for GPT-3.5 based on Youden’s index.

Results: GPT-4 achieved an AUROC of 0.91 (95% CI: 0.86-0.96), while GPT-3.5 demonstrated an AUROC of 0.90 (95% CI: 0.85-0.95). Both models showed strong calibration, with GPT-4 maintaining superior specificity (0.86 vs. 0.82). The models' performance outpaced traditional risk scores, such as FIB-4.

Conclusions: GPT-based LLMs show strong potential for predicting advanced fibrosis in MASLD, offering a scalable, interpretable tool for clinical use. Further validation across diverse populations and clinical settings is needed to confirm generalizability and refine the approach before clinical adoption.
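
The evaluation metrics named in the Methods (sensitivity, specificity, AUROC, Brier score, and a threshold chosen by Youden's index) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the study's analysis code: the function names are hypothetical, the labels and predicted probabilities are invented toy values, and AUROC is computed with the pairwise (Mann-Whitney) definition rather than any particular library routine.

```python
from typing import List, Tuple


def sensitivity_specificity(y_true: List[int], probs: List[float],
                            threshold: float) -> Tuple[float, float]:
    """Classify as positive when prob >= threshold, then return (sens, spec)."""
    tp = sum(1 for y, p in zip(y_true, probs) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, probs) if y == 1 and p < threshold)
    tn = sum(1 for y, p in zip(y_true, probs) if y == 0 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, probs) if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auroc(y_true: List[int], probs: List[float]) -> float:
    """Probability that a random positive outranks a random negative (ties = 0.5)."""
    pos = [p for y, p in zip(y_true, probs) if y == 1]
    neg = [p for y, p in zip(y_true, probs) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true: List[int], probs: List[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


def youden_threshold(y_true: List[int], probs: List[float]) -> Tuple[float, float]:
    """Return the candidate threshold maximizing J = sens + spec - 1, and J itself."""
    best_t, best_j = 0.0, -1.0
    for t in sorted(set(probs)):  # each observed probability is a candidate cut-off
        sens, spec = sensitivity_specificity(y_true, probs, t)
        j = sens + spec - 1.0
        if j > best_j:  # strict comparison keeps the lowest threshold on ties
            best_t, best_j = t, j
    return best_t, best_j


# Toy data, invented for illustration: 1 = advanced fibrosis, 0 = not.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
predicted = [0.10, 0.20, 0.35, 0.60, 0.30, 0.55, 0.70, 0.90]

cutoff, j_stat = youden_threshold(labels, predicted)
print("AUROC:", auroc(labels, predicted))
print("Brier:", round(brier_score(labels, predicted), 4))
print("Youden cut-off:", cutoff, "J:", j_stat)
```

A threshold selected this way is tuned on the same data it is evaluated on, so the resulting sensitivity and specificity are likely optimistic. This is one reason the external validation called for in the Conclusions matters.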
