Background: Metabolic dysfunction-associated steatotic liver disease (MASLD) is a prevalent condition linked to type 2 diabetes and other metabolic risk factors. Timely detection of advanced fibrosis (≥F3) in MASLD patients is critical for effective clinical management. Traditional risk scores, such as the Fibrosis-4 Index (FIB-4) and NAFLD Fibrosis Score (NFS), have limitations, prompting the exploration of machine learning models for improved risk prediction.

Objectives: This proof-of-concept study evaluates the feasibility of using large language models (LLMs), specifically GPT-4 and GPT-3.5 (OpenAI, Inc., San Francisco, United States), to predict advanced liver fibrosis in individuals with MASLD using only structured clinical variables from the National Health and Nutrition Examination Survey (NHANES).

Methods: We used NHANES 2017-2020 data, including 162 participants with MASLD. GPT-4 and GPT-3.5 were accessed via application programming interface (API) to predict fibrosis risk using variables such as age, BMI, aspartate aminotransferase (AST), alanine aminotransferase (ALT), platelet count, and HbA1c. Performance was evaluated using sensitivity, specificity, area under the receiver operating characteristic curve (AUROC), and Brier score, with model thresholds set at 40.5% for GPT-4 and 45% for GPT-3.5 based on Youden’s index.

Results: GPT-4 achieved an AUROC of 0.91 (95% CI: 0.86-0.96), while GPT-3.5 demonstrated an AUROC of 0.90 (95% CI: 0.85-0.95). Both models showed strong calibration, with GPT-4 maintaining superior specificity (0.86 vs. 0.82). The models' performance outpaced traditional risk scores, such as FIB-4.

Conclusions: GPT-based LLMs show strong potential for predicting advanced fibrosis in MASLD, offering a scalable, interpretable tool for clinical use. Further validation across diverse populations and clinical settings is needed to confirm generalizability and refine the approach before clinical adoption.
Njei et al. studied this question.
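
The evaluation metrics named in the Methods (sensitivity, specificity, AUROC, Brier score, and a cutoff chosen by Youden's index) can be illustrated with a minimal sketch. This is not the authors' pipeline, which the abstract does not describe. The function names are hypothetical, and the labels and probabilities are made-up toy values, not NHANES data. Probabilities are assumed to be on a 0 to 1 scale, with label 1 meaning advanced fibrosis.

```python
from typing import List, Sequence, Tuple


def auroc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUROC as the chance that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sens_spec(y_true: Sequence[int], y_prob: Sequence[float],
              threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when 'prob >= threshold' is called positive."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def youden_threshold(y_true: Sequence[int],
                     y_prob: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, J) maximising J = sensitivity + specificity - 1.
    Candidate thresholds are the observed probabilities; on ties the
    lowest threshold is kept."""
    best_t, best_j = 0.0, -1.0
    for t in sorted(set(y_prob)):
        sens, spec = sens_spec(y_true, y_prob, t)
        j = sens + spec - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome
    (lower is better; 0 is perfect)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy example only: illustrative values, not study data.
toy_labels: List[int] = [0, 0, 0, 1, 0, 1, 1, 1]
toy_probs: List[float] = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.9]

cutoff, j_stat = youden_threshold(toy_labels, toy_probs)
sens, spec = sens_spec(toy_labels, toy_probs, cutoff)
print(f"AUROC={auroc(toy_labels, toy_probs):.3f}  Brier={brier_score(toy_labels, toy_probs):.3f}")
print(f"Youden cutoff={cutoff:.2f}  J={j_stat:.2f}  sens={sens:.2f}  spec={spec:.2f}")
```

In practice a cutoff selected on the same sample used to report sensitivity and specificity tends to look optimistic, which is one reason the external validation called for in the Conclusions matters.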