ABSTRACT
Background Large language models (LLMs) can support clinical decision‐making by parsing databases and extracting relevant information. However, evaluating drug‐induced liver injury (DILI) often requires processing lengthy clinical histories alongside reference materials such as LiverTox, which together can exceed the context length of conventional LLMs. Challenges such as information truncation hinder standard approaches like prompt engineering and retrieval‐augmented generation (RAG). To address these limitations, this study introduces DILIConsult, an agentic LLM pipeline based on GPT‐4, designed to intelligently parse clinical and drug information.
Methods To develop DILIConsult, we compared GPT‐4‐Turbo and GPT‐4o for extracting DILI characteristics from LiverTox descriptions. We tested two approaches to analyzing cases of suspected DILI: full‐length case analysis and sequential drug‐specific evaluations. We evaluated DILIConsult on cases of suspected DILI identified from the open‐source Medical Information Mart for Intensive Care‐IV (MIMIC‐IV) ICU dataset based on American Association for the Study of Liver Diseases (AASLD) and European Association for the Study of the Liver (EASL) criteria. Outputs from DILIConsult were compared against those of a panel of clinicians comprising an ICU pharmacist, an ICU junior attending physician, and an ICU resident. Responses were evaluated by two senior ICU attending physicians.
Results Using GPT‐4o with a sequential approach improved performance in both the extraction of DILI characteristics and the analysis of suspected DILI. DILIConsult achieved the best mean rank of 1.66 ± 0.75 in knowledge recall and ranked second for reasoning (2.00 ± 0.64) and reflection of current medical consensus (2.05 ± 0.62). DILIConsult ranked last for less omission of important information and for content inaccuracy, with mean ranks of 2.07 ± 0.52 and 2.09 ± 0.72, respectively.
Conclusion DILIConsult demonstrates the potential of LLMs to assist clinicians in evaluating DILI. The findings emphasize the importance of task division in LLM‐driven workflows to minimize information loss.
Ho et al. (Fri,) studied this question.
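The contrast between full‐length case analysis and sequential drug‐specific evaluation described in the abstract can be illustrated with a minimal sketch. This is not the DILIConsult implementation: the function names, the prompt wording, and the `ask_model` callable (a placeholder for whatever LLM client is used) are all assumptions made for illustration only.

```python
from typing import Callable, Dict, List


def build_full_prompt(case_history: str, drug_refs: Dict[str, str]) -> str:
    """Full-length approach: one prompt holding the history and every drug reference."""
    refs = "\n\n".join(f"[{drug}]\n{text}" for drug, text in drug_refs.items())
    return f"Clinical history:\n{case_history}\n\nDrug references:\n{refs}"


def build_sequential_prompts(case_history: str, drug_refs: Dict[str, str]) -> List[str]:
    """Sequential approach: one smaller prompt per drug (hypothetical wording)."""
    return [
        f"Clinical history:\n{case_history}\n\nDrug reference for {drug}:\n{text}"
        for drug, text in drug_refs.items()
    ]


def evaluate_sequentially(
    case_history: str,
    drug_refs: Dict[str, str],
    ask_model: Callable[[str], str],  # placeholder for an LLM call, supplied by the caller
) -> Dict[str, str]:
    """Ask the model about each drug separately and collect the answers by drug name."""
    prompts = build_sequential_prompts(case_history, drug_refs)
    return {drug: ask_model(prompt) for drug, prompt in zip(drug_refs, prompts)}
```

Under these assumptions, each sequential prompt carries only one drug's reference text, so it is less likely to hit a context limit than the single combined prompt, at the cost of one model call per drug.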