Introduction: A major barrier to actualizing precision medical education is the ongoing, continuous analysis needed for assessment and iterative feedback to improve foundational knowledge and diagnostic reasoning. In this pilot project, we are leveraging large language models (LLMs) to analyze nephrology fellows' clinical documentation and map their diagnostic exposures to topics relevant to the practice of nephrology, with the goal of delivering subsequent targeted educational interventions based on each learner's individual needs.

Methods: Fifty nephrology fellow clinical encounters for hyponatremia (47 inpatient and 3 outpatient) at a large academic medical center were extracted into a HIPAA-compliant secure computing environment. These encounters were analyzed by two expert reviewers and by pretrained LLMs, including MedGemma, Qwen2.5, and LLaMA3. We determined the underlying hyponatremia diagnoses present and mapped them to the ABIM nephrology blueprint. We evaluated clinical reasoning using a validated tool (R-IDEA). Expert reviewer results served as the "gold standard" against which LLM output was compared to evaluate LLM performance. Cohen's kappa was calculated for inter-rater agreement on hyponatremia diagnoses, and Spearman and Pearson correlation coefficients were calculated for each R-IDEA clinical reasoning category.

Results: After manual review, expert reviewers identified SIADH (11), hypervolemic hyponatremia (17), low solute intake (4), hyponatremia due to thiazide diuretic use (3), hypertonic hyponatremia (2), pseudohyponatremia (1), and hypotonic hyponatremia due to other causes (24). LLM performance varied by model and across hyponatremia diagnoses. At this stage, Qwen2.5 performed best. Inter-rater reliability between expert reviewers and Qwen2.5 was moderate (Cohen's κ = 0.56). The LLM correctly identified SIADH most frequently and hypotonic hyponatremia due to thiazide diuretic use least frequently. At this stage, agreement between LLM and expert reviewer R-IDEA scores was weak: the Spearman correlation for total R-IDEA score was 0.361 and the Pearson correlation was 0.320.

Conclusion: This innovative use of LLMs is an initial proof-of-concept project that aims to improve nephrology fellow education by analyzing learners' real-world documentation, with plans for subsequent targeted educational interventions to meet learners' needs and improve clinical reasoning. We have demonstrated modest agreement between expert reviewers and readily available LLMs on the hyponatremia diagnoses present, and weak agreement when evaluating learner clinical reasoning. Efforts to optimize model performance are ongoing. Planned next steps include piloting the delivery of targeted educational interventions based on real-time evaluation of these data and scaling this system throughout the nephrology curriculum, enabling continuous individualized learning tailored to each fellow's needs throughout nephrology fellowship.

I have no potential conflict of interest to disclose. I did not use generative AI or AI-assisted technologies in the writing process.
Thorne et al. studied this question.
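The agreement statistics named in the Methods (Cohen's kappa for hyponatremia diagnoses; Spearman and Pearson correlation for R-IDEA scores) can be illustrated with a short sketch. This is not the study's analysis code, which the abstract does not describe. It is a minimal, standard-library-only Python implementation of the textbook formulas, and the function names (`cohens_kappa`, `pearson`, `average_ranks`, `spearman`) and example labels and scores are hypothetical, invented for illustration rather than taken from the study data. Kappa is computed as κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is the agreement expected by chance from each rater's label frequencies. The sketch assumes each encounter receives a single diagnosis label; if an encounter can carry several diagnoses, one option is to compute kappa separately for each diagnosis as a present/absent label.

```python
"""Hypothetical sketch of the agreement statistics named in the Methods.

Not the study's analysis code: all names and example data are invented.
"""
from collections import Counter
from math import sqrt


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each assign one label per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement p_o: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement p_e: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / n**2
    if p_e == 1:
        return 1.0  # both raters used one identical label for every item
    return (p_o - p_e) / (1 - p_e)


def pearson(x, y):
    """Pearson's r between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(ss_x * ss_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho computed as Pearson's r on tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Invented example data for illustration only (not study results).
    expert_dx = ["SIADH", "hypervolemic", "SIADH", "thiazide", "other"]
    model_dx = ["SIADH", "hypervolemic", "other", "other", "other"]
    expert_scores = [8, 6, 7, 4, 9]
    model_scores = [7, 7, 6, 5, 9]

    print(f"kappa    = {cohens_kappa(expert_dx, model_dx):.3f}")
    print(f"spearman = {spearman(expert_scores, model_scores):.3f}")
    print(f"pearson  = {pearson(expert_scores, model_scores):.3f}")
```

Run as a script, it prints the three statistics for the invented data. Computing Spearman's ρ as Pearson's r on tie-averaged ranks, rather than with the 1 − 6Σd²/(n(n² − 1)) shortcut, keeps the result correct when scores contain ties, as integer-valued scores often do.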