This study proposes a five-dimensional framework for evaluating and governing medical large language models across mathematics, philosophy, humanistic ethics, medical education and assessment, and technological ontology. In contrast to mainstream evaluations that overemphasize exam-style accuracy, the framework extends “what a model can get right” to include “why it is right, under which boundary conditions it holds, for whom it is more likely to fail, and how it can be credibly integrated into clinical and educational systems.” When deployed in real clinical settings, this framework operationalizes robustness, fairness, safety, transparency, and clinician-centered usability. We recommend mapping concrete metrics to workflow tasks and integrating humanistic and moral safeguards. This study also offers an ontological reflection to avoid anthropomorphizing “quasi-life,” while preserving human primacy in decision-making. Overall, this interdisciplinary approach complements recent evaluations of medical large language models and provides practical guidance for certification, assessment, and education, as artificial intelligence becomes deeply embedded in health care.
Haitao Zhang
Ying Liu
Shanghai East Hospital
Zhang et al. studied this question.
www.synapsesocial.com/papers/69b2582a96eeacc4fcec7772 — DOI: https://doi.org/10.1097/hd9.0000000000000015