Artificial intelligence (AI) in medicine is moving steadily toward real clinical practice, not only through improved predictive performance but also through more decision-relevant modeling, evaluation, and interaction. The studies highlighted here illustrate three complementary directions. First, contemporary imaging models, exemplified by 3D vision transformer-based analysis of preoperative chest computed tomography (CT), are being used to infer clinically consequential phenotypes, advancing from image recognition toward decision support in high-stakes settings such as surgical planning. Second, image-to-biomarker pipelines such as automated quantification of retinal vascular fractal dimension demonstrate how medical images can be transformed into reproducible quantitative markers suitable for population-level analysis and risk stratification. Third, large language models (LLMs) are increasingly evaluated and positioned as clinical communication and interpretation components: structured assessments in telepharmacy and bilingual patient education move beyond fluency to safety, actionability, empathy, and readability, while emerging perspectives consider LLMs as interpretive interfaces for complex, temporally evolving health data, including wearable sensing. At the same time, these advances expose persistent bottlenecks that limit real-world deployment: the continued dominance of single-task and static formulations, fragmented systems driven by task-specific fine-tuning, limited reasoning over disease trajectories and evolving clinical contexts, and evaluation practices that remain insufficiently coupled to clinical workflows and downstream consequences. We argue that the next stage of medical AI should shift from accuracy-centered prediction toward decision-oriented, practice-ready systems that are robust across settings, clinically aligned in evaluation, and deployable at scale in routine care.
Chen et al. (Sun,) studied this question.