Abstract: Learning-based image classification has become central to modern medical imaging, but the field is changing rapidly: foundation models, vision–language models (VLMs), and label-efficient pretraining are reshaping which methods are clinically useful. This review focuses on the state of the art rather than re-explaining well-established models. We summarize learning paradigms, contrast classical machine learning (ML) and deep learning (DL) families, and emphasize the advances most relevant to clinical translation: medical foundation models, multimodal VLMs, hybrid CNN–transformer architectures, diffusion-based augmentation, self-supervised pretraining, federated learning, and efficient deployment. Because model choice depends strongly on image structure, annotation cost, and workflow, we also discuss modality-specific issues across X-ray, CT, MRI, PET/SPECT, ultrasound, OCT, endoscopy, microscopy, and optical/molecular/infrared imaging. Finally, we outline persistent clinical challenges (data diversity and bias, rare-condition detection, annotation noise, explainability, calibration, and equitable performance) and the methods that mitigate them. The aim is to provide biomedical engineers and clinicians with a compact, clinically grounded reference for selecting and validating AI-based classifiers for real medical workflows.
Nia et al. (Tue,) studied this question.