Few-Shot and Zero-Shot Learning for MRI Brain Tumor Classification Using CLIP and Vision Transformers

Key Points

Few-shot learning with a ResNet-18 Prototypical Network achieved 85% ± 8% accuracy, compared with 42% ± 12% for a fine-tuned ResNet-50 baseline.
Among the CNN, ResNet-18, and vision transformer backbones evaluated, ResNet-18 performed best for brain tumor classification.
A Prototypical Network (ProtoNet) was used for few-shot learning, and CLIP was used for zero-shot learning.
Results emphasize the need for effective classification strategies in scenarios with limited annotated data.

Abstract

Accurate classification of brain tumors from MRI scans remains challenging due to limited annotated data. This study compares two data-efficient paradigms, few-shot learning (FSL) and zero-shot learning (ZSL), for tumor diagnosis using deep learning and vision-language models. A Prototypical Network (ProtoNet) with CNN, ResNet-18, and vision transformer backbones was evaluated over 1000 randomly sampled four-way, five-shot episodes, with results reported as mean ± SD. The ResNet-18 ProtoNet achieved 85% ± 8% accuracy (F1 = 0.85), surpassing a fine-tuned ResNet-50 baseline (42% ± 12%) and the zero-shot CLIP model (30% ± 10%). A visual-only ZSL baseline without text guidance achieved 54% ± 11%. These results show that metric-based FSL offers an absolute improvement of 43 percentage points over standard fine-tuning, and they establish a robust benchmark for data-efficient MRI classification under severe label constraints.
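
The episodic evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the embeddings are synthetic stand-ins for backbone features, and the function names (`prototypes`, `classify`, `run_episode`), the embedding dimension, the noise level, and the number of query images per class are all assumptions made for illustration. It shows the core ProtoNet step (averaging support embeddings per class, then assigning each query to the nearest prototype) and how per-episode accuracies are summarized as mean ± SD.

```python
import numpy as np


def prototypes(support, support_labels, n_way):
    """Return one prototype per class: the mean of that class's support embeddings."""
    return np.stack(
        [support[support_labels == c].mean(axis=0) for c in range(n_way)]
    )


def classify(queries, protos):
    """Assign each query to the class whose prototype is nearest (squared Euclidean)."""
    # Broadcasting: (n_queries, 1, dim) - (1, n_way, dim) -> (n_queries, n_way, dim)
    dists = ((queries[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def run_episode(rng, n_way=4, k_shot=5, n_query=10, dim=16, noise=1.0):
    """Sample one synthetic four-way, five-shot episode and return its accuracy.

    Assumption: embeddings are drawn around a random centre per class. In a real
    pipeline they would come from a backbone such as ResNet-18.
    """
    centres = 2.0 * rng.normal(size=(n_way, dim))
    s_labels = np.repeat(np.arange(n_way), k_shot)
    q_labels = np.repeat(np.arange(n_way), n_query)
    support = centres[s_labels] + noise * rng.normal(size=(n_way * k_shot, dim))
    queries = centres[q_labels] + noise * rng.normal(size=(n_way * n_query, dim))
    preds = classify(queries, prototypes(support, s_labels, n_way))
    return (preds == q_labels).mean()


rng = np.random.default_rng(0)
accs = np.array([run_episode(rng) for _ in range(1000)])
# Summarize across episodes as mean and sample standard deviation.
print(f"accuracy: {accs.mean():.3f} ± {accs.std(ddof=1):.3f}")
```

Because the data here are synthetic, the printed accuracy says nothing about MRI performance; only the episode structure and the mean ± SD aggregation mirror the evaluation protocol.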
