Deep learning has advanced domains from autonomous driving to medical imaging, yet its adoption in healthcare faces significant challenges. Clinical imaging relies on volumetric modalities such as computed tomography (CT), magnetic resonance imaging (MRI), and cone-beam CT (CBCT), which are high-dimensional, heterogeneous, and often affected by noise or acquisition artifacts. Annotation is a second major obstacle: voxel-level labels require specialized expertise and considerable time, so available datasets are usually small and imbalanced. Models trained in such settings tend to overfit and often fail to capture the long-range dependencies needed to represent complex anatomical structures. These limitations motivate the central question of this thesis: how can we design data-efficient methods that improve performance on tasks such as classification and segmentation when annotated data is scarce?
This thesis addresses these limitations at three levels: data, model, and application. At the data level, we propose Mixing OCSVM Negatives (MiOC), a novel contrastive pretraining framework. Standard contrastive learning depends heavily on the quality of its negatives, yet randomly sampled negatives are often either too easy or semantically ambiguous. MiOC uses one-class support vector machine (OCSVM) guided sampling to identify inlier negatives within a hypersphere around the query embedding, and then mixes these negatives with the query to generate synthetic hard negatives. This broadens the hard-negative sample space beyond simple dot-product ranking and yields richer, more discriminative representations. Experiments on multiple datasets (ImageNet-100, CIFAR-10, CIFAR-100, STL-10, CINIC-10) show consistent improvements in downstream classification, with MiOC outperforming state-of-the-art methods while adding only a small set of synthetic negatives to the existing queue.
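The two MiOC steps described above (select inlier negatives near the query, then mix them with the query) can be sketched in NumPy. This is a minimal illustration under loud assumptions, not the thesis implementation: the OCSVM boundary is replaced by a fixed-radius hypersphere around the unit-norm query, and the helper names, `radius`, the mixing range `alpha_max`, and `temperature` are all hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Project embeddings onto the unit hypersphere.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def select_inlier_negatives(query, queue, radius):
    # HYPOTHETICAL stand-in for the OCSVM step: keep queue negatives whose
    # Euclidean distance to the unit-norm query is below `radius`, i.e. that
    # lie inside a hypersphere centred on the query embedding.
    dists = np.linalg.norm(queue - query[None, :], axis=1)
    return queue[dists < radius]

def mix_hard_negatives(query, inliers, n_synthetic, rng, alpha_max=0.5):
    # Synthetic hard negatives: convex mix of the query with randomly chosen
    # inlier negatives, re-projected onto the unit sphere.
    if len(inliers) == 0:
        return np.empty((0, query.shape[0]))
    idx = rng.integers(0, len(inliers), size=n_synthetic)
    # alpha is the query weight; alpha <= 0.5 keeps each synthetic vector
    # angularly closer to its source negative than to the query.
    alpha = rng.uniform(0.0, alpha_max, size=(n_synthetic, 1))
    mixed = alpha * query[None, :] + (1.0 - alpha) * inliers[idx]
    return l2_normalize(mixed)

def logits_with_synthetic(query, positive, queue, synthetic, temperature=0.2):
    # InfoNCE-style logits: positive first, then the queue extended by the
    # small set of synthetic negatives.
    negatives = np.concatenate([queue, synthetic], axis=0)
    pos = query @ positive
    neg = negatives @ query
    return np.concatenate([[pos], neg]) / temperature

# Usage on random embeddings (all sizes are illustrative).
rng = np.random.default_rng(0)
dim = 16
query = l2_normalize(rng.normal(size=dim))
positive = l2_normalize(query + 0.1 * rng.normal(size=dim))
queue = l2_normalize(rng.normal(size=(256, dim)))
inliers = select_inlier_negatives(query, queue, radius=1.3)
synthetic = mix_hard_negatives(query, inliers, n_synthetic=8, rng=rng)
logits = logits_with_synthetic(query, positive, queue, synthetic)
```

Because mixing moves each inlier toward the query, every synthetic vector has cosine similarity to the query at least as high as the inlier it came from, which is what makes it a harder negative; only 8 extra entries are appended to the 256-entry queue.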
At the model level, we develop Differential UMamba (Diff-UMamba), a segmentation architecture that integrates selective state-space (Mamba) blocks with a noise-reduction module. This module performs signal differencing in the encoder bottleneck, suppressing noise-like activations and highlighting clinically relevant features. By reducing overfitting and modeling long-range dependencies, Diff-UMamba generalizes better under limited data. Extensive evaluations on BraTS21, MSD (lung and pancreas), AIIB23, and an internal non-small cell lung cancer (NSCLC) dataset show performance gains of 1–5% over state-of-the-art CNN, transformer, and Mamba-based baselines.
At the application level, we design a dedicated pipeline for gross tumor volume (GTV) segmentation in CBCT-guided adaptive radiotherapy, where tumors are difficult to distinguish because of low contrast and imaging artifacts. The proposed Diff-UMamba model, with its differential refinement and Mamba modules, delineates the GTV robustly and accurately. Incorporating rigidly registered planning CT contours as priors further improves Dice scores, surpassing both deformable registration methods and state-of-the-art deep learning baselines.
Together, these contributions advance learning under data scarcity by improving representation quality, architectural robustness, and clinical applicability. This work demonstrates that carefully designed methods can mitigate the constraints of limited annotated data and enable more reliable deployment of deep learning in data-scarce settings.
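The bottleneck signal-differencing idea behind Diff-UMamba can be illustrated with a NumPy sketch modeled on differential attention (as in the Differential Transformer), which subtracts one softmax attention map from another so that weighting shared by both maps cancels. Whether Diff-UMamba's noise-reduction module takes exactly this form is an assumption; the Mamba blocks are omitted, and the function names, projection sizes, and the fixed `lam` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def differential_attention(tokens, wq1, wk1, wq2, wk2, wv, lam):
    # HYPOTHETICAL differencing block over bottleneck tokens of shape (N, C).
    # Two attention maps are computed; attention mass that both assign alike
    # (e.g. diffuse, noise-like weighting) is subtracted out, and only the
    # part where the maps differ is used to aggregate the values.
    d = wq1.shape[1]
    a1 = softmax((tokens @ wq1) @ (tokens @ wk1).T / np.sqrt(d))
    a2 = softmax((tokens @ wq2) @ (tokens @ wk2).T / np.sqrt(d))
    return (a1 - lam * a2) @ (tokens @ wv)

# Usage: flatten a 3D bottleneck feature map (C, D, H, W) into N = D*H*W tokens.
rng = np.random.default_rng(0)
C, D, H, W, d = 8, 2, 4, 4, 4
fmap = rng.normal(size=(C, D, H, W))
tokens = fmap.reshape(C, -1).T  # (N, C) with N = 32
wq1, wk1, wq2, wk2 = (rng.normal(size=(C, d)) for _ in range(4))
wv = rng.normal(size=(C, C))
out = differential_attention(tokens, wq1, wk1, wq2, wk2, wv, lam=0.8)
out_fmap = out.T.reshape(C, D, H, W)  # back to the volumetric layout
```

With identical branches and `lam = 1` the output is exactly zero, the limiting case of the cancellation; making `lam` a learned parameter is one natural alternative to the fixed value used here.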
Dhruv Jain studied this question.