What question did this study set out to answer?

The aim is to develop adaptive AI methods that maintain performance under distribution shifts in vision-language models and medical imaging without access to labeled target data.

February 12, 2026 | Open Access

Adaptive AI under distribution shifts: Methods for vision–language models and medical imaging

Key points

Explores test-time adaptation techniques for vision-language models and histopathology image segmentation.
Introduces a feature-statistics alignment framework and the Gradient-to-Parameter ratio for model adaptation without source data access.
Develops Spectrum-Aware Test-Time Steering for lightweight adaptation of vision-language models in latent space.
Demonstrates consistent improvements in segmentation tasks under cross-domain shifts.
Shows enhanced zero-shot generalization in vision-language models while minimizing memory usage and computational overhead.
Validates adaptability across various tasks and architectures, reinforcing the importance of structured transformations under shifting conditions.

Abstract

Modern deep learning systems have achieved strong performance across vision, language, and medical imaging tasks; however, their effectiveness often deteriorates when deployed under distribution shifts, where test-time data differ from the conditions observed during training. Such shifts are ubiquitous in real-world settings, arising from changes in acquisition pipelines and protocols, environments, sensors, populations, modalities, or semantic context. These mismatches violate the independently and identically distributed (IID) assumption that underlies much of supervised learning and can substantially degrade predictive performance, including in high-stakes applications such as medical imaging and in large pre-trained foundation models. In many of these settings, retraining is costly and access to source data is frequently impractical or prohibited due to privacy regulations, institutional policies, or contractual restrictions.
This dissertation studies adaptive artificial intelligence methods for improved generalization under distribution shifts, with a focus on test-time adaptation (TTA) techniques that operate without labeled target data and under practical deployment constraints such as restricted access to training data, tight inference-time latency budgets, and limited deployment-time memory footprint. Motivated by recent progress in multimodal foundation models and by the sensitivity of clinical pipelines to domain shifts, the scope of this dissertation centers on vision-language models and medical imaging systems, where adaptation must remain effective while being feasible to apply at deployment time.
The dissertation consists of two parts. In the first part, distribution shifts in medical imaging are addressed, with an emphasis on histopathology image segmentation, where variability in tissue types, staining protocols, scanners, and acquisition centers leads to substantial domain mismatch. In such settings, access to source data during deployment is often restricted by privacy requirements, compliance considerations, or data-sharing limitations, which can limit the practicality of conventional domain adaptation approaches. A model- and task-aware test-time adaptation framework is presented based on representation-level alignment. Rather than relying solely on objectives defined in the model’s output space, distribution shift is treated as a mismatch between internal feature distributions, and adaptation is formulated as feature-statistics alignment between a pretrained source model and the target distribution. Specifically, a test-time Feature Alignment framework guided by the Gradient-to-Parameter ratio (G2P-FA) is introduced to adapt pretrained segmentation models to unlabeled target data without requiring access to the source dataset. Feature statistics are aligned between source and target domains across selected layers of the network, while layer-wise contributions are dynamically weighted using the ratio of gradient norm to parameter norm, thereby prioritizing layers that exhibit a stronger adaptation signal relative to their parameter scale. The method is designed for source-data-restricted settings by requiring stored source statistics rather than access to the original training dataset during the adaptation phase. Evaluations on histopathology nuclei segmentation under cross-domain shifts, including multi-task settings supporting both semantic and instance segmentation, demonstrate consistent improvements across multiple benchmarks and tissue types, indicating stable behavior under test-time adaptation. Beyond histopathology, the formulation is applicable across common architectures, including convolutional and transformer-based models, and across tasks such as classification, detection, and segmentation, since the alignment objective operates on intermediate representations rather than on task-specific output layers.
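The layer weighting described above can be illustrated with a minimal NumPy sketch. It is not the dissertation's implementation: the squared-difference loss on per-channel means and variances, the normalization of the ratios so they sum to one, and all function names (`g2p_weights`, `stats_alignment_loss`, `weighted_alignment_loss`) are assumptions made for illustration. In practice the per-layer gradients would come from an autograd framework; here they are passed in as plain arrays.

```python
import numpy as np


def g2p_weights(grads, params, eps=1e-12):
    """Per-layer gradient-norm / parameter-norm ratios, normalized to sum to ~1.

    The normalization is an assumption; the abstract only states that the
    ratio is used to weight layer-wise contributions.
    """
    ratios = np.array([
        np.linalg.norm(g) / (np.linalg.norm(p) + eps)
        for g, p in zip(grads, params)
    ])
    return ratios / (ratios.sum() + eps)


def stats_alignment_loss(target_feats, source_mean, source_var):
    """Squared gap between target batch statistics and stored source statistics.

    target_feats has shape (batch, channels). The exact loss form is assumed.
    """
    mu_t = target_feats.mean(axis=0)
    var_t = target_feats.var(axis=0)
    return float(np.sum((mu_t - source_mean) ** 2)
                 + np.sum((var_t - source_var) ** 2))


def weighted_alignment_loss(layer_feats, layer_stats, weights):
    """Sum of per-layer alignment losses, each scaled by its G2P weight."""
    losses = [
        stats_alignment_loss(feats, mean, var)
        for feats, (mean, var) in zip(layer_feats, layer_stats)
    ]
    return float(np.dot(weights, losses))
```

Only the stored `(mean, var)` pairs per layer are needed at adaptation time, which mirrors the source-data-restricted setting: the original training images never enter the computation.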
The second part of the dissertation focuses on vision-language models (VLMs), which enable zero-shot inference by aligning visual and textual representations in a shared embedding space. Despite strong zero-shot capabilities, VLMs such as CLIP can exhibit notable performance degradation under test-time distribution shifts. Existing adaptation strategies for these large models often emphasize parameter-efficient updates, yet many test-time methods still rely on backpropagation through large frozen encoders to compute gradients for prompt parameters, adapter modules, or low-rank updates, which can increase test-time memory usage and latency, particularly in instance-level adaptation regimes. To address these constraints, this dissertation introduces Spectrum-Aware Test-Time Steering (STS), a lightweight test-time adaptation framework that reframes adaptation as latent-space control. STS exploits the observation that textual class prototypes exhibit strong low-rank structure and that their semantic variability can be captured through spectral decomposition. By performing singular value decomposition of the initial text embeddings, STS identifies a compact, semantically meaningful subspace corresponding to principal axes of variation. At test time, a small set of per-sample coefficients is optimized to steer text prototypes within this subspace by minimizing prediction entropy across augmented views of the input image. Importantly, STS operates entirely in the latent embedding space, avoids backpropagation through frozen encoders, introduces only a minimal number of tunable parameters, and treats the underlying VLM encoders as fixed feature extractors. Experiments across multiple benchmarks with natural distribution shifts and fine-grained classification demonstrate consistent improvements in zero-shot generalization while significantly reducing computational and memory overhead compared to existing test-time prompt adaptation methods. 
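The steering procedure described above can be sketched in NumPy as follows. This is a toy illustration under several assumptions, not the STS implementation: the coefficients define a single offset shared by all class prototypes, the objective is the entropy of the prediction averaged over views, the temperature of 100 is a CLIP-style default, and a finite-difference gradient with step halving stands in for whatever optimizer the method actually uses. All function names are hypothetical. The inputs are precomputed image and text embeddings, so no encoder is called or differentiated.

```python
import numpy as np


def normalize(x):
    """Scale vectors along the last axis to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def spectral_basis(text_emb, k):
    """Top-k right singular vectors of the (classes, dim) text embeddings."""
    _, _, vt = np.linalg.svd(text_emb, full_matrices=False)
    return vt[:k]  # shape (k, dim), rows are orthonormal


def steer(text_emb, basis, coeffs):
    """Shift all prototypes by one offset inside the subspace (assumed form)."""
    return normalize(text_emb + coeffs @ basis)


def marginal_entropy(coeffs, text_emb, basis, view_emb, temp=100.0):
    """Entropy of the class distribution averaged over augmented views."""
    protos = steer(text_emb, basis, coeffs)
    probs = softmax(temp * view_emb @ protos.T).mean(axis=0)
    return float(-(probs * np.log(probs + 1e-12)).sum())


def adapt_coeffs(text_emb, basis, view_emb, steps=20, lr=0.1, h=1e-4):
    """Minimize entropy over the k coefficients with finite differences.

    A step is kept only if it lowers the entropy; otherwise lr is halved.
    """
    coeffs = np.zeros(basis.shape[0])
    best = marginal_entropy(coeffs, text_emb, basis, view_emb)
    for _ in range(steps):
        grad = np.zeros_like(coeffs)
        for i in range(coeffs.size):
            bumped = coeffs.copy()
            bumped[i] += h
            grad[i] = (marginal_entropy(bumped, text_emb, basis, view_emb) - best) / h
        candidate = coeffs - lr * grad
        value = marginal_entropy(candidate, text_emb, basis, view_emb)
        if value < best:
            coeffs, best = candidate, value
        else:
            lr *= 0.5
    return coeffs, best
```

Because only `k` coefficients are tuned per sample and every operation acts on embeddings that were already computed, the cost of adaptation in this sketch is independent of the encoder size, which is the property the paragraph above emphasizes.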
Collectively, this dissertation advances a unified perspective on test-time adaptation under distribution shifts, showing how adaptation can be achieved through structured latent-space transformations and sensitivity-aware update mechanisms, without relying on labeled target data or direct access to source data during adaptation. By addressing both large-scale vision-language models and medical imaging systems, the dissertation provides practical adaptation strategies for deploying AI models in real-world environments where data distributions can change across domains, sites, and operating conditions.
