What question did this study set out to answer?

The objective is to improve representation learning in medical images using a new contrastive learning approach.

April 18, 2026 | Open Access

Constrained Multiview Contrastive Learning for Jointly Supervised Representation Learning

Key Points

The objective is to improve representation learning in medical images using a new contrastive learning approach.
Proposed a mutual information-based mechanism for quantifying representation distance.
Introduced a constrained multiview learning paradigm to dynamically select sample pairs.
Evaluated on publicly available CT lung lesion segmentation datasets against CNN- and transformer-based baseline models.
Demonstrated superior performance over CNN and transformer baselines on lung lesion datasets.
Achieved enhanced multi-view contrastive learning and maximization of mutual information for representation selection.

Abstract

Learning meaningful representations is a pivotal problem in constructing foundation models. However, the complex anatomical patterns and the random distribution of lesions in medical images make it difficult to understand and disentangle useful representations. Contrastive learning has demonstrated remarkable success in decoupling representations, but measuring distance in a high-dimensional feature space remains hard. In this paper, we propose a mutual information (MI)-based mechanism for quantifying the representation distance. Moreover, collecting millions of samples and constructing a huge positive-negative sample bank for effective contrastive learning is impractical in the medical domain. To address this issue, we introduce a constrained multiview learning paradigm. Specifically, we conduct a dynamic representation reranking and selection process to enhance the quality of the positive and negative sample pairs. Our method benefits both continuous MI estimation and the measurement of representation significance, enhancing the contrastive learning process and semantic comprehension. Our proposed framework was rigorously evaluated on publicly accessible CT lung lesion segmentation datasets and compared against influential baseline models built from either pure CNN modules or transformer modules. The statistical results under four metrics demonstrate that our proposed framework effectively optimizes the multi-view contrastive learning process and improves MI maximization-driven representation learning.

• We introduce a new frequency domain-based multi-view generation strategy for self-supervised contrastive learning, which is also easy to extend to semi-supervised learning when the mask is involved.
• We propose a novel continuous mutual information maximization and score-ranking method for feature selection, which prevents less useful views from being used in contrastive learning.
• Our statistical and visualization results demonstrate superior performance through extensive experiments on three public lung lesion datasets, surpassing established CNN and transformer baselines under multiple evaluation metrics.
• The proposed MIMIC framework is model-agnostic and can be integrated into existing segmentation pipelines.
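
The ideas summarized above (frequency-domain view generation, MI-based scoring, and score ranking to keep only useful views) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the frequency transform, the MI estimator, or the ranking rule, so this example assumes radial low-pass filtering for view generation and the standard InfoNCE lower bound as the MI score. The function names `frequency_views`, `info_nce_bound`, and `rank_views`, and all parameter values, are hypothetical.

```python
import numpy as np


def frequency_views(image, cutoffs=(0.15, 0.4, 2.0)):
    """Build one low-pass filtered view of a 2D image per cutoff.

    Assumption: views are made by radial low-pass masking of the 2D FFT.
    The source only says the views are generated in the frequency domain.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    yy, xx = np.mgrid[:h, :w]
    # Normalized distance of each frequency bin from the spectrum center.
    radius = np.hypot((yy - h // 2) / (h / 2), (xx - w // 2) / (w / 2))
    views = []
    for cutoff in cutoffs:
        kept = spectrum * (radius <= cutoff)
        views.append(np.fft.ifft2(np.fft.ifftshift(kept)).real)
    return views


def info_nce_bound(z_a, z_b, temperature=0.1):
    """InfoNCE lower bound on MI between paired embeddings of shape (N, D).

    Row i of z_a and row i of z_b form a positive pair; every other row
    in the batch acts as a negative. The bound is at most log(N).
    """
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(np.log(len(z_a)) + np.mean(np.diag(log_softmax)))


def rank_views(anchor, candidates, keep=2):
    """Score each candidate view embedding against the anchor, keep the top ones.

    Assumption: a plain top-k on the InfoNCE score stands in for the
    dynamic reranking and selection step described in the abstract.
    """
    scores = [info_nce_bound(anchor, c) for c in candidates]
    order = np.argsort(scores)[::-1][:keep]
    return [int(i) for i in order], scores
```

In a real pipeline the candidate embeddings would come from an encoder applied to each generated view; here `rank_views` only expects arrays of shape (N, D), so any encoder can be plugged in. Views whose embeddings share little information with the anchor receive low scores and are dropped before the contrastive loss is computed.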
