Learning meaningful representations is a pivotal problem in constructing foundation models. Nevertheless, the complex anatomical patterns and the random distribution of lesions in medical images make it challenging to understand and disentangle useful representations. Contrastive learning has demonstrated remarkable success in decoupling representations, but measuring distance in a high-dimensional feature space remains difficult. In this paper, we propose a mutual information (MI)-based mechanism for quantifying representation distance. However, collecting millions of samples and constructing a huge positive-negative sample bank for effective contrastive learning is impractical in the medical domain. To address this issue, we introduce a constrained multi-view learning paradigm. Specifically, we conduct a dynamic representation reranking and selection process to enhance the quality of the positive and negative sample pairs. Our method benefits both continuous MI estimation and the measurement of representation significance, enhancing the contrastive learning process and semantic comprehension. Our proposed framework was rigorously evaluated on publicly accessible CT lung lesion segmentation datasets and compared against influential baseline models built from either pure CNN modules or transformer modules. The statistical results under four metrics demonstrate that our proposed framework effectively optimizes the multi-view contrastive learning process and improves MI maximization-driven representation learning.

• We introduce a new frequency domain-based multi-view generation strategy for self-supervised contrastive learning, which can also be easily extended to semi-supervised learning when segmentation masks are available.
• We propose a novel continuous mutual information maximization and score-ranking method for feature selection, which prevents less useful views from being used in contrastive learning.
• Our statistical and visualization results demonstrate superior performance through extensive experiments on three public lung lesion datasets, surpassing established CNN and transformer baselines under multiple evaluation metrics.
• The proposed MIMIC framework is model-agnostic and can be integrated into existing segmentation pipelines.
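
To make the idea of MI-driven view ranking concrete, the following is a minimal sketch and not the authors' implementation. It assumes the InfoNCE lower bound as a stand-in MI estimator (the estimator used by the framework is not specified in this section), cosine similarity as the critic, and a simple top-k selection rule. The function names `infonce_bound` and `rank_views`, the `temperature` value, and the toy embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def infonce_bound(anchors, views, temperature=0.1):
    """InfoNCE lower bound on MI (in nats) between paired embeddings.

    anchors[i] and views[i] form a positive pair; every views[j] with
    j != i acts as a negative for anchors[i]. The bound never exceeds log(n).
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], views[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over positives and negatives.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += logits[i] - log_denominator
    return math.log(n) + total / n


def rank_views(anchors, candidate_views, keep=1, temperature=0.1):
    """Score each candidate view by its MI bound and keep the top `keep`."""
    scores = {
        name: infonce_bound(anchors, embeddings, temperature)
        for name, embeddings in candidate_views.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep], scores


# Toy data (hypothetical): four anchor embeddings and two candidate views.
anchors = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
candidate_views = {
    "aligned": [list(a) for a in anchors],                    # view preserves semantics
    "shuffled": [anchors[(i + 1) % 4] for i in range(4)],     # view breaks the pairing
}
kept, scores = rank_views(anchors, candidate_views, keep=1)
print(kept, scores)
```

In this toy setting the view whose embeddings stay aligned with the anchors receives the higher bound and is retained, while the mismatched view is discarded. A real pipeline would compute the embeddings with an encoder and would likely re-rank the views dynamically during training.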
Dai et al. studied this question.