What type of study is this?

This is a quantitative study.

September 17, 2025

Foundation Models for Multimodal MRI Synthesis with Language Guidance

Key Points

High-quality synthesis performance was achieved across various MRI modalities and datasets.
The model shows robust adaptability via language guidance, which improves image synthesis outcomes with limited data.
A one-step latent diffusion model enables fast synthesis conditioned on textual descriptions, and low-rank adaptation keeps fine-tuning computationally efficient.
The approach addresses limitations of conventional models by integrating visual and textual inputs for enhanced synthesis.

Abstract

Motivation: We aim to introduce a foundation model based on visual and textual inputs to enable robust, unified image synthesis in multimodal MRI.

Goal(s): Our goal is to demonstrate a versatile foundation model, with language guidance for accurate target descriptions, that adapts easily to new modalities and datasets using computationally efficient fine-tuning strategies with minimal additional data and training.

Approach: Our approach conditions synthesis on source-modality images and target-modality text descriptions. It uses a text encoder to embed textual inputs, a one-step latent diffusion model to perform fast synthesis, and low-rank adaptation for efficient fine-tuning.

Results: We demonstrated high-quality synthesis performance over various modalities and datasets.

Impact: Conventional synthesis models rely on image-to-image translation with only visual inputs and often show limited generalizability. We demonstrate a foundation model with language guidance that leverages textual inputs for improved adaptability to new modalities.
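
To illustrate why low-rank adaptation makes fine-tuning cheap, here is a minimal sketch of the general technique, not the authors' implementation. It assumes the standard formulation in which a frozen weight matrix W is augmented with a trainable update scaled by alpha / rank, formed from two small matrices B and A. The names `LoRALinear`, `lora_params`, and `matmul`, the layer sizes, and the use of plain Python lists are all assumptions made for illustration; the abstract does not say which layers are adapted or which rank is used.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_params(d_out, d_in, rank):
    """Number of trainable values in the low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_in + d_out)


class LoRALinear:
    """Hypothetical linear layer with a frozen weight and a trainable low-rank update."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight            # frozen pretrained weight, d_out x d_in
        self.scale = alpha / rank       # conventional LoRA scaling factor
        # A starts with small random values and B with zeros, so the update
        # B @ A is zero and the adapted layer initially matches the original.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def effective_weight(self):
        """Return W + scale * (B @ A)."""
        delta = matmul(self.B, self.A)
        return [
            [w + self.scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(self.weight, delta)
        ]

    def forward(self, x):
        """Apply the adapted layer to a single input vector x."""
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]


if __name__ == "__main__":
    layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], rank=1, alpha=2.0)
    print(layer.forward([1.0, 1.0]))  # identical to the frozen layer before any training
    # Illustrative sizes only: trainable values for a 512 x 512 layer.
    print("full fine-tuning:", 512 * 512, "LoRA with rank 4:", lora_params(512, 512, 4))
```

Only A and B would be updated during fine-tuning, so the number of trainable values grows with rank * (d_in + d_out) rather than d_in * d_out. This is the general reason the technique suits adaptation to new modalities with limited data and training.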
