February 11, 2025 · Open Access

Medical multimodal multitask foundation model for lung cancer screening

Abstract

Lung cancer screening (LCS) reduces mortality and involves vast multimodal data such as text, tables, and images. Fully mining such big data requires multitasking; otherwise, occult but important features may be overlooked, adversely affecting clinical management and healthcare quality. Here we propose a medical multimodal-multitask foundation model (M3FM) for LCS based on three-dimensional low-dose computed tomography (CT). After curating a multimodal multitask dataset of 49 clinical data types, 163,725 chest CT series, and 17 tasks involved in LCS, we develop a scalable multimodal question-answering model architecture for synergistic multimodal multitasking. M3FM consistently outperforms state-of-the-art models, improving lung cancer risk prediction and cardiovascular disease mortality risk prediction by up to 20% and 10%, respectively. M3FM processes multiscale high-dimensional images, handles various combinations of multimodal data, identifies informative data elements, and adapts to out-of-distribution tasks with minimal data. In this work, we show that M3FM advances various LCS tasks through large-scale multimodal and multitask learning.

Lung cancer screening (LCS) requires effectively and efficiently mining large multimodal datasets. Here, the authors develop a medical multimodal-multitask foundation model (M3FM) for LCS from 3D low-dose computed tomography and other medical multimodal data, outperforming state-of-the-art methods and allowing the identification of informative data elements.
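The abstract describes casting many screening tasks as questions that one model answers over whatever combination of data is available. As a toy illustration of that general framing only, the sketch below turns a task question plus an arbitrary subset of clinical fields and an optional CT series into a single query. The class `LCSExample`, the function `build_query`, the field names, and the text layout are all hypothetical assumptions, not M3FM's actual interface; in particular, it uses a text placeholder where a real model would encode the CT volume with an image encoder.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical illustration only: names and layout are assumptions,
# not the interface used by M3FM.


@dataclass
class LCSExample:
    question: str                                   # task phrased as a natural-language question
    clinical: dict = field(default_factory=dict)    # clinical fields; any subset may be missing (None)
    ct_shape: Optional[tuple] = None                # (slices, height, width) of a CT series, if present


def build_query(ex: LCSExample) -> str:
    """Serialize whichever modalities are present into one text query."""
    parts = [f"Question: {ex.question}"]
    # Skip clinical fields without a value, so one input format covers
    # any combination of available data.
    for name in sorted(ex.clinical):
        value = ex.clinical[name]
        if value is not None:
            parts.append(f"{name}: {value}")
    # Stand-in for the image branch: a real model would embed the volume itself.
    if ex.ct_shape is not None:
        parts.append("[CT volume " + "x".join(map(str, ex.ct_shape)) + "]")
    return "\n".join(parts)


if __name__ == "__main__":
    # Two different tasks sharing the same query format.
    tasks = [
        LCSExample("What is the lung cancer risk?",
                   {"age": 63, "pack_years": 40}, (128, 512, 512)),
        LCSExample("What is the cardiovascular disease mortality risk?",
                   {"age": 63, "smoking_status": None}),
    ]
    for t in tasks:
        print(build_query(t))
        print("---")
```

The point of the design is that adding a task means writing a new question rather than a new model head, and a missing data element simply drops out of the query instead of requiring a separate model.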
