Multimodal fusion offers a comprehensive way to understand the world by integrating data from different sources. However, some studies argue that, owing to optimization imbalance, some modalities cannot be fully learned during multimodal learning. These studies try to balance the modalities by controlling their learning processes, but they overlook the learning objective as an essential factor. A uniform objective shared by all modalities prevents the network from sufficiently exploiting the discriminative information in each modality. We therefore propose a new multimodal learning method, modality-mix learning (MM learning), which promotes sufficient learning of each modality through a designed multilabel objective. MM learning generates modality-mixed samples by combining modalities from different samples with different labels, which turns the single label of a sample into a probability vector that encodes multilabel information. These modality-mixed samples are then fed into the network, which is trained to recognize the varying proportions of this multilabel information. In addition, we introduce a bilevel learning scheme: the network is first trained with standard learning to capture general features and to select samples with strong predictions, and MM learning is then applied to the selected samples to further optimize the underexplored modality. MM learning forces the network to learn different objective information from different modalities, avoiding the insufficient learning of modalities caused by a uniform learning objective. Experimental results show that the method significantly boosts different fusion strategies and methods on diverse multimodal datasets and also improves the robustness of multimodal networks.
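The abstract does not give the exact mixing rule, loss, or selection criterion, so the following is only a minimal NumPy sketch of the two stages under stated assumptions. All function names (`one_hot`, `select_confident`, `modality_mix`, `soft_cross_entropy`) are hypothetical. It assumes that each modality contributes an equal weight of 1/M to the label of the sample it came from, that "strong prediction" means a top softmax probability of at least a hypothetical threshold of 0.9, and that the network is trained on the soft targets with a soft-label cross-entropy. The `probs` array is a hand-written stand-in for the outputs of the first-stage network.

```python
import numpy as np


def one_hot(labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Map integer labels of shape (N,) to one-hot rows of shape (N, C)."""
    out = np.zeros((labels.shape[0], num_classes))
    out[np.arange(labels.shape[0]), labels] = 1.0
    return out


def select_confident(probs: np.ndarray, threshold: float = 0.9) -> np.ndarray:
    """Stage 1 (hypothetical criterion): indices whose top softmax probability
    reaches `threshold`, standing in for 'samples with strong prediction'."""
    return np.flatnonzero(probs.max(axis=1) >= threshold)


def modality_mix(modalities, labels, num_classes, rng):
    """Stage 2: build modality-mixed samples and soft multilabel targets.

    modalities: list of M arrays, each shaped (N, D_m), one per modality.
    labels:     integer class labels shaped (N,).
    Modality 0 is kept from the original sample; every other modality is
    taken from a random donor sample with a different label (when one
    exists). Each modality contributes weight 1/M to its source label,
    an assumed weighting rather than the paper's confirmed one.
    """
    n, m = labels.shape[0], len(modalities)
    mixed = [modalities[0].copy()]
    target = one_hot(labels, num_classes) / m
    for feats in modalities[1:]:
        donors = np.empty(n, dtype=int)
        for i in range(n):
            candidates = np.flatnonzero(labels != labels[i])
            if candidates.size == 0:  # all labels identical: allow any donor
                candidates = np.arange(n)
            donors[i] = rng.choice(candidates)
        mixed.append(feats[donors])
        target += one_hot(labels[donors], num_classes) / m
    return mixed, target


def soft_cross_entropy(logits: np.ndarray, target: np.ndarray) -> float:
    """Mean cross-entropy between softmax(logits) and soft targets (assumed loss)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(target * log_probs).sum(axis=1).mean())


# Toy usage: two modalities (e.g. audio and visual features), six samples.
rng = np.random.default_rng(0)
audio = rng.normal(size=(6, 4))
visual = rng.normal(size=(6, 8))
labels = np.array([0, 1, 2, 0, 1, 2])
# Stand-in for the softmax outputs of the stage-1 (standard learning) network.
probs = np.array([
    [0.95, 0.03, 0.02],
    [0.10, 0.85, 0.05],
    [0.20, 0.20, 0.60],
    [0.92, 0.04, 0.04],
    [0.05, 0.93, 0.02],
    [0.01, 0.01, 0.98],
])
keep = select_confident(probs, threshold=0.9)
mixed, target = modality_mix([audio[keep], visual[keep]], labels[keep], 3, rng)
loss = soft_cross_entropy(np.zeros((len(keep), 3)), target)
```

With two modalities, each mixed sample's target puts 0.5 on its own label and 0.5 on the donor's different label, so the network is asked to report how much evidence each modality carries instead of a single class. Because donors are forced to have a different label, no class can be predicted correctly from one modality alone, which is the intuition behind pushing every modality to be learned.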
Nannan Lu
Zhen Tan
Zhiyuan Han
IEEE Transactions on Neural Networks and Learning Systems
China University of Mining and Technology
Lu et al. studied this question.
www.synapsesocial.com/papers/69d34dd49c07852e0af9760e — DOI: https://doi.org/10.1109/tnnls.2026.3677806