Real-world data typically span multiple modalities and may carry non-exclusive labels. Multimodal fusion is a key stage of multimodal learning: it integrates features from different sources into a unified vector space, and the classifier uses the resulting integrated vector to produce the final prediction. However, traditional multimodal fusion methods rarely account for cross-modal interactions, which are essential for uncovering dependencies between modalities and for constructing a shared space for their integrated representation. In this paper, we propose a conceptual framework for multimodal fusion based on multi-task learning. It models a joint integrated representation space for all cross-modal interactions and adaptively tunes the loss functions of individual tasks to achieve optimal performance. The developed model employs a novel hierarchical multimodal fusion network that captures cross-modal interactions across all modality combinations and dynamically assigns a weight coefficient to each pair depending on the specific data sample. In addition, a new multi-task learning approach addresses multi-label classification by automatically adjusting the training process at both the task level and the sample level. Experimental results show that the proposed framework outperforms baseline models as well as several state-of-the-art methods. The multimodal fusion and dynamic multi-task learning components are also shown to be flexible and modular, which makes them applicable to various types of neural network architectures.
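The sample-dependent weighting of modality pairs described in the abstract can be illustrated with a minimal sketch. This is not the authors' network: it assumes each modality has already been projected into a shared space of equal dimension, uses an element-wise product as the pairwise interaction, and scores each pair with a hypothetical gate vector (`gate_vectors`) standing in for learned parameters. The function name `pairwise_fusion` and the toy feature values are illustrative assumptions, and only pairs are covered rather than a full hierarchy of modality combinations.

```python
import math
from itertools import combinations


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pairwise_fusion(features, gate_vectors):
    """Fuse modality features through weighted pairwise interactions.

    features: dict mapping modality name -> list of floats (same length),
        assumed to be already projected into a shared space.
    gate_vectors: dict mapping (name_a, name_b) -> list of floats,
        hypothetical stand-ins for learned gating parameters.
    Returns the fused vector and the per-pair weights for this sample.
    """
    pairs = list(combinations(sorted(features), 2))
    interactions = {}
    scores = []
    for a, b in pairs:
        # Assumed interaction: element-wise product of the two modalities.
        inter = [x * y for x, y in zip(features[a], features[b])]
        interactions[(a, b)] = inter
        # The gate score depends on the sample through the interaction itself.
        scores.append(sum(g * v for g, v in zip(gate_vectors[(a, b)], inter)))

    # Normalise scores so the pair weights sum to one for this sample.
    weights = softmax(scores)

    dim = len(next(iter(features.values())))
    fused = [0.0] * dim
    for w, pair in zip(weights, pairs):
        for i, v in enumerate(interactions[pair]):
            fused[i] += w * v
    return fused, dict(zip(pairs, weights))


# Toy sample with three modalities in a 2-dimensional shared space.
features = {"text": [1.0, 0.0], "image": [0.5, 2.0], "audio": [2.0, 1.0]}
gates = {pair: [1.0, 1.0] for pair in combinations(sorted(features), 2)}
fused, weights = pairwise_fusion(features, gates)
```

Because the gate scores are computed from the interactions themselves, a different input sample yields a different weight distribution over the pairs, which is the behaviour the abstract attributes to the fusion network.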
Dmytro Merkotan
Oleksandr Trotsko
Communication informatization and cybersecurity systems and technologies
www.synapsesocial.com/papers/694025912d562116f28fea33 — DOI: https://doi.org/10.58254/viti.8.2025.11.133