March 3, 2026 · Open Access

Modeling cross-modal interactions via a nonlinear information-density-aware network for MAFLD risk assessment

Key Points

NID-Net surpasses existing multimodal fusion approaches in predictive performance on real-world medical datasets.
Structured indicators, which have high information density, are processed by an XGBoost module optimized via Lagrange remainder correction, improving robustness to data sparsity and imbalance.
Tongue images, which have relatively low information density, are encoded by a region-enhanced Swin Transformer whose adaptive regional biases guide the model toward informative local representations.
The work highlights the role of nonlinear design in building balanced, efficient, and interpretable multimodal prediction systems for complex conditions.

Abstract

Modern multimodal learning often requires handling heterogeneous data types whose structures and information densities differ substantially. To address this challenge in the context of metabolic dysfunction-associated fatty liver disease (MAFLD) prediction, we propose an information-density-aware multimodal framework (NID-Net). Instead of relying on simple concatenation or shallow fusion, the model processes each modality using methods that align with its structural characteristics. Structured indicators with high information density are first processed by an XGBoost module optimized via Lagrange remainder correction, which enhances the nonlinearity of the loss landscape and improves robustness to data sparsity and imbalance. Meanwhile, tongue images with relatively low information density are encoded using a region-enhanced Swin Transformer, where adaptive regional biases guide the model toward informative local representations. The resulting modality-specific embeddings are fused within a Mixture-of-Experts (MoE) architecture, enabling selective specialization and nonlinear decision boundaries across modalities. Extensive experiments on real-world medical datasets demonstrate that NID-Net not only surpasses existing multimodal fusion approaches in predictive performance but also provides interpretable insights into cross-modal feature interactions. This work highlights the fundamental role of nonlinear design in achieving efficient, balanced, and explainable multimodal prediction systems.
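To make the fusion step more concrete, the following is a minimal sketch of how two modality-specific embeddings could be combined by a softly gated Mixture-of-Experts head. It is not the authors' implementation: the abstract does not specify the gating scheme, expert architecture, or embedding sizes, so the class name `ToyMoEFusion`, the dense softmax gate, the tanh experts, and all dimensions are assumptions chosen for illustration, and the weights are random and untrained.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_layer(rng, n_out, n_in):
    """Random weights and zero biases (illustrative, untrained)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class ToyMoEFusion:
    """Hypothetical soft-gated MoE head over two modality embeddings."""

    def __init__(self, dim_tab, dim_img, n_experts=3, hidden=4, seed=0):
        rng = random.Random(seed)
        dim = dim_tab + dim_img
        # The gate scores every expert from the joint embedding.
        self.gate = init_layer(rng, n_experts, dim)
        # Each expert is a small nonlinear map: dim -> hidden -> 1 logit.
        self.experts = [
            (init_layer(rng, hidden, dim), init_layer(rng, 1, hidden))
            for _ in range(n_experts)
        ]

    def forward(self, tab_emb, img_emb):
        # Joint input: structured-indicator embedding + tongue-image embedding.
        x = list(tab_emb) + list(img_emb)
        gate_weights = softmax(linear(*self.gate, x))
        logit = 0.0
        for g, (hidden_layer, out_layer) in zip(gate_weights, self.experts):
            h = [math.tanh(v) for v in linear(*hidden_layer, x)]
            logit += g * linear(*out_layer, h)[0]
        # Sigmoid turns the mixed logit into a risk score in (0, 1).
        risk = 1.0 / (1.0 + math.exp(-logit))
        return risk, gate_weights


model = ToyMoEFusion(dim_tab=3, dim_img=2)
risk, gates = model.forward([0.2, -1.0, 0.5], [0.1, 0.9])
print(f"risk={risk:.3f}, gate weights={[round(g, 3) for g in gates]}")
```

The returned gate weights sum to one and show how much each expert contributes for a given input, which is one simple way such an architecture can expose per-sample specialization. A sparse top-k gate would be an equally plausible choice; the abstract does not say which variant NID-Net uses.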

Authors

Xiaohua Hu

Xiang Zhu

Jia Shi

Journals

Chaos Solitons & Fractals

Institutions

Second Military Medical University

Nanjing Forestry University

Changhai Hospital
