The growing availability of heterogeneous data, such as text, images, and audio, offers substantial opportunities to build intelligent systems that understand context comprehensively. However, existing artificial intelligence methods have been mostly unimodal or have relied on loosely coupled multimodal methods, which restricts their practical application in complex real-world decision-making settings. This paper used a design science research approach to create and test a unified multimodal deep learning framework for cross-modal understanding and decision support. The proposed system incorporated modality-specific encoders: transformer-based models for text, convolutional and vision transformer networks for visual input, and spectrogram-based models for audio. These representations were integrated in a common latent space via cross-modal attention to allow efficient feature alignment and cross-modal learning. A scalable prototype was built with deep learning frameworks and was trained and tested on benchmark multimodal datasets, such as MSCOCO (image-text) and AudioSet (audio-visual). The system was evaluated on several tasks, including multimodal classification, context-based inference, and prediction, using accuracy, F1-score, and inference latency as performance metrics. Performance improvements were validated by comparison with unimodal baselines and existing multimodal models. The findings showed higher accuracy, better contextual comprehension, and robust real-time inference. This paper contributed a versatile and scalable multimodal data fusion architecture with potential use in several areas: intelligent surveillance, medical analytics, and smart buildings. By tackling the problem of cross-modal integration, it also contributed to the creation of scalable and context-sensitive AI-based decision support systems.
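To make the fusion step concrete, the following is a minimal sketch of cross-modal attention, in which embeddings from one modality act as queries over embeddings from another. It is not the paper's implementation: the function names, the toy 2-dimensional vectors, and the use of plain scaled dot-product attention without learned projection matrices or multiple heads are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_modal_attention(queries, keys, values):
    """Scaled dot-product attention: each query vector (one modality)
    attends over the key/value vectors of another modality."""
    d = len(keys[0])
    fused = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors
        fused.append([sum(w * v[j] for w, v in zip(weights, values))
                      for j in range(len(values[0]))])
    return fused

# Hypothetical toy embeddings, assumed to be already projected
# into a shared 2-dimensional latent space by the modality encoders.
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Text queries attend over image patches (keys and values).
fused = cross_modal_attention(text_tokens, image_patches, image_patches)
```

Each row of `fused` is a text token re-expressed as a weighted mixture of image-patch vectors, which is one simple way to align features across modalities. A full system would normally add learned query, key, and value projections and apply attention in both directions.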
Faith Sodipe
University of Salford
Faith Sodipe (Mon,) studied this question.
www.synapsesocial.com/papers/69e865d76e0dea528ddea51e — DOI: https://doi.org/10.5281/zenodo.19671889