
September 10, 2025

Intelligent digital art design fusion platform based on multimodal transformation

Key Points

Multimodal fusion improves creative efficiency by 33.3%, enhancing both artwork diversity and expressiveness.
The platform incorporates a multi-modal Transformer architecture to effectively fuse image, text, and audio data.
Analysis shows that three-modal fusion creation takes an average of 80 minutes compared to 120 minutes for single-modal design.
The platform is currently optimized for fixed-resolution generation; future work will explore adaptive resolution techniques for mixed-resolution inputs, aiming to improve output integrity.
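
The 33.3% efficiency figure appears consistent with the two reported average design times, read as a relative reduction in time against the single-modal baseline. The short check below assumes that interpretation; the function and variable names are illustrative and do not come from the paper.

```python
def relative_time_reduction(baseline_min: float, new_min: float) -> float:
    """Percentage reduction in time relative to the baseline duration."""
    if baseline_min <= 0:
        raise ValueError("baseline_min must be positive")
    return (baseline_min - new_min) / baseline_min * 100.0


# Average design times reported in the key points (minutes).
single_modal_min = 120.0
three_modal_min = 80.0

reduction_pct = relative_time_reduction(single_modal_min, three_modal_min)
print(f"{reduction_pct:.1f}% less time")  # prints "33.3% less time"
```

Under this reading, the gain is a 40-minute saving on a 120-minute baseline, not a 33.3% increase in output per unit time.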

Abstract

As artificial intelligence advances, intelligent digital art design increasingly integrates multi-modal data such as images, text, and audio into the creative process, improving both creativity and design efficiency. Traditional art design platforms fuse information poorly and work inefficiently when handling multi-modal data. To address these problems, this paper proposes an intelligent digital art design fusion platform based on a multi-modal Transformer, which fuses data from different modalities to improve creative efficiency and work quality. The proposed framework is a novel approach to digital art creation that overcomes the limitations of single-modal platforms by integrating image, text, and audio. Multi-modal fusion improves creative efficiency by 33.3% and enhances creativity by increasing both the diversity and expressiveness of the generated artworks, primarily within a unified image resolution framework. The current platform, however, is optimized for fixed-resolution image generation. Handling mixed resolutions or modified images (in pixel or grid formats) remains challenging, particularly for maintaining output integrity. This limitation is recognized and will be addressed in future work through adaptive resolution techniques. The innovation lies in using the self-attention mechanism to balance computational load while enriching creative outputs, addressing both artistic and technological challenges. Data analysis shows that three-modal fusion design takes an average of 80 min, compared with 120 min for single-modal creation, indicating a clear advantage for multi-modal fusion in accelerating design. The platform also improves the quality of creation: the creative score is about 25% higher than on the traditional platform.
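
The abstract does not describe the architecture in detail, so the following is only a minimal sketch of one common way a Transformer can fuse modalities: concatenate the image, text, and audio token embeddings into one sequence and let self-attention mix them. It assumes a single head with identity query/key/value projections (a real model would learn these), a shared hypothetical embedding size of 4, and random placeholder embeddings. None of the names below come from the paper.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections), which keeps the sketch dependency-free.
    """
    dim = len(tokens[0])
    outputs, weights = [], []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        row = softmax(scores)
        weights.append(row)
        outputs.append(
            [sum(w * value[i] for w, value in zip(row, tokens)) for i in range(dim)]
        )
    return outputs, weights


def fuse(image_tokens, text_tokens, audio_tokens):
    """Concatenate the three modalities, attend across them, then mean-pool."""
    sequence = image_tokens + text_tokens + audio_tokens
    attended, weights = self_attention(sequence)
    dim = len(sequence[0])
    fused = [sum(tok[i] for tok in attended) / len(attended) for i in range(dim)]
    return fused, weights


def make_tokens(count, dim=4):
    """Placeholder embeddings; a real system would use modality encoders."""
    return [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


random.seed(0)
image, text, audio = make_tokens(3), make_tokens(2), make_tokens(2)
fused, weights = fuse(image, text, audio)
print(len(fused), len(weights))
```

Because every token attends to every other token, each image token's output can draw on text and audio tokens, and vice versa. The attention matrix grows quadratically with the combined sequence length, which is one reason computational load matters in this kind of design.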


Cite This Study

孝司猪里 (Wed,) studied this question.

synapsesocial.com/papers/68c1872d9b7b07f3a06115ef https://doi.org/10.1177/14727978251374335
