As artificial intelligence advances, intelligent digital art design increasingly integrates multi-modal data, such as images, text, and audio, into the creative process, improving both the creativity and the efficiency of design. Traditional art design platforms, however, suffer from insufficient information fusion and low creative efficiency when handling multi-modal data. To address these problems, this paper proposes an intelligent digital art design fusion platform based on a multi-modal Transformer, which fuses image, text, and audio data within a single architecture to improve creative efficiency and work quality. The approach overcomes the limitations of single-modal platforms and uses the self-attention mechanism to balance computational load while enriching creative outputs, addressing both artistic and technological challenges. Data analysis shows that the average design time for three-modal fusion creation is 80 min, compared with 120 min for single-modal creation, a 33.3% reduction in creation time. The platform also improves creativity by increasing the diversity and expressiveness of the generated artworks, and the creative score is about 25% higher than that of the traditional platform. These gains were obtained primarily within a unified image resolution framework, because the current platform is optimized for fixed-resolution image generation. Handling mixed resolutions or modified images (in pixel or grid formats) remains challenging, particularly in maintaining output integrity; this limitation will be addressed in future work through adaptive resolution techniques.