What question did this study set out to answer?

The aim is to address semantic disconnection and visual distortion in environmental art design generation.

April 17, 2026 | Open Access

A consistent simulation model for environmental art design generation driven by multimodal transformer

Key Points

The aim is to address semantic disconnection and visual distortion in environmental art design generation.
Developed a consistent generation and simulation model using a multimodal transformer.
Integrated multiple sources of information: text, sketches, and scene images.
Utilized the public MIT ADE20K dataset for model training and evaluation.
Achieved a visual fidelity area under the curve of 0.92, an 8.2% increase.
Improved user preference, with a 15.7% increase in normalized discounted cumulative gain at rank 10 (NDCG@10).
All key indicators showed statistically significant results (p < 0.01).

Abstract

To address the semantic disconnection and visual distortion between generated results and actual scenes in environmental art design, this paper proposes a consistent generation and simulation model based on a multimodal transformer. Traditional methods are limited in coordinating complex elements and ensuring spatial logic, which hinders design implementation. By integrating multi-source information, including text, sketches, and scene images, an end-to-end generation-simulation framework achieves a consistent mapping from concept to high-fidelity visual output. On the public MIT ADE20K dataset, the model achieves significant improvements in visual fidelity (area under the curve of 0.92, an increase of 8.2%) and user preference (normalised discounted cumulative gain @10, an increase of 15.7%), with all key indicators statistically significant (p < 0.01). This confirms the model's effectiveness in enhancing the automation and usability of environmental art design.
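
For readers unfamiliar with the user-preference metric, the following is a minimal sketch of how normalised discounted cumulative gain at rank 10 is commonly computed. It assumes the linear-gain form (relevance divided by log2 of rank + 1); the abstract does not state which gain formulation or relevance scale the study used, and the function names dcg_at_k and ndcg_at_k are illustrative only.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k items (linear gain, assumed)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    # With no relevant items at all, the score is defined here as 0.0.
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical relevance grades for designs in the order a model ranked them.
    ranked_relevances = [3, 2, 3, 0, 1, 2]
    print(ndcg_at_k(ranked_relevances, k=10))
```

A score of 1.0 means the ranking already places the most preferred items first; lower values indicate that preferred items appear further down the list.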

Cite This Study

Li Ren studied this question.

synapsesocial.com/papers/69e1cf1b5cdc762e9d857fed https://doi.org/10.1504/ijict.2026.152918