What question did this study set out to answer?

This research aims to improve interior design style recognition by leveraging a new data-model framework.

May 6, 2026 · Open Access

Attention-Guided Transformer Architecture for Fine-Grained Style Representation in Design Systems

Key Points

Introduced InDes, a dataset of 3,826 images labeled across five architectural styles.
Employed semi-supervised learning to address class imbalance.
Developed AT3, an attention-guided transformer architecture with hierarchical attention for style extraction.
Achieved 87.8% validation accuracy and an F1-score of 0.86 with AT3.
Outperformed strong convolutional and transformer-based baselines by margins of 5.2% and 6.7%, respectively.
Attention mechanisms preserved fine-grained details while improving long-range dependency modeling.
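
The semi-supervised step mentioned above can be illustrated with a minimal sketch. The source does not specify which strategy was used, so the confidence-threshold rule, the function names (`select_pseudo_labels`, `class_counts`), the threshold of 0.9, and the toy probabilities below are all assumptions for illustration, not the authors' procedure.

```python
from collections import Counter


def select_pseudo_labels(probabilities, threshold=0.9):
    """Keep unlabeled samples whose top class probability meets the threshold.

    probabilities: list of per-sample class-probability lists.
    Returns a list of (sample_index, predicted_class) pairs.
    NOTE: confidence thresholding is an assumed strategy, not taken from the study.
    """
    selected = []
    for index, probs in enumerate(probabilities):
        best_class = max(range(len(probs)), key=probs.__getitem__)
        if probs[best_class] >= threshold:
            selected.append((index, best_class))
    return selected


def class_counts(labels, num_classes):
    """Count how many samples fall in each class, including empty classes."""
    counts = Counter(labels)
    return [counts.get(c, 0) for c in range(num_classes)]


# Hypothetical toy data: three expert-labeled images and model predictions
# for four unlabeled images over three classes.
labeled = [0, 0, 1]
unlabeled_probs = [
    [0.05, 0.02, 0.93],
    [0.40, 0.35, 0.25],
    [0.03, 0.95, 0.02],
    [0.10, 0.05, 0.85],
]

pseudo = select_pseudo_labels(unlabeled_probs, threshold=0.9)
combined = labeled + [cls for _, cls in pseudo]
print(pseudo)
print(class_counts(combined, 3))
```

In this toy case only confident predictions are added to the training pool, which is one way pseudo-labels could fill in under-represented classes while limiting label noise.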

Abstract

Interior design style recognition presents a challenging computational problem due to fine-grained inter-style similarities, complex spatial arrangements, and strong variability in materials, lighting, and layout. From a computational design and engineering perspective, accurately modeling such stylistic variations is essential not only for classification, but also for enabling semantic representation, retrieval, and analysis of design cases within data-driven design systems. However, existing approaches often struggle to balance the extraction of fine-grained local cues with the modeling of global contextual relationships in complex interior scenes. In this study, we propose a unified data-model framework for interior design style recognition. First, we introduce InDes, a curated interior design image dataset comprising 3,826 images across five architectural styles. The dataset integrates expert-labeled real-world samples from Vietnamese studios with additional pseudo-labeled data generated through a semi-supervised learning strategy, addressing class imbalance and supporting robust model training under limited annotation conditions. Second, we propose AT3, an attention-guided transformer architecture that combines convolutional feature extraction with hierarchical attention mechanisms and transformer-based global context modeling. By applying attention-based feature refinement prior to transformer tokenization, the proposed architecture preserves fine-grained stylistic cues while enabling effective long-range dependency modeling across interior scenes. This design improves the quality of learned representations for complex design environments characterized by subtle visual differences.
Extensive experiments on the InDes dataset demonstrate that AT3, when instantiated with an Xception backbone, achieves a validation accuracy of 87.8% and an F1-score of 0.86, outperforming strong convolutional and transformer-based baselines by margins of 5.2% and 6.7%, respectively. Ablation studies further validate the effectiveness of the proposed attention-guided architecture. Beyond classification performance, the learned representations provide a computational foundation for downstream design-oriented tasks such as style-aware retrieval, clustering, and data-driven analysis in computational design and engineering systems.
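
The ordering described in the abstract (convolutional features, then attention-based refinement, then tokenization and global self-attention) can be sketched in NumPy. This is a rough illustration under stated assumptions, not the AT3 implementation: the channel and spatial gates stand in for the unspecified hierarchical attention, random arrays stand in for Xception backbone features and learned weights, and every function name and dimension is hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feature_map, w1, w2):
    """Squeeze-and-excitation style channel gate (an assumed stand-in)."""
    squeezed = feature_map.mean(axis=(0, 1))   # (C,) global average per channel
    hidden = np.maximum(squeezed @ w1, 0.0)    # ReLU bottleneck
    gate = sigmoid(hidden @ w2)                # (C,) values in (0, 1)
    return feature_map * gate                  # broadcast over H and W


def spatial_attention(feature_map):
    """Gate each location by the sigmoid of its channel-mean activation (assumed)."""
    saliency = feature_map.mean(axis=-1, keepdims=True)  # (H, W, 1)
    return feature_map * sigmoid(saliency)


def tokenize(feature_map):
    """Flatten an (H, W, C) map into H*W tokens of dimension C."""
    h, w, c = feature_map.shape
    return feature_map.reshape(h * w, c)


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over all tokens."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return weights @ v, weights


rng = np.random.default_rng(0)
H, W, C, D, NUM_STYLES = 7, 7, 32, 16, 5       # hypothetical sizes; five styles as in InDes
features = rng.standard_normal((H, W, C))      # stand-in for CNN backbone output
w1 = rng.standard_normal((C, C // 4)) * 0.1
w2 = rng.standard_normal((C // 4, C)) * 0.1
wq, wk, wv = (rng.standard_normal((C, D)) * 0.1 for _ in range(3))
w_cls = rng.standard_normal((D, NUM_STYLES)) * 0.1

# Refinement happens BEFORE tokenization, mirroring the order in the abstract.
refined = spatial_attention(channel_attention(features, w1, w2))
tokens = tokenize(refined)
context, attn = self_attention(tokens, wq, wk, wv)
style_probs = softmax(context.mean(axis=0) @ w_cls)
print(tokens.shape, attn.shape, style_probs.shape)
```

The point of the sketch is the ordering: local gates reweight the feature map while it still has spatial structure, so the tokens passed to self-attention already emphasize the cues the gates selected.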
