Abstract

Interior design style recognition presents a challenging computational problem due to fine-grained inter-style similarities, complex spatial arrangements, and strong variability in materials, lighting, and layout. From a computational design and engineering perspective, accurately modeling such stylistic variations is essential not only for classification, but also for enabling semantic representation, retrieval, and analysis of design cases within data-driven design systems. However, existing approaches often struggle to balance the extraction of fine-grained local cues with the modeling of global contextual relationships in complex interior scenes. In this study, we propose a unified data-model framework for interior design style recognition. First, we introduce InDes, a curated interior design image dataset comprising 3,826 images across five architectural styles. The dataset combines expert-labeled real-world samples from Vietnamese studios with additional pseudo-labeled data generated through a semi-supervised learning strategy, which addresses class imbalance and supports robust model training under limited annotation conditions. Second, we propose AT3, an attention-guided transformer architecture that combines convolutional feature extraction with hierarchical attention mechanisms and transformer-based global context modeling. By applying attention-based feature refinement prior to transformer tokenization, the proposed architecture preserves fine-grained stylistic cues while enabling effective long-range dependency modeling across interior scenes. This design improves the quality of learned representations for complex design environments characterized by subtle visual differences.
Extensive experiments on the InDes dataset demonstrate that AT3, when instantiated with an Xception backbone, achieves a validation accuracy of 87.8% and an F1-score of 0.86, outperforming strong convolutional and transformer-based baselines by margins of 5.2% and 6.7%, respectively. Ablation studies further validate the effectiveness of the proposed attention-guided architecture. Beyond classification performance, the learned representations provide a computational foundation for downstream design-oriented tasks such as style-aware retrieval, clustering, and data-driven analysis in computational design and engineering systems.
Nguyen et al. (Fri,) studied this question.
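The abstract does not specify how the pseudo-labeled portion of InDes is generated. As an illustration only, the sketch below shows one common semi-supervised strategy, confidence-thresholded pseudo-labeling with an optional per-class cap to limit class imbalance. The function names, the 0.9 threshold, the cap, and the example logits are all assumptions made for this sketch and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_pseudo_labels(logits_per_image, threshold=0.9, max_per_class=None):
    """Pick confident predictions on unlabeled images as pseudo-labels.

    Returns a list of (image_index, class_index, confidence) tuples,
    most confident first. `threshold` and `max_per_class` are
    hypothetical knobs, not values reported in the paper.
    """
    candidates = []
    for idx, logits in enumerate(logits_per_image):
        probs = softmax(logits)
        conf = max(probs)
        label = probs.index(conf)
        if conf >= threshold:
            candidates.append((idx, label, conf))

    # Rank by confidence so that a per-class cap keeps the best samples.
    candidates.sort(key=lambda c: c[2], reverse=True)
    if max_per_class is None:
        return candidates

    kept, counts = [], {}
    for idx, label, conf in candidates:
        if counts.get(label, 0) < max_per_class:
            kept.append((idx, label, conf))
            counts[label] = counts.get(label, 0) + 1
    return kept


# Made-up logits for four unlabeled images over five style classes.
example_logits = [
    [4.0, 0.0, 0.0, 0.0, 0.0],  # fairly confident, class 0
    [0.2, 0.1, 0.0, 0.1, 0.2],  # ambiguous, rejected by the threshold
    [0.0, 0.0, 5.0, 0.0, 0.0],  # confident, class 2
    [6.0, 0.0, 0.0, 0.0, 0.0],  # very confident, class 0
]

if __name__ == "__main__":
    for idx, label, conf in select_pseudo_labels(example_logits, max_per_class=1):
        print(f"image {idx}: class {label} (confidence {conf:.3f})")
```

Capping the number of accepted samples per class is one simple way to keep a dominant style from flooding the pseudo-labeled pool; the authors' actual balancing procedure may differ.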