Cultural symbols embedded in visual media serve as powerful conduits for emotional engagement, yet their affective impact remains largely unquantified in heritage tourism research. This paper proposes a multimodal learning framework that integrates image, text, and video modalities to systematically measure the emotional resonance of cultural symbols across diverse visual media platforms. We define cultural symbols computationally as visually identifiable elements within heritage imagery (architectural motifs, ritual objects, decorative patterns, sacred landscape features, and ceremonial artifacts) that carry culturally specific semantic and affective associations beyond their physical appearance. Drawing on a dataset of heritage tourism content collected from social media and official promotional channels, we design a cross-modal attention model that identifies emotionally salient symbolic features and maps them to quantifiable affective dimensions. Our results demonstrate that multimodal fusion significantly outperforms unimodal baselines in predicting emotional engagement, and that specific symbolic categories, including architectural motifs, ritual objects, and landscape iconography, exhibit distinct emotional signatures across cultural contexts. These findings offer actionable insights for heritage tourism practitioners seeking to optimise visual communication strategies and strengthen visitors' emotional connection with cultural sites.
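The abstract does not specify the architecture of the cross-modal attention model, so the following is only a minimal sketch of the general technique (scaled dot-product attention across two modalities), not the authors' implementation. The function name `cross_modal_attention`, the toy feature vectors, and the choice of image regions as queries over text tokens are all assumptions made for illustration; a real model would also apply learned projections, which are omitted here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_modal_attention(queries, keys, values):
    """Scaled dot-product attention from one modality onto another.

    Each query (here: a hypothetical image-region feature) is compared with
    every key (here: a hypothetical text-token feature). The resulting
    weights are used to average the value vectors.
    No learned projection matrices are applied in this sketch.
    """
    scale = math.sqrt(len(keys[0]))
    attended, weights = [], []
    for q in queries:
        w = softmax([dot(q, k) / scale for k in keys])
        weights.append(w)
        attended.append(
            [sum(wi * v[d] for wi, v in zip(w, values))
             for d in range(len(values[0]))]
        )
    return attended, weights


# Toy, made-up features (assumption: 2-dimensional embeddings).
image_regions = [[1.0, 0.0], [0.0, 1.0]]
text_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

fused, attn = cross_modal_attention(image_regions, text_tokens, text_tokens)
```

In this sketch, each row of `attn` indicates how strongly one image region attends to each text token, and `fused` holds the text-informed representation of each region that a downstream affect predictor could consume.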
Mian Wu
Chunhui Zhang
Lyu Li
Scientific Reports
East China Jiaotong University
Communication University of China
Wu et al. studied this question.
www.synapsesocial.com/papers/69fececcb9154b0b82876113 — DOI: https://doi.org/10.1038/s41598-026-51963-4