This study examines how tourist perceptions in theme park settings are shaped by three modalities of user-generated content: text, image-text, and video. Drawing on Media Richness Theory, we analyze posts from Weibo and RedNote related to Universal Beijing Resort and Shanghai Disneyland using topic modeling and deep learning sentiment classification. Seven perceptual themes emerge, with clear cross-modal differences: text emphasizes service evaluation, image-text highlights aesthetics and symbolic elements, and video captures immersive, process-based narratives. Sentiment distributions also differ by modality. Theoretically, the findings refine media-task fit by showing stable correspondences between modality cues and the focus of perceptual expression. Methodologically, the study demonstrates a scalable pipeline for mining multimodal perceptions from large, real-world corpora. Practically, the results translate into concrete measures: deploy text monitoring to surface operational issues and reply with concise guidance; standardize visual presets and required hashtags to strengthen brand visibility; and curate first-person video with authentic on-site sound to amplify atmosphere around characters and rides.
Liu et al. studied this question.