The digital preservation of intangible cultural heritage requires computational models that can jointly represent visual, textual, and performative knowledge while remaining interpretable in cross-cultural settings. Using Chinese shadow puppetry as the application domain, this study develops a multimodal knowledge-graph framework that combines Faster R-CNN, BiLSTM-CRF, OpenPose-GCN, SimplE embedding, a VGG16-BERT dual encoder, and Sinkhorn-based temporal alignment to organize image, text, and action information in a unified structure. The empirically validated contribution of the framework lies in multimodal knowledge-graph construction, cross-modal alignment, and adaptive narrative selection; generative rendering and blockchain traceability are retained as extensible system modules rather than serving as the sole basis of the quantitative claims. In the reported experiments, the framework achieved 91.8% semantic alignment and 87.6% synchronization accuracy across repeated trials, while the dynamic narrative case reached an 81.5% user retention rate in the tested sample. These findings suggest that multimodal knowledge graphs can support structured cross-cultural narrative generation for shadow puppetry, although the current evidence should be interpreted within the limits of the available corpus, the user sample size, and partial reproducibility.
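To make the Sinkhorn-based temporal alignment step more concrete, the following is a minimal sketch of entropy-regularized Sinkhorn normalization applied to two short timestamp sequences. It is illustrative only and is not the authors' implementation: the function name `sinkhorn_alignment`, the uniform marginals, the squared time-difference cost, and the values of `epsilon` and `n_iters` are all assumptions introduced here.

```python
import numpy as np


def sinkhorn_alignment(cost, epsilon=0.1, n_iters=200):
    """Return a soft alignment (transport plan) for a cost matrix.

    Hypothetical helper: assumes uniform mass on both sequences and
    entropic regularization strength `epsilon`.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)          # uniform weight per row item (assumed)
    b = np.full(m, 1.0 / m)          # uniform weight per column item (assumed)
    K = np.exp(-cost / epsilon)      # Gibbs kernel derived from the cost
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)              # rescale rows toward marginal a
        v = b / (K.T @ u)            # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]


# Toy inputs (assumed): normalized timestamps of four action frames
# and four text segments.
action_times = np.linspace(0.0, 1.0, 4)
text_times = np.linspace(0.0, 1.0, 4)

# Squared time difference as the alignment cost (an illustrative choice).
cost = (action_times[:, None] - text_times[None, :]) ** 2

plan = sinkhorn_alignment(cost)
hard_alignment = plan.argmax(axis=1)  # most likely text segment per frame
```

Each row of `plan` is a soft assignment of one action frame to the text segments; taking the row-wise argmax gives a hard alignment. In a full pipeline the cost would more plausibly come from distances between learned action and text embeddings than from raw timestamps.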
Cheng et al. (Mon,) studied this question.