The digital archives of South Korea's intangible cultural heritage contain multimodal resources (images, texts, and audio), which pose challenges for cross-modal retrieval because of their semantic complexity and the trade-off between accuracy and efficiency. This paper proposes a novel deep hashing network to address these issues. The model employs modality-specific encoders to extract features and a unified hashing layer to generate compact binary codes. A joint loss function is introduced to preserve cross-modal similarity while enabling effective quantisation, enhancing both discrimination and retrieval performance. Comprehensive evaluations on public datasets show that our method achieves a mean average precision of 92.7%, outperforming state-of-the-art approaches by 5.2%, while maintaining real-time retrieval speed. The framework offers a scalable solution that significantly improves accessibility and management for digital cultural heritage platforms.
Ying Zhang (Thu,) studied this question.
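To make the hashing idea in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes the encoders have already produced real-valued vectors of equal length for each modality, binarises them by sign, and ranks database items by Hamming distance. The `joint_loss` function is one common way to combine a similarity-preserving term with a quantisation penalty; its exact form, the weight `lam`, and all function names are assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List, Sequence


def binarize(features: Sequence[float]) -> int:
    """Pack sign(features) into an integer: bit i is 1 when features[i] >= 0."""
    code = 0
    for i, value in enumerate(features):
        if value >= 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two packed binary codes."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code: int, db_codes: Sequence[int]) -> List[int]:
    """Return database indices ordered from nearest to farthest code."""
    return sorted(range(len(db_codes)),
                  key=lambda i: hamming(query_code, db_codes[i]))


def joint_loss(u: Sequence[float], v: Sequence[float],
               similar: bool, lam: float = 0.1) -> float:
    """Hypothetical joint loss for one cross-modal pair (assumed form).

    u, v: real-valued hashing-layer outputs for two modalities.
    similar: whether the pair shares a semantic label.
    lam: assumed weight of the quantisation penalty.
    """
    k = len(u)
    # Similarity term: the scaled inner product should approach +1 for
    # similar pairs and -1 for dissimilar pairs.
    inner = sum(a * b for a, b in zip(u, v)) / k
    target = 1.0 if similar else -1.0
    similarity_term = (inner - target) ** 2
    # Quantisation term: penalise outputs that lie far from -1 or +1,
    # so that taking the sign loses little information.
    quantisation_term = (sum((abs(a) - 1.0) ** 2 for a in u)
                         + sum((abs(b) - 1.0) ** 2 for b in v)) / (2 * k)
    return similarity_term + lam * quantisation_term


if __name__ == "__main__":
    # Toy example: an "image" query against three "text" codes.
    query = binarize([0.9, -0.2, 0.4, -0.7])
    database = [0b1010, 0b0101, 0b0111]
    print(rank_by_hamming(query, database))
```

Packing codes into integers keeps the distance computation to an XOR and a bit count, which is the property that lets hashing-based retrieval stay fast as the archive grows.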