Abstract
This article presents a Multimodal Retrieval-Augmented Generation (RAG) system for the digital preservation of traditional knowledge (TK) from the Jino ethnic group in China, a small indigenous community whose knowledge is primarily transmitted through oral narratives, ritual practices, and place-based ecological experience. The system integrates text, audio, and image data, using the m3e-base model to generate embeddings and Facebook AI Similarity Search (FAISS) for semantic search. A Flask backend supports cross-modal queries, with OpenAI models handling keyword extraction and response generation. The system is deployed via Docker and Cloudflare Tunnel and embedded in a WordPress interface for public access. Beyond implementation, the study addresses challenges in data collection, intellectual property, and cultural authenticity. The results indicate that AI-driven multimodal retrieval can support sustainable TK transmission and inform future digital heritage infrastructures.
Zhou et al. studied this question.
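The retrieval step summarized in the abstract can be illustrated with a minimal sketch. It assumes that every text, audio, and image record has already been mapped to a vector in one shared embedding space; the abstract does not say how non-text items are embedded, so that mapping is an assumption here. The record identifiers and three-dimensional vectors are hypothetical toy values, not data from the study, and the brute-force cosine search below is a plain-Python stand-in for a similarity index, not the FAISS API or the authors' implementation.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def search(index, query_vec, top_k=2):
    """Return the top_k (score, item_id) pairs by cosine similarity.

    `index` is a list of (item_id, unit_vector) pairs. This is an exhaustive
    scan, which is only practical for small collections.
    """
    q = normalize(query_vec)
    scored = [
        (sum(a * b for a, b in zip(q, vec)), item_id)
        for item_id, vec in index
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical records: one per modality, with made-up embeddings.
records = {
    "text:narrative_001": [0.9, 0.1, 0.0],
    "audio:recording_001": [0.1, 0.9, 0.1],
    "image:photo_001": [0.0, 0.2, 0.9],
}

# Normalize once at indexing time, then query.
index = [(item_id, normalize(vec)) for item_id, vec in records.items()]
results = search(index, [0.8, 0.2, 0.1], top_k=2)

for score, item_id in results:
    print(f"{item_id}\t{score:.3f}")
```

In a RAG pipeline of the kind described, the retrieved records would then be passed to the generation model as context; that step is omitted here because the abstract gives no detail about the prompt format.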