Abstract: Chinese local product records in traditional gazetteers constitute a vital component of cultural heritage. Recent developments in artificial intelligence (AI) have opened up new avenues for the preservation and utilization of such heritage. This article introduces a series of large language models (LLMs) tailored to Chinese local products, comprising a base model, a chat model, and a reasoning model, collectively referred to as the Chinese local product LLM series. These models validate the feasibility of applying LLMs to the intelligent processing of Chinese local product knowledge and provide both a technical paradigm and a toolset for cultural heritage research. The article first constructs a hybrid pre-training dataset of one billion characters drawn from historical texts on Chinese local products. Using this dataset, the open-source Qwen base models undergo continued pre-training for domain adaptation, yielding the three specialized models. These models are then deployed as API endpoints and web applications to support automated text processing and knowledge-based question answering about Chinese local products. This article marks the first attempt to develop LLMs specifically for the cultural heritage domain with a focus on local products. It further demonstrates the effectiveness of continued pre-training, instruction tuning, data distillation, long-chain-of-thought fine-tuning, and retrieval-augmented generation in improving the quality of local product information services. The findings underscore the potential of LLMs to advance both the preservation and protection of cultural heritage in the AI era.
Wang et al. (Wed,) studied this question.