What question did this study set out to answer?

The study aims to construct large language models tailored to Chinese local product heritage in order to improve how that heritage is preserved.

February 25, 2026

Artificial intelligence in cultural heritage: Constructing a domain-specific LLM series for Chinese local product heritage

Key Points

Constructed a hybrid pre-training dataset containing one billion characters from historical texts.
Developed a series of domain-specific LLMs, including a base model, chat model, and reasoning model.
Deployed the models as API endpoints and web applications for automated text processing and knowledge-based question answering.
Validated the feasibility of using LLMs for intelligent processing of local product knowledge.
Demonstrated that continued pre-training, instruction tuning, data distillation, long-chain-of-thought fine-tuning, and retrieval-augmented generation improve the quality of local product information services.
Established a technical toolset for cultural heritage research related to local products.

Abstract

Chinese local product records in traditional gazetteers constitute a vital component of cultural heritage. Recent developments in artificial intelligence (AI) have opened up new avenues for the preservation and utilization of such heritage. This article introduces large language models (LLMs) tailored to Chinese local products, comprising a base model, a chat model, and a reasoning model, collectively referred to as the Chinese local product LLM series. These models validate the feasibility of applying LLMs to the intelligent processing of Chinese local product knowledge and provide both a technical paradigm and toolset for cultural heritage research. This article first constructs a hybrid pre-training dataset containing one billion characters, drawn from historical texts on Chinese local products. Leveraging this dataset, the Qwen open-source base models underwent further pre-training for domain adaptation, leading to the development of specialized models for Chinese local product knowledge, including a base model, a chat model, and a reasoning model. These models are subsequently deployed as API endpoints and web applications to support automated textual processing and knowledge-based question answering services related to Chinese local products. This article marks the first attempt to develop LLMs specifically for the domain of cultural heritage with a focus on local products. It further demonstrates the effectiveness of continued pre-training, instruction tuning, data distillation, long-chain-of-thought fine-tuning, and retrieval-augmented generation in improving the quality of local product information services. The findings underscore the potential of LLMs to advance both the preservation and protection of cultural heritage in the AI era.


Cite This Study

Wang et al. studied this question.

synapsesocial.com/papers/699e91eaf5123be5ed04fd07 https://doi.org/10.1093/llc/fqaf125
