Purpose: This study evaluates the capabilities and limitations of large language models (LLMs) in classifying Chinese ethnic minority books under the Chinese Library Classification scheme.

Design/methodology/approach: A test collection of Chinese ethnic minority bibliographic records was constructed, and prompt engineering was used to compare the classification performance of DeepSeek-v3 and ChatGPT-4o under two input scenarios: “title + abstract” and “title only.” Using evaluation metrics that cover accuracy, granularity and error-type analysis, the study systematically evaluates the performance differences between the models, diagnoses the causes of errors and proposes improvement strategies.

Findings: Experimental results show that both models performed well in broad-category classification of Chinese ethnic minority books, with DeepSeek-v3 exceeding 80% accuracy. Incorporating abstracts further improved accuracy and prompted longer, more detailed classification codes. However, accuracy declined for both models as classification codes grew more specific. DeepSeek-v3 significantly outperformed ChatGPT-4o, achieving an overall accuracy of 40.78% with abstracts and 33.50% without, while ChatGPT-4o remained below 6%. On the basis of the classification error analysis, this study proposes improvements in classification system design, model capability enhancement and human–artificial intelligence (AI) collaboration to guide practical improvements in organizing ethnic minority resources.

Originality/value: Combining librarianship, ethnography and artificial intelligence, this study is the first to compare the classification ability of different large language models for Chinese ethnic minority books. It reveals cultural limitations in knowledge organization systems, identifies the “capability threshold” of LLMs in cultural context processing and establishes an empirical basis for developing culturally aware AI governance frameworks.
Jia et al. studied this question.
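The abstract reports accuracy both for broad categories and for progressively more specific class numbers. The following is a minimal sketch of how such granularity-dependent accuracy could be computed. It is not the authors' actual metric. It assumes that granularity can be approximated by the number of leading characters on which a predicted class number agrees with the reference class number, after separators are removed. The function names (`normalize`, `prefix_accuracy`, `exact_accuracy`) and the records in `SAMPLE` are hypothetical and serve illustration only.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (reference class number, predicted class number)


def normalize(code: str) -> str:
    """Uppercase a class number and drop separators such as '.', '-' and spaces."""
    return "".join(ch for ch in code.upper() if ch.isalnum())


def prefix_accuracy(pairs: Iterable[Pair], depth: int) -> float:
    """Share of records whose prediction agrees with the reference on the
    first `depth` characters (assumed proxy for classification granularity)."""
    records: List[Pair] = list(pairs)
    if not records:
        raise ValueError("no records to evaluate")
    if depth < 1:
        raise ValueError("depth must be at least 1")
    hits = sum(
        1
        for gold, pred in records
        if normalize(gold)[:depth] == normalize(pred)[:depth]
    )
    return hits / len(records)


def exact_accuracy(pairs: Iterable[Pair]) -> float:
    """Share of records whose full predicted class number equals the reference."""
    records: List[Pair] = list(pairs)
    if not records:
        raise ValueError("no records to evaluate")
    hits = sum(1 for gold, pred in records if normalize(gold) == normalize(pred))
    return hits / len(records)


# Hypothetical (reference, predicted) pairs, invented for illustration only.
SAMPLE: List[Pair] = [
    ("K281.4", "K281.4"),  # full match
    ("K281.4", "K28"),     # correct, but less specific than the reference
    ("H212", "H2"),        # correct broad class, too coarse
    ("J607", "K28"),       # wrong broad class
]

if __name__ == "__main__":
    for depth in (1, 2, 3, 4):
        print(f"depth {depth}: {prefix_accuracy(SAMPLE, depth):.2f}")
    print(f"exact: {exact_accuracy(SAMPLE):.2f}")
```

On these invented records, agreement falls as the compared prefix grows longer, which is the kind of pattern the findings describe. Character position is only a rough proxy for hierarchy level, so an evaluation on real data would need a level definition taken from the classification scheme itself.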