What question did this study set out to answer?

The study assesses how well large language models handle historical Korean Literary Sinitic texts, a low-resource historical language.

January 18, 2026 · Open Access

KLSBench: Evaluating LLM Capabilities on Korean Literary Sinitic Texts in Historical Context

Key Points

Introduces KLSBench, a benchmark for evaluating LLM performance on Korean Literary Sinitic (KLS).
Contains 7871 instances drawn from Joseon dynasty civil service examination archives and parallel corpora of the Four Books.
Covers five task categories: classification, retrieval, punctuation restoration, natural language inference, and translation.
The evaluation suggests KLSBench can serve as a diagnostic tool that distinguishes lexical recall from deeper linguistic comprehension.
Provides evaluation baselines for large language models on low-resource historical languages.
Offers practical frameworks for deploying LLM-based tools in digital humanities contexts, including automated annotation systems and intelligent search interfaces for classical text repositories.

Abstract

Large language models (LLMs) show limited capability in processing low-resource historical languages due to insufficient training data and domain-specific linguistic structures. Korean Literary Sinitic (KLS), the principal written medium of the Joseon dynasty, remains particularly under-resourced despite its lexical overlap with modern Korean and shared script with classical Chinese. To enable systematic evaluation in this domain, we introduce KLSBench, a comprehensive benchmark for assessing LLM performance on KLS. KLSBench contains 7871 instances sourced from Joseon dynasty civil service examination archives and parallel corpora of the Four Books, and encompasses five task categories: classification, retrieval, punctuation restoration, natural language inference, and translation. Our evaluation suggests KLSBench could work as an effective diagnostic tool that distinguishes lexical recall from deeper linguistic comprehension in low-resource historical languages. Beyond establishing evaluation baselines, KLSBench provides practical frameworks for deploying LLM-based tools in digital humanities contexts, including automated annotation systems and intelligent search interfaces for classical text repositories.

Cite This Study

Han et al. studied this question.

synapsesocial.com/papers/696c789ceb60fb80d1396d1f https://doi.org/10.3390/app16020953
