Large language models (LLMs) have limited ability to process low-resource historical languages, owing to scarce training data and domain-specific linguistic structures. Korean Literary Sinitic (KLS), the principal written medium of the Joseon dynasty, remains particularly under-resourced despite its lexical overlap with modern Korean and its shared script with classical Chinese. To enable systematic evaluation in this domain, we introduce KLSBench, a comprehensive benchmark for assessing LLM performance on KLS. KLSBench contains 7,871 instances sourced from Joseon dynasty civil service examination archives and parallel corpora of the Four Books, and covers five task categories: classification, retrieval, punctuation restoration, natural language inference, and translation. Our evaluation suggests that KLSBench can serve as an effective diagnostic tool for distinguishing lexical recall from deeper linguistic comprehension in low-resource historical languages. Beyond establishing evaluation baselines, KLSBench provides practical frameworks for deploying LLM-based tools in digital humanities contexts, including automated annotation systems and intelligent search interfaces for classical text repositories.
Han et al. (Fri,) studied this question.