What question did this study set out to answer?

The research aims to analyze the linguistic characteristics of Chinese online literature using a digital humanities framework.

February 20, 2026

Exploring textual features of Chinese online literature: a multi-level digital humanities framework for literary texts

Key Points

Analyzes the linguistic characteristics of Chinese online literature within a digital humanities framework.
Conducts a corpus-based, multi-level quantitative analysis of the self-constructed Qidian Online Literature Corpus (QOLC).
Builds the large-scale corpus through automated web scraping and stratified sampling, covering male-targeted, female-targeted, and light novels.
Examines textual form, lexical usage, structural complexity, semantic organization, stylistic variation, and emotional dynamics using computational linguistic methods.
Identifies linguistic patterns that distinguish online literature from traditional written and spoken forms.
Characterizes online literature as an Oralized Written Register, with moderate lexical richness and low lexical density.
Finds systematic stylistic differences between male-oriented and female-oriented novels.

Abstract

Chinese online literature has become a major contemporary literary phenomenon, yet its linguistic characteristics remain underexplored from a large-scale empirical perspective. To address this gap, this study adopts a digital humanities framework and conducts a corpus-based, multi-level quantitative analysis using a self-constructed large-scale corpus of Chinese online literature. The Qidian Online Literature Corpus (QOLC) was built through automated web scraping and stratified sampling, covering male-targeted, female-targeted, and light novels. Using computational linguistic methods, the study examines textual form, lexical usage, structural complexity, semantic organization, stylistic variation, and emotional dynamics. The results reveal that online literature exhibits distinctive linguistic patterns that differentiate it from traditional written and spoken forms, which this study characterizes as an Oralized Written Register. Its linguistic features include moderate lexical richness, low lexical density, lexical distributions that follow Zipf’s law, dialogue-driven narration with compact syntax, gender-invariant nMDD despite sentence-length variation, and a PCA-identified stylistic dichotomy. In addition, systematic stylistic differences are observed between male- and female-oriented novels. Overall, this research offers a data-driven taxonomy of the stylistic features of Chinese online literature, underscoring the effectiveness of computational methods in characterizing emerging literary forms and the value of quantitative analysis in digital humanities research.
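Three of the lexical measures named in the abstract (lexical richness, lexical density, and Zipf's law) can be illustrated with a minimal Python sketch. This is not the study's pipeline: the abstract does not say which richness measure, segmenter, or tag set was used. The sketch assumes richness is measured as a plain type-token ratio, that the text has already been segmented and part-of-speech tagged, and that the hypothetical `FUNCTION_TAGS` set marks function words. All function names are illustrative.

```python
import math
from collections import Counter

# Hypothetical tag set for function words (particles, prepositions,
# conjunctions, pronouns, adverbs). A real tagger's tag set will differ.
FUNCTION_TAGS = {"u", "p", "c", "r", "d"}


def type_token_ratio(tokens):
    """Lexical richness as distinct word types divided by total tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def lexical_density(tagged_tokens):
    """Share of content words among all (token, tag) pairs."""
    if not tagged_tokens:
        return 0.0
    content = sum(1 for _, tag in tagged_tokens if tag not in FUNCTION_TAGS)
    return content / len(tagged_tokens)


def zipf_slope(tokens):
    """Least-squares slope of log(frequency) against log(rank).

    A slope near -1 is the classic Zipfian pattern.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct word types")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

The plain type-token ratio falls as texts get longer, so comparisons across novels of different lengths would normally use a length-normalized variant, for example one computed over fixed-size windows.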
