Abstract

Chinese online literature has become a major contemporary literary phenomenon, yet its linguistic characteristics have rarely been examined empirically at scale. To address this gap, this study adopts a digital humanities framework and conducts a corpus-based, multi-level quantitative analysis of a self-constructed large-scale corpus. The Qidian Online Literature Corpus (QOLC) was built through automated web scraping and stratified sampling, and covers male-targeted novels, female-targeted novels, and light novels. Using computational linguistic methods, the study examines textual form, lexical usage, structural complexity, semantic organization, stylistic variation, and emotional dynamics. The results show that online literature exhibits distinctive linguistic patterns that set it apart from both traditional written and spoken forms; this study characterizes the resulting variety as an Oralized Written Register. Its features include moderate lexical richness, low lexical density, lexical frequency distributions that follow Zipf’s law, dialogue-driven narration with compact syntax, a normalized mean dependency distance (nMDD) that does not vary with target gender despite differences in sentence length, and a stylistic dichotomy identified by principal component analysis (PCA). Systematic stylistic differences are also observed between male- and female-oriented novels. Overall, this research offers a data-driven taxonomy of the stylistic features of Chinese online literature, underscoring the effectiveness of computational methods in characterizing emerging literary forms and the value of quantitative analysis in digital humanities research.
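As an illustration of two of the lexical measures named above, the following is a minimal sketch and not the authors' pipeline. It assumes the text has already been segmented into word tokens (Chinese text needs a word segmenter first), and the function names `type_token_ratio` and `zipf_slope` are hypothetical. The study's actual richness measure may be a length-normalized variant rather than the raw ratio shown here.

```python
import math
from collections import Counter


def type_token_ratio(tokens):
    """Raw lexical richness: distinct word types divided by total tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def zipf_slope(tokens):
    """Least-squares slope of log(frequency) against log(rank).

    A slope near -1 is the classic signature of a Zipfian distribution.
    Requires at least two distinct word types.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct word types")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy input (assumption): five word types with frequencies 60/rank.
toy_tokens = []
for rank, word in enumerate(["w1", "w2", "w3", "w4", "w5"], start=1):
    toy_tokens.extend([word] * (60 // rank))

print(type_token_ratio(toy_tokens))
print(zipf_slope(toy_tokens))
```

On real data the raw type-token ratio falls as texts get longer, so comparisons across novels of different length would need fixed-size samples or a windowed variant.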
Wang et al. (Mon,) studied this question.