Sentence length reflects cognitive constraints and stylistic decisions about how speech and text are segmented for effective communication, but whether sentence length distributions follow universal patterns across languages and genres remains unclear. This study investigates whether sentence lengths and sub-sentence lengths (defined as the number of words between sentence-ending punctuation marks and between adjacent punctuation marks, respectively) follow a unified probabilistic distribution across languages, whether this distribution reflects linguistic genealogy, and whether it is affected by genre. Given the links between sentence length, cognitive constraints, and stylistic decisions, we predicted that sentence and sub-sentence lengths would follow a unified probabilistic distribution across languages, modulated by linguistic genealogy and genre. Analyzing news texts in 10 languages, we found that sentence and sub-sentence length distributions both conform to a probabilistic model, the Extended Positive Negative Binomial distribution, which was previously shown to capture sentence length distributions in certain languages. To assess whether cross-linguistic differences in these distributions align with linguistic typology, we performed cluster analysis based on mean length and distribution parameters; the results mirrored known linguistic genealogical relationships. To examine genre effects, we analyzed sentence and sub-sentence length distributions across three written genres in English and Chinese. Generalized linear models revealed systematic influences of both genre and language, but the effects differed by linguistic level: genre accounted for more variance in sentence-level metrics, whereas language exerted stronger effects at the sub-sentence level. Sentence and sub-sentence length distributions thus reflect a universal probabilistic pattern in punctuation-based sentence segmentation, influenced by cognitive constraints and genre-driven adaptability across languages.
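The two length measures defined in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the punctuation sets, the whitespace tokenization, and the helper name `segment_lengths` are all assumptions made for illustration, and whitespace splitting would not work for unsegmented scripts such as Chinese.

```python
import re

# Assumed punctuation sets; the study's actual inventories are not given here.
SENTENCE_END = ".!?"
ANY_PUNCT = ".!?,;:"


def segment_lengths(text, delimiters):
    """Count whitespace-separated words in each span between delimiters."""
    pattern = "[" + re.escape(delimiters) + "]+"
    segments = re.split(pattern, text)
    # Skip empty spans, such as the one after the final full stop.
    return [len(seg.split()) for seg in segments if seg.split()]


sample = "The rain stopped, and we left. Why wait? The road, wet and empty, led home."

# Sentence length: words between sentence-ending punctuation marks.
sentence_lengths = segment_lengths(sample, SENTENCE_END)

# Sub-sentence length: words between any two adjacent punctuation marks.
sub_sentence_lengths = segment_lengths(sample, ANY_PUNCT)

print(sentence_lengths)      # [6, 2, 7]
print(sub_sentence_lengths)  # [3, 3, 2, 2, 3, 2]
```

Both lists sum to the same word total, since sub-sentence segments simply subdivide sentences. A real corpus would also need handling for abbreviations, decimal numbers, and language-specific punctuation, which this sketch ignores.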
Yikai Zhou
Jingyang Jiang
Haitao Liu
Zhejiang University
Fudan University
Zhejiang International Studies University
Zhou et al. studied this question.
synapsesocial.com/papers/68d6e14f8b2b6861e4c3fc67 — DOI: https://doi.org/10.1111/cogs.70115