What question did this study set out to answer?

This study investigates the relationship between word length and semantic ambiguity across multiple languages.

May 16, 2026

Short words are more likely to refer to multiple meanings across 192 language families

Key Points

Analyzed up to 633,308 wordforms in 1,952 languages from 192 families (the largest of several analyses).
Measured ambiguity in three ways: colexification (Lexibank data), synset counts in WordNet, and BERT-derived ambiguity predictions.
Controlled for word frequency to test the robustness of the correlation.
Found a robust negative correlation between word length and semantic ambiguity for all three measures.
The correlation remained substantial after controlling for frequency, suggesting it is more than a by-product of frequent words being both shorter and more ambiguous.
Findings are consistent with Zipf's laws of abbreviation and meaning.

Abstract

This study tested, for the first time on a diverse cross-linguistic dataset, the existence of a negative correlation between a word’s length and its semantic ambiguity. This correlation can be predicted on various grounds: as a way to make communication more efficient, as a mechanical consequence of the greater availability of short wordforms, or as a corollary of Zipf’s laws of abbreviation and meaning, which state that frequent words are shorter and have more meanings. We tested the correlation between word length and number of meanings in a broad range of languages, our biggest analysis studying 633,308 wordforms in 1,952 languages representing 192 families. We operationalize word ambiguity in three ways: as whether or not a word colexifies several meanings (using Lexibank data), as the number of synsets in Wordnet, and using BERT-derived predictions of the degree of ambiguity of a word. For all three measures of ambiguity, we evidence a robust correlation between a word’s brevity and ambiguity. The correlation remains substantial when controlling for word frequency, suggesting that it is more than a by-product of the fact that frequent words tend to be both shorter and more ambiguous.
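
To make "controlling for word frequency" concrete, the sketch below computes a raw Pearson correlation between word length and number of meanings, then a first-order partial correlation that holds log frequency fixed. This is only an illustration of the general idea and not the authors' actual statistical procedure, which the abstract does not specify. The lexicon, its meaning counts, and its frequencies are invented toy values, and the names `TOY_LEXICON`, `pearson`, and `partial_corr` are hypothetical.

```python
import math

# Hypothetical toy lexicon: (wordform, number of meanings, corpus frequency).
# All values are invented for illustration; they are not data from the study.
TOY_LEXICON = [
    ("go", 8, 1000),
    ("set", 9, 400),
    ("bank", 5, 500),
    ("table", 4, 300),
    ("window", 2, 200),
    ("library", 2, 100),
    ("elephant", 1, 20),
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_corr(xs, ys, zs):
    """Correlation between xs and ys with the linear effect of zs removed."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


lengths = [len(word) for word, _, _ in TOY_LEXICON]
meanings = [count for _, count, _ in TOY_LEXICON]
# Log-transform frequency, since raw word frequencies are heavily skewed.
log_freqs = [math.log(freq) for _, _, freq in TOY_LEXICON]

raw_r = pearson(lengths, meanings)
controlled_r = partial_corr(lengths, meanings, log_freqs)

print(f"length vs. meanings (raw): {raw_r:.3f}")
print(f"length vs. meanings (frequency held fixed): {controlled_r:.3f}")
```

If the length-ambiguity link were purely a by-product of frequency, the partial correlation would be expected to shrink toward zero. In this toy lexicon it stays negative, which mirrors the pattern the abstract describes without reproducing the study's numbers.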
