Abstract
This study tests, for the first time on a diverse cross-linguistic dataset, whether a word’s length correlates negatively with its semantic ambiguity. Such a correlation can be predicted on various grounds: as a way to make communication more efficient, as a mechanical consequence of the greater availability of short wordforms, or as a corollary of Zipf’s laws of abbreviation and meaning, which state that frequent words are shorter and have more meanings. We test the correlation between word length and number of meanings in a broad range of languages; our largest analysis covers 633,308 wordforms in 1,952 languages representing 192 families. We operationalize word ambiguity in three ways: as whether or not a word colexifies several meanings (using Lexibank data), as the number of synsets in WordNet, and as BERT-derived predictions of a word’s degree of ambiguity. For all three measures of ambiguity, we find a robust correlation between a word’s brevity and its ambiguity. The correlation remains substantial when controlling for word frequency, suggesting that it is more than a by-product of the fact that frequent words tend to be both shorter and more ambiguous.
Koshevoy et al. studied this question.
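To make the idea of "controlling for word frequency" concrete, the following is a minimal sketch of one common way to do it: a partial Spearman correlation between word length and number of meanings, with frequency partialled out. The abstract does not state which statistical procedure the study used, so the choice of partial rank correlation is an assumption made here for illustration, and the function names and toy data are hypothetical.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def partial_spearman(x, y, z):
    """Rank correlation of x and y with the rank-linear effect of z removed."""
    rxy, rxz, ryz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Invented toy data (NOT from the study): six hypothetical words.
lengths = [2, 3, 3, 5, 7, 9]                # word length in characters
meanings = [12, 8, 6, 3, 2, 1]              # number of senses
log_freq = [9.1, 7.5, 8.0, 5.2, 4.4, 4.9]   # log corpus frequency

raw = spearman(lengths, meanings)
controlled = partial_spearman(lengths, meanings, log_freq)
print(f"raw rho = {raw:.3f}, rho controlling for frequency = {controlled:.3f}")
```

If the length and ambiguity relationship were purely a by-product of frequency, the controlled coefficient would shrink toward zero; a coefficient that stays clearly negative corresponds to the pattern the abstract reports.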