Lemmas, flemmas, and level-6 word families (WF6) are three commonly discussed lexical units. Because each makes different assumptions about learner knowledge, the choice of one unit over another in research or pedagogy strongly affects how the lexical challenge is interpreted. It is therefore important to understand these assumptions fully so that practitioners can select the most suitable unit for a given purpose. This study introduces an enhanced version of Nation’s BNC-COCA word lists that can be used to quantify several features of lexical units (https://osf.io/4mz6y/?view_only=a295f0089f8745c29667ef21479578c6). The original WF6 lists were adapted by adding flemma and lemma groupings, part-of-speech (POS) tags, morphological codings, frequency data, and an expanded list of proper nouns. Analyses of lexical unit composition reveal that the rapid corpus coverage of WF6 results from its much greater inclusivity, relative to flemmas and lemmas, in the 1k and 2k bands, and that irregular forms make up a considerable proportion of 1k tokens regardless of the unit chosen. Accuracy checks then suggest that POS-tagged lists improve on untagged lists, because untagged lists overestimate coverage and block homographic concepts. An examination of semantic extension reveals that polysemy is almost as frequent in lemmas as in flemmas and that figurative extension is an important aspect of lexical knowledge. Finally, lexical and morphological profiling shows that, in most genres, threshold coverage values are unlikely to be reached without knowledge of at least the mid-frequency basewords and tens of derivational affixes.
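The link between unit inclusivity and coverage can be illustrated with a toy calculation. This is a minimal sketch under stated assumptions, not the study's method: the lexicon, the simplified POS tags, the 12-token text, and the learner's known entries are all invented, and `lemma_key`, `flemma_key`, `family_key`, and `coverage` are hypothetical names. Each scheme credits a learner who knows one entry with every token that falls in the same unit, so coverage rises as the unit becomes more inclusive.

```python
from collections.abc import Callable, Hashable

# Hypothetical toy lexicon: (form, POS) -> (headword, base word).
# Tags are simplified placeholders, not a real tagset.
LEXICON: dict[tuple[str, str], tuple[str, str]] = {
    ("the", "DET"): ("the", "the"),
    ("a", "DET"): ("a", "a"),
    ("and", "CONJ"): ("and", "and"),
    ("of", "PREP"): ("of", "of"),
    ("run", "V"): ("run", "run"),
    ("runs", "V"): ("run", "run"),            # inflection of the verb
    ("run", "N"): ("run", "run"),             # noun homograph of the verb
    ("runner", "N"): ("runner", "run"),       # derived form: same base, own headword
    ("kind", "ADJ"): ("kind", "kind"),
    ("kind", "N"): ("kind", "kind"),          # noun homograph of the adjective
    ("kindness", "N"): ("kindness", "kind"),  # derived form
}

# Invented POS-tagged text of 12 tokens.
TEXT: list[tuple[str, str]] = [
    ("the", "DET"), ("runner", "N"), ("runs", "V"), ("a", "DET"),
    ("run", "N"), ("and", "CONJ"), ("the", "DET"), ("kind", "ADJ"),
    ("kindness", "N"), ("of", "PREP"), ("the", "DET"), ("kind", "N"),
]

# Entries the hypothetical learner is assumed to know.
KNOWN_ENTRIES: list[tuple[str, str]] = [
    ("the", "DET"), ("a", "DET"), ("and", "CONJ"),
    ("of", "PREP"), ("run", "V"), ("kind", "ADJ"),
]


def lemma_key(form: str, pos: str) -> Hashable:
    """Lemma: headword plus its inflections within one part of speech."""
    headword, _ = LEXICON[(form, pos)]
    return (headword, pos)


def flemma_key(form: str, pos: str) -> Hashable:
    """Flemma: headword plus its inflections, ignoring part of speech."""
    return LEXICON[(form, pos)][0]


def family_key(form: str, pos: str) -> Hashable:
    """Toy word family: every form mapped to the same base word."""
    return LEXICON[(form, pos)][1]


def coverage(
    tokens: list[tuple[str, str]],
    known_entries: list[tuple[str, str]],
    key: Callable[[str, str], Hashable],
) -> float:
    """Share of tokens whose unit contains at least one known entry."""
    known_units = {key(form, pos) for form, pos in known_entries}
    covered = sum(1 for form, pos in tokens if key(form, pos) in known_units)
    return covered / len(tokens)


if __name__ == "__main__":
    schemes = [("lemma", lemma_key), ("flemma", flemma_key), ("word family", family_key)]
    for name, key in schemes:
        print(f"{name}: {coverage(TEXT, KNOWN_ENTRIES, key):.1%}")
```

Run as a script, this prints 66.7% for lemmas, 83.3% for flemmas, and 100.0% for the toy word family. The flemma figure counts the nouns *run* and *kind* as known only because their verb and adjective homographs are known, a small-scale version of the overestimation attributed to untagged lists. The toy family key simply maps *runner* and *kindness* to their bases; it stands in for, and does not reproduce, the affix criteria of a real WF6 list.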
Phil Bennett
SHILAP Revista de lepidopterología
www.synapsesocial.com/papers/698584b78f7c464f230081f5 — DOI: https://doi.org/10.29140/vli.2026.103408