March 3, 2026 · Open Access

Assessing EFL Learners’ Phraseological Competence through Association Measures: The Role of Register in Reference Corpus Selection

Key Points

Assessments of phraseological competence differ based on the reference corpus selected, emphasizing the importance of context.
Pointwise Mutual Information (PMI) serves as a reliable metric for evaluating collocational strength across registers.
Different reference corpora highlight distinct types of phraseological units, affecting the measurement of L2 proficiency.
Carefully selected reference corpora are essential for accurately assessing phraseological competence in EFL learners.

Abstract

Association measures, such as Pointwise Mutual Information (PMI), have been widely used in learner corpus research to explore learners' phraseological competence (Paquot Granger Paquot, 2019). PMI has been validated as a reliable indicator of L2 proficiency in both writing and speaking. Research has shown that lower-proficiency learners tend to rely on word-by-word composition, leading to weaker collocational strength, whereas more advanced learners produce stronger associations, reflecting an increasing level of phraseological competence (Bestgen, 2017; Bestgen Durrant Garner et al., 2019; 2020; Uchihara et al., 2020; Zhang et al., 2023). Despite its robustness, the effectiveness of PMI as a measure of collocational strength is contingent upon the reference corpora used (Gablasova et al., 2017; Paquot Bestgen Paquot et al., 2022), and studies may rely on genre-specific corpora, which raises significant questions about comparability across different contexts (Durrant Paquot, 2019). This complexity points to the need for carefully selected reference corpora that authentically represent the contexts of learner output. Understanding how reference corpus selection affects PMI calculations is therefore pivotal for accurately assessing phraseological competence. In light of these considerations, this study investigates how the characteristics of reference corpora affect the calculation of PMI scores for word combinations in EFL learners' oral production. The research is guided by two key questions: 1) How does the register of the reference corpus influence PMI-based measures of collocational strength in EFL learners' oral production? 2) How do differences in PMI scores resulting from reference corpora representing different registers impact the assessment of phraseological competence in EFL learners?
Using a corpus of oral performances from 90 test-takers (CEFR levels A2-C2) on a TEM 8 Oral Test commentary task, this study calculated PMI scores for word combinations in the direct object (dobj) grammatical relation. The reference corpora (COCA Academic, COCA Newspaper, and COCA Spoken) were chosen to align with different communicative purposes relevant to learners' language use. Given that the dataset consists of university students' oral commentaries on social issues, these corpora capture key register features: the spoken mode (COCA Spoken), argumentative and expository discourse in public communication (COCA Newspaper), and formal academic discussion (COCA Academic). This selection allows a nuanced analysis of how phraseological competence varies across registers and communicative contexts. Preliminary findings show that assessments of collocation use vary depending on the reference corpus selected, indicating that corpus choice influences the evaluation of phraseological competence. More importantly, PMI scores calculated from the three reference corpora reveal distinct aspects of phraseological competence in L2 oral production, particularly by emphasizing different types of phraseological units (e.g., general vs. academic collocations) and register-specific preferences. The findings underscore the need to select reference corpora that authentically represent the context of learners' output in order to assess phraseological competence accurately.
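To make the dependence on the reference corpus concrete, the following is a minimal sketch of how a PMI score for a single verb + direct object pair could be computed against several reference corpora. It uses the standard definition PMI = log2(P(v, n) / (P(v) P(n))) estimated from raw frequencies. The function names, the register labels, and all counts are invented for illustration; they are not the study's data, pipeline, or actual COCA frequencies.

```python
import math

def pmi(pair_count, verb_count, noun_count, total_pairs):
    """PMI = log2(P(v, n) / (P(v) * P(n))), estimated from raw counts.

    total_pairs is the number of dobj pairs in the reference corpus.
    Returns None when the pair (or either word) is unattested.
    """
    if min(pair_count, verb_count, noun_count) <= 0:
        return None
    # Algebraically equal to p_pair / (p_verb * p_noun)
    ratio = (pair_count * total_pairs) / (verb_count * noun_count)
    return math.log2(ratio)

def score_by_register(counts_by_register):
    """Score one learner word pair against each reference corpus."""
    return {
        register: pmi(c["pair"], c["verb"], c["noun"], c["total"])
        for register, c in counts_by_register.items()
    }

# Hypothetical counts for one verb + object pair (illustration only).
reference_counts = {
    "academic": {"pair": 40, "verb": 2000, "noun": 1000, "total": 1_000_000},
    "spoken": {"pair": 5, "verb": 4000, "noun": 500, "total": 1_000_000},
}

scores = score_by_register(reference_counts)
for register, score in scores.items():
    print(register, None if score is None else round(score, 2))
```

Under these made-up counts the same learner combination receives a noticeably higher score against the academic counts than against the spoken ones, which is the kind of divergence the abstract describes. How unattested pairs are handled (here, returned as None) is a design choice that a real analysis would need to make explicit.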

Cite This Study

Wang et al. studied this question.

synapsesocial.com/papers/69a75fe6c6e9836116a2c391
