Association measures, such as Pointwise Mutual Information (PMI), have been widely used in learner corpus research to explore learners' phraseological competence (Paquot Granger Paquot, 2019). PMI has been validated as a reliable indicator of L2 proficiency in both writing and speaking. Research has shown that lower-proficiency learners tend to rely on word-by-word composition, which leads to weaker collocational strength, whereas more advanced learners produce stronger associations, reflecting a higher level of phraseological competence (Bestgen, 2017; Bestgen Durrant Garner et al., 2019; 2020; Uchihara et al., 2020; Zhang et al., 2023). Despite this robustness, the effectiveness of PMI as a measure of collocational strength is contingent upon the reference corpus used (Gablasova et al., 2017; Paquot Bestgen Paquot et al., 2022), and studies may also rely on genre-specific corpora, which raises significant questions about comparability across contexts (Durrant Paquot, 2019). Reference corpora therefore need to be selected carefully so that they represent the contexts of learner output, and understanding how this selection affects PMI calculations is pivotal for accurately assessing phraseological competence. In light of these considerations, this study investigates how the characteristics of reference corpora affect the calculation of PMI scores for word combinations in EFL learners' oral production. The research is guided by two key questions: 1) How does the register of the reference corpus influence PMI-based measures of collocational strength in EFL learners’ oral production? 2) How do differences in PMI scores resulting from reference corpora representing different registers impact the assessment of phraseological competence in EFL learners?
Using a corpus of oral performances from 90 test-takers (CEFR levels A2-C2) on a TEM 8 Oral Test commentary task, this study calculated PMI scores for word combinations in the direct object (dobj) grammatical relation. The reference corpora (COCA Academic, COCA Newspaper, and COCA Spoken) were chosen to align with different communicative purposes relevant to learners’ language use. Given that the dataset consists of university students’ oral commentaries on social issues, these corpora capture key register features: the spoken mode (COCA Spoken), argumentative and expository discourse in public communication (COCA Newspaper), and formal academic discussion (COCA Academic). This selection allows a nuanced analysis of how phraseological competence varies across registers and communicative contexts. Preliminary findings show that assessments of collocation use vary depending on the reference corpus selected, indicating that corpus choice influences the evaluation of phraseological competence. More importantly, PMI scores calculated from the three reference corpora reveal distinct aspects of phraseological competence in L2 oral production, particularly by emphasizing different types of phraseological units (e.g., general vs. academic collocations) and register-specific preferences. The findings underscore the need to select reference corpora that authentically represent the context of learners' output in order to assess phraseological competence accurately.
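To illustrate why the same verb-object pair can receive different scores under different reference corpora, the following is a minimal sketch of the standard PMI formula, PMI(x, y) = log2(P(x, y) / (P(x) P(y))), with probabilities estimated from counts within the dobj relation. The function name `pmi`, the dictionary `toy_counts`, and all counts are hypothetical and chosen only for illustration; they are not taken from COCA or from the study's data, and the study's actual extraction and scoring pipeline may differ.

```python
import math


def pmi(pair_count, verb_count, noun_count, total):
    """PMI (base 2) of a verb-object pair from reference-corpus counts.

    pair_count: frequency of the verb-noun pair in the dobj relation
    verb_count: frequency of the verb as head of any dobj pair
    noun_count: frequency of the noun as object in any dobj pair
    total:      total number of dobj pairs in the reference corpus
    """
    if min(pair_count, verb_count, noun_count) <= 0:
        return None  # pair unattested in this reference corpus
    return math.log2(pair_count * total / (verb_count * noun_count))


# Hypothetical counts for one pair in two reference corpora:
# (pair_count, verb_count, noun_count, total)
toy_counts = {
    "academic": (40, 2000, 500, 1_000_000),
    "spoken": (5, 4000, 250, 1_000_000),
}

scores = {name: pmi(*counts) for name, counts in toy_counts.items()}
for name, score in scores.items():
    print(f"{name}: PMI = {score:.2f}")
```

With these invented counts the pair scores higher against the academic counts than against the spoken ones, which mirrors the kind of register-dependent divergence described above. Returning `None` for unattested pairs is one possible design choice; how such pairs are handled is an assumption here, not something the abstract specifies.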
Wang et al. studied this question.