The exponential growth of English Learner (EL) research makes systematic literature synthesis difficult, because traditional Boolean search methods struggle with the field's interdisciplinary nature and evolving terminology. This study introduces a dual-model framework that combines rule-based categorization with semantic clustering to organize EL research literature. In an analysis of 2,000 empirical articles collected through systematic searches, the framework organized the literature more effectively than traditional approaches. A two-phase methodology showed that COVID-19-related remote instruction research had emerged as the largest category (16.3 %), demonstrating the framework's adaptability to emerging themes. Semantic clustering with Sentence-BERT identified five major research domains with moderate alignment to expert-defined categories (19–32 %), reflecting the interdisciplinary nature of EL research. Boolean search strategies derived from the analysis achieved an average precision of 63.2 % and higher recall than baseline approaches. The resulting five-domain framework, with domains ranging from 211 to 536 articles each, provides practical tools for literature synthesis. The framework balances methodological rigor with practical utility, offering immediate value to graduate students, systematic reviewers, and established researchers conducting literature synthesis in TESOL. This work shows that hybrid methodologies combining human expertise with computational analysis can improve literature organization while maintaining interpretability and domain relevance.

• A dual-model framework combining rule-based categorization and semantic clustering organizes 2,000 EL research articles.
• COVID-19 remote instruction emerged as the largest research category (16.3 %), demonstrating the framework's adaptability.
• Semantic clustering with Sentence-BERT identified five major research domains with moderate expert alignment.
• Validated Boolean search strategies achieved 63.2 % average precision for literature discovery.
• Complete code, classification rules, and cluster outputs enable full replication and adaptation.
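The abstract does not specify the format of the classification rules or how average precision was computed. As a minimal sketch, assuming keyword-based rules and treating "average precision" as the mean of per-search precision values (not the rank-based AP metric from information retrieval), the rule-based half of the framework and the precision calculation might look like the following. The category names, regex patterns, and the functions `categorize`, `tally`, and `average_precision` are hypothetical illustrations, not the authors' released code; the Sentence-BERT clustering step is omitted because it depends on an external pretrained model.

```python
import re
from collections import Counter

# Hypothetical keyword rules: category -> regex patterns.
# Illustrative only; these are NOT the study's actual classification rules.
RULES = {
    "remote_instruction": [r"\bcovid", r"\bremote\b", r"\bonline instruction"],
    "assessment": [r"\bassessment", r"\btesting\b"],
    "teacher_education": [r"\bteacher (education|preparation)", r"\bprofessional development"],
}


def categorize(abstract: str) -> str:
    """Return the category whose patterns match most often, or 'uncategorized' if none match."""
    text = abstract.lower()
    scores = {
        cat: sum(len(re.findall(pattern, text)) for pattern in patterns)
        for cat, patterns in RULES.items()
    }
    # Ties go to the category listed first in RULES (dict insertion order).
    best, hits = max(scores.items(), key=lambda kv: kv[1])
    return best if hits > 0 else "uncategorized"


def tally(abstracts: list[str]) -> Counter:
    """Count how many abstracts fall into each rule-based category."""
    return Counter(categorize(a) for a in abstracts)


def average_precision(results: dict[str, tuple[int, int]]) -> float:
    """Mean of per-query precision; each value is (relevant_retrieved, total_retrieved).

    Assumption: 'average precision' here means a simple mean across search strings.
    """
    precisions = [rel / ret for rel, ret in results.values() if ret > 0]
    if not precisions:
        raise ValueError("no query retrieved any documents")
    return sum(precisions) / len(precisions)
```

For example, `tally(["Remote instruction for English learners during COVID-19", "Formative assessment practices in ESL classrooms"])` assigns one abstract to each of the first two categories, and `average_precision({"q1": (8, 10), "q2": (1, 2)})` averages 0.8 and 0.5 to 0.65. Keyword rules like these stay transparent and easy to audit, which matches the abstract's emphasis on interpretability; the trade-off is that they miss synonyms, which is the gap semantic clustering is meant to cover.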
Nirmal Ghimire (Tue,) studied this question.