Software refactoring improves code maintainability and reduces technical debt, but constructing a labeled refactoring dataset is costly and labor-intensive. To make refactoring prediction more deployable under limited annotation budgets, this paper introduces a Deep Active Learning (DAL) pipeline that iteratively trains a deep neural classifier on software-metric representations and selectively queries labels for the most informative unlabeled entities. The proposed approach is evaluated in a pool-based setting on class-, method-, and variable-level refactoring datasets covering multiple refactoring types, using a consistent training protocol and a broad set of query strategies. Results show that DAL can recover near-full-data effectiveness with substantially fewer labels: on average, reaching the target performance requires 11.4% of the labeled data for class-level, 25.0% for method-level, and 20.0% for variable-level refactorings. This corresponds to labeling savings of roughly 75–89% and demonstrates improved data efficiency for refactoring prediction. Moreover, uncertainty-based and dropout-enhanced query strategies were the most consistently effective across refactoring types and labeling budgets.
Alameer et al. (Mon,) studied this question.
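To make the query step and the savings figures concrete, the following is a minimal sketch of one uncertainty-based query, assuming predictive entropy as the uncertainty measure. Entropy is one common choice and is not confirmed here as the paper's exact strategy; the function names and the toy probabilities are hypothetical, and the deep classifier and the dropout-enhanced variants are not shown. The last step reproduces the savings arithmetic as 100% minus the reported labeled percentage.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_queries(pool_probs, batch_size):
    """Return indices of the batch_size pool items with the highest entropy.

    pool_probs holds one predicted distribution per unlabeled entity.
    (Hypothetical helper; a real pipeline would get these from the classifier.)
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:batch_size]

def labeling_savings(labeled_percent):
    """Share of labels not needed, given the share that was labeled."""
    return 100.0 - labeled_percent

# Toy predictions for four unlabeled entities (illustrative values only).
pool_probs = [[0.95, 0.05], [0.55, 0.45], [0.80, 0.20], [0.50, 0.50]]
queried = select_queries(pool_probs, batch_size=2)

# Labeled percentages reported in the abstract, per refactoring level.
reported = {"class": 11.4, "method": 25.0, "variable": 20.0}
savings = {level: round(labeling_savings(p), 1) for level, p in reported.items()}

print(queried)
print(savings)
```

In a full pool-based loop, the queried entities would be labeled, moved from the pool to the training set, and the classifier retrained before the next query round.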