Software data is widely used to predict attributes of software systems; however, obtaining reliable datasets from commercial companies remains challenging due to confidentiality constraints. GitHub has emerged as an alternative data source, offering access to diverse applications and development statistics. Nevertheless, concerns about the reliability and representativeness of public repositories persist. Star count is a widely accepted indicator of repository popularity, and existing studies rely mainly on time-dependent platform metrics to predict it. In this study, we propose using static software metrics extracted from source code alongside GitHub statistics. To our knowledge, this study is among the first to use the Extreme Learning Machine (ELM) for popularity prediction with static software metrics. Repositories from different application domains are selected to ensure dataset diversity and representativeness. An automated tool has been developed to collect data via the GitHub API and the SourceMonitor CLI. In addition, several baseline machine learning models, including LR, SVM, RF, and LSBoost, are evaluated for comparison. Experimental results show that ELM achieves competitive performance across datasets. In terms of R² scores, ELM performs best on four datasets, RF on three, and LR on one. These results indicate that ELM is an effective method for popularity prediction and highlight the potential of incorporating static software metrics into GitHub-based predictive modeling.
Borandağ et al. studied this question.
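To make the modeling approach concrete, the following is a minimal sketch of an ELM regressor scored with R². It is not the authors' implementation: the helper names (`fit_elm`, `predict_elm`, `r2_score`), the hidden-layer size, the tanh activation, the ridge term, and the synthetic data are all assumptions chosen for illustration. In practice the feature matrix would hold static software metrics and GitHub statistics, and the target would be the star count.

```python
import numpy as np


def fit_elm(X, y, n_hidden=50, ridge=1e-3, seed=0):
    """Fit a single-hidden-layer ELM regressor (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Input weights and biases are drawn once at random and never trained.
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)  # hidden-layer activations
    # Output weights come from ridge-regularized least squares in closed form.
    # The ridge term is an assumption added here for numerical stability.
    A = H.T @ H + ridge * np.eye(n_hidden)
    beta = np.linalg.solve(A, H.T @ y)
    return W, b, beta


def predict_elm(model, X):
    """Apply a fitted ELM to new samples."""
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in data (assumption): 5 features, nonlinear target.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(300, 5))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * data_rng.normal(size=300)

X_train, X_test = X[:200], X[200:]
y_train, y_test = y[:200], y[200:]

model = fit_elm(X_train, y_train)
r2_train = r2_score(y_train, predict_elm(model, X_train))
r2_test = r2_score(y_test, predict_elm(model, X_test))
print(f"train R2 = {r2_train:.3f}, test R2 = {r2_test:.3f}")
```

Because only the output weights are solved for, training reduces to one linear solve, which is the usual reason ELM is considered fast to fit. The same `r2_score` function could be applied to each baseline model's predictions to compare them dataset by dataset.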