What question did this study set out to answer?

This study aims to evaluate the effectiveness of extreme learning machines (ELM) in predicting GitHub repository popularity using static software metrics.

May 16, 2026 | Open Access

On the Use of an Extreme Learning Machine for GitHub Repository Popularity Prediction Based on Static Software Metrics

Key Points

Used static software metrics and GitHub statistics for popularity prediction.
Developed an automated tool for data collection via GitHub API and SourceMonitor CLI.
Compared ELM against baseline machine learning models including LR, SVM, RF, and LSBoost.
ELM achieved the best R² score on four datasets, compared with three for RF and one for LR.
Results indicate that static software metrics can improve repository popularity predictions.
ELM shows competitive performance compared to traditional machine learning models.
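
The data-collection step can be illustrated with a minimal sketch, assuming the public GitHub REST endpoint `GET /repos/{owner}/{repo}` and a handful of its count fields. The helper names (`repo_url`, `extract_stats`, `fetch_stats`) and the choice of fields are hypothetical; the study does not state which statistics its tool collects, and the SourceMonitor CLI side is not shown here.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"

# Assumed field selection; the paper's actual feature list is not given here.
STAT_FIELDS = (
    "stargazers_count",
    "forks_count",
    "open_issues_count",
    "subscribers_count",
)


def repo_url(owner, repo):
    """Build the REST URL for a single repository."""
    return f"{API_ROOT}/repos/{owner}/{repo}"


def extract_stats(payload):
    """Keep only the numeric statistics of interest; missing fields become 0."""
    return {field: int(payload.get(field, 0)) for field in STAT_FIELDS}


def fetch_stats(owner, repo, token=None):
    """Query the API once and return the selected statistics (needs network access)."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        # A personal access token raises the rate limit; it is optional for public repos.
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(repo_url(owner, repo), headers=headers)
    with urllib.request.urlopen(request) as response:
        return extract_stats(json.load(response))
```

Here `stargazers_count` would serve as the prediction target, since the study treats star count as the popularity indicator.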

Abstract

Software data is widely used to predict attributes of software systems; however, obtaining reliable datasets from commercial companies remains challenging due to confidentiality constraints. GitHub has emerged as a data source, offering access to diverse applications and development statistics. Nevertheless, concerns about the reliability and representativeness of public repositories persist. Star count is a widely accepted indicator of repository popularity, and existing studies mainly rely on time-dependent platform metrics. In this study, we propose using static software metrics extracted from source code, along with GitHub statistics. To our knowledge, this study is among the first to use ELM for popularity prediction with static software metrics. Repositories from different application domains are selected to ensure dataset diversity and representativeness. An automated tool has been developed to collect data via the GitHub API and SourceMonitor CLI. In addition, several baseline machine learning models, including LR, SVM, RF, and LSBoost, are evaluated for comparison. Experimental results show that ELM achieves competitive performance across datasets. In terms of R² scores, ELM performs best on four datasets, RF on three, and LR on one. These results indicate that ELM is an effective method for popularity prediction and highlight the potential of incorporating static software metrics into GitHub-based predictive modeling.
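
For readers unfamiliar with the method, the following is a minimal sketch of an ELM regressor and the R² metric, not the authors' implementation. It relies on the standard ELM idea: hidden-layer weights are drawn at random and left untrained, and only the output weights are solved in closed form. The class name, the `tanh` activation, the hidden-layer size, and the small ridge term are all assumptions made for illustration.

```python
import numpy as np


class ELMRegressor:
    """Single-hidden-layer ELM: random hidden weights, least-squares output weights."""

    def __init__(self, n_hidden=50, ridge=1e-3, seed=0):
        self.n_hidden = n_hidden
        self.ridge = ridge  # assumed regularization strength, for numerical stability
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Random projection followed by a nonlinearity (tanh is an assumed choice).
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Hidden weights and biases are sampled once and never updated.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Regularized normal equations: (H^T H + ridge * I) beta = H^T y
        A = H.T @ H + self.ridge * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ y)
        return self

    def predict(self, X):
        return self._hidden(np.asarray(X, dtype=float)) @ self.beta


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

In a setting like the one described, each row of `X` would hold a repository's static metrics and GitHub statistics and `y` its star count; features are usually scaled before the random projection so that `tanh` does not saturate.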

Cite This Study

Borandağ et al. studied this question.

synapsesocial.com/papers/6a080985a487c87a6a40b66f https://doi.org/10.3390/electronics15102095
