Los puntos clave no están disponibles para este artículo en este momento.
Technical debt (TD) is a metaphor for code-related problems that arise as a result of prioritizing speedy delivery over perfect code. Given that the reduction of TDs can have long-term positive impact in the software engineering life-cycle (SDLC), TDs are studied extensively in the literature. However, very few of the existing research focused on the technical debts of R programming language despite its popularity and usage. Recent research by Codabux et al. 21 finds that R packages can have 10 diverse TD types analyzing peer-review documentation. However, the findings are based on the manual analysis of a small sample of R package review comments. In this paper, we develop a suite of Machine Learning (ML) classifiers to detect the 10 TDs automatically. The best performing classifier is based on the deep ML model BERT, which achieves F1-scores of 0. 71 - 0. 91. We then apply the trained BERT models on all available peer-review issue comments from two platforms, rOpenSci and BioConductor (13. 5K review comments coming from a total of 1297 R packages). We conduct an empirical study on the prevalence and evolution of 10 TDs in the two R platforms. We discovered documentation debt is the most prevalent among all types of TD, and it is also expanding rapidly. We also find that R packages of generic platform (i. e. rOpenSci) are more prone to TD compared to domain-specific platform (i. e. BioConductor). Our empirical study findings can guide future improvements opportunities in R package documentation. Our ML models can be used to automatically monitor the prevalence and evolution of TDs in R package documentation.
Khan et al. (Tue,) studied this question.