What type of study is this?

This is a quantitative study.

October 20, 2025

Do Current Language Models Support Code Intelligence for R Programming Language? RCR Report

Key Points

Current pre-trained language models show varying performance on R programming tasks, particularly in code summarization.
The dataset used was created from R repositories on GitHub, incorporating Roxygen2 documentation for matching code with descriptions.
Challenges arise from R's dual paradigms—Tidyverse and Base R—affecting model performance in code tasks.
Effective utilization of Code-PLMs for R is complicated by the language's diverse styles and features.

Abstract

In this report, we introduce the dataset curated to replicate and extend experiments on R programming tasks, particularly code summarization and method name prediction. The dataset was generated by collecting R repositories from GitHub, parsing the code snippets using the tree-sitter parser, and matching them with natural language descriptions based on Roxygen2 documentation. Building on this dataset, our work conducts an in-depth analysis of the performance of Pre-trained Language Models for Code (Code-PLMs) on R code. We highlight the challenges posed by R’s dual paradigms—Tidyverse and Base R—and demonstrate that current models, including Large Language Models, exhibit varying degrees of performance degradation when applied to R code. As a result, we underscore the complexity of effectively leveraging Code-PLMs for R, given its diverse programming styles and language features.
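The pairing step described in the abstract, matching a function with the natural language description in its Roxygen2 comment, can be illustrated with a minimal sketch. This is not the authors' pipeline: the report states that the code was parsed with the tree-sitter parser, whereas the sketch below substitutes a simple regular expression so that it runs with the Python standard library alone. The names `SAMPLE_R`, `PAIR_RE`, and `extract_pairs`, as well as the sample R snippet, are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical sample input; not taken from the report's dataset.
SAMPLE_R = """\
#' Add two numbers
#'
#' @param x A number.
#' @param y A number.
#' @return The sum of x and y.
add_numbers <- function(x, y) {
  x + y
}

helper <- function(z) z * 2
"""

# Assumption: a documented function is a run of Roxygen2 lines (#')
# immediately followed by `name <- function(` or `name = function(`.
# A real parser such as tree-sitter handles far more cases than this regex.
PAIR_RE = re.compile(
    r"(?P<doc>(?:^[ \t]*#'.*\n)+)"
    r"[ \t]*(?P<name>[A-Za-z.][A-Za-z0-9._]*)[ \t]*(?:<-|=)[ \t]*function[ \t]*\(",
    re.MULTILINE,
)


def extract_pairs(source):
    """Return a list of {name, description} dicts for documented functions."""
    pairs = []
    for match in PAIR_RE.finditer(source):
        # Strip the Roxygen2 comment marker from each documentation line.
        doc_lines = [
            re.sub(r"^[ \t]*#' ?", "", line)
            for line in match.group("doc").splitlines()
        ]
        # Keep only the free-text description that precedes the first @tag.
        description = []
        for line in doc_lines:
            if line.startswith("@"):
                break
            if line.strip():
                description.append(line.strip())
        pairs.append(
            {"name": match.group("name"), "description": " ".join(description)}
        )
    return pairs


if __name__ == "__main__":
    for pair in extract_pairs(SAMPLE_R):
        print(pair["name"], "->", pair["description"])
```

In this sketch the description would serve as the target for code summarization and the function name as the target for method name prediction. The undocumented `helper` function is skipped because no Roxygen2 block precedes it.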

Cite This Study

Zhao et al. studied this question.

synapsesocial.com/papers/68f6196ee0bbbc94fac3630a https://doi.org/10.1145/3744902
