What does this research mean for the field?

A dual-LLM framework that uses auxiliary repair-relevant information significantly improves function-level automated program repair, correctly fixing at least 26% more single-function Defects4J bugs than prior techniques without requiring costly statement-level fault localization. Novelty: methodological. Consensus alignment: neutral.

What question did this study set out to answer?

This study investigates the effectiveness of large language models in function-level automated program repair, particularly exploring few-shot learning and auxiliary information.

May 20, 2026

Practical LLM-Based Function-Level Automated Program Repair: How Far Are We?

Key Points

Conducted a comprehensive analysis of six large language models for function-level APR.
Constructed a benchmark using the Defects4J 1.2 and 2.0 datasets.
Evaluated a proposed dual-LLM framework named SRepair for enhanced repair performance.
SRepair correctly fixes 227 single-function bugs in Defects4J, outperforming prior techniques by at least 26%.
Achieved successful fixes for 21 multi-function bugs, demonstrating superior performance over state-of-the-art methods.

Abstract

Recently, multiple Automated Program Repair (APR) techniques based on Large Language Models (LLMs) have been proposed to enhance the repair performance. While these techniques mainly focus on the single-line or hunk-level repair, they face significant challenges in real-world applications due to the limited repair task scope and costly statement-level fault localization. However, the more practical function-level APR, which broadens the scope of APR task to fix entire buggy functions and requires only cost-efficient function-level fault localization, remains underexplored. In this paper, we conduct a comprehensive study of LLM-based function-level APR including investigating the effect of the few-shot learning mechanism and the auxiliary repair-relevant information. Specifically, we adopt six widely-studied LLMs and construct a benchmark on both the Defects4J 1.2 and 2.0 datasets. Our study demonstrates that LLMs with zero-shot learning are already powerful function-level APR techniques, while applying the few-shot learning mechanism leads to disparate repair performance. Moreover, we find that directly applying the auxiliary repair-relevant information to LLMs significantly increases function-level repair performance and even outperforms multiple recent APR techniques. Inspired by our findings, we propose an LLM-based function-level APR technique, namely SRepair, which adopts a dual-LLM framework to leverage the power of the auxiliary repair-relevant information for advancing the repair performance. The evaluation results demonstrate that SRepair can correctly fix 227 single-function bugs in the Defects4J dataset, largely surpassing all previous APR techniques by at least 26%, without the need for the costly statement-level fault location information. Furthermore, SRepair successfully fixes 21 multi-function bugs in the Defects4J dataset, significantly outperforming other state-of-the-art APR techniques.
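
To make the idea of a dual-LLM, function-level repair loop concrete, here is a minimal Python sketch. It is not the SRepair implementation. The abstract does not specify how the two models divide the work or what the auxiliary information contains, so the split used here (one model turns auxiliary information into a repair suggestion, a second model rewrites the whole function) is an assumption, as are all names: BugContext, its failing_test and error_message fields, dual_llm_repair, and the is_plausible check. The two models are passed in as plain callables so the sketch runs without any real LLM API.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type: any callable mapping a prompt string to generated text.
LLM = Callable[[str], str]


@dataclass
class BugContext:
    """A buggy function plus auxiliary information (field names are assumptions)."""
    buggy_function: str
    failing_test: str = ""
    error_message: str = ""


def build_suggestion_prompt(bug: BugContext) -> str:
    """Prompt for the first model: describe the bug and ask for a fix strategy."""
    return (
        "The following function is buggy:\n"
        f"{bug.buggy_function}\n"
        f"Failing test:\n{bug.failing_test}\n"
        f"Error message:\n{bug.error_message}\n"
        "Explain the root cause and suggest how to repair it."
    )


def build_patch_prompt(bug: BugContext, suggestion: str) -> str:
    """Prompt for the second model: rewrite the entire function."""
    return (
        "Rewrite the whole function so that it is correct.\n"
        f"Buggy function:\n{bug.buggy_function}\n"
        f"Repair suggestion:\n{suggestion}\n"
    )


def dual_llm_repair(
    bug: BugContext,
    suggestion_llm: LLM,
    patch_llm: LLM,
    is_plausible: Callable[[str], bool],
    num_candidates: int = 3,
) -> List[str]:
    """Return distinct candidate functions accepted by is_plausible.

    Only the buggy function is needed as input (function-level localization);
    no faulty statement is marked.
    """
    suggestion = suggestion_llm(build_suggestion_prompt(bug))
    plausible: List[str] = []
    for _ in range(num_candidates):
        candidate = patch_llm(build_patch_prompt(bug, suggestion))
        # Keep each accepted candidate once, preserving generation order.
        if is_plausible(candidate) and candidate not in plausible:
            plausible.append(candidate)
    return plausible
```

In practice, is_plausible would run the project's test suite against each candidate function, and the two callables would wrap real model calls. A plausible patch only passes the tests; deciding whether it is a correct fix still requires a separate check.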


Cite This Study

Xiang et al. studied this question.

synapsesocial.com/papers/6a0d50f3f03e14405aa9d29f https://doi.org/10.1145/3812804
