Is the cure worse than the disease? Overfitting in automated program repair

Abstract

Automated program repair has shown promise for reducing the significant manual effort debugging requires. This paper addresses a deficit of earlier evaluations of automated repair techniques caused by repairing programs and evaluating generated patches' correctness using the same set of tests. Since tests are an imperfect metric of program correctness, evaluations of this type do not discriminate between correct patches and patches that overfit the available tests and break untested but desired functionality. This paper evaluates two well-studied repair tools, GenProg and TrpAutoRepair, on a publicly available benchmark of 998 bugs, each with a human-written patch. By evaluating patches using tests independent from those used during repair, we find that the tools are unlikely to improve the proportion of independent tests passed, and that the quality of the patches is proportional to the coverage of the test suite used during repair. For programs that pass most tests, the tools are as likely to break tests as to fix them. However, novice developers also overfit, and automated repair performs no worse than these developers. In addition to overfitting, we measure the effects of test suite coverage, test suite provenance, and starting program quality, as well as the difference in quality between novice-developer-written and tool-generated patches when quality is assessed with a test suite independent from the one used for patch generation.
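The evaluation idea in the abstract, scoring a patch on tests that were held out from the repair process, can be illustrated with a small sketch. Everything below is a hypothetical toy example and not the paper's benchmark, tooling, or data: the `buggy_median` defect, both patches, and the two test suites are invented for illustration only.

```python
from typing import Callable, List, Tuple

# A test is (inputs, expected output). All tests here are made up for illustration.
Test = Tuple[Tuple[int, int, int], int]
Program = Callable[[int, int, int], int]


def pass_rate(program: Program, tests: List[Test]) -> float:
    """Fraction of tests on which the program returns the expected value."""
    passed = sum(1 for args, expected in tests if program(*args) == expected)
    return passed / len(tests)


def buggy_median(a: int, b: int, c: int) -> int:
    # Hypothetical defect: only the "b is the median" case is handled correctly;
    # every other input falls through to returning c.
    if (a <= b <= c) or (c <= b <= a):
        return b
    return c


def overfit_patch(a: int, b: int, c: int) -> int:
    # A patch that special-cases the one failing repair test
    # instead of fixing the underlying logic.
    if (a, b, c) == (2, 1, 3):
        return 2
    return buggy_median(a, b, c)


def correct_patch(a: int, b: int, c: int) -> int:
    # A general fix: the median is the middle element after sorting.
    return sorted((a, b, c))[1]


# Tests visible to the (imagined) repair process.
repair_tests: List[Test] = [
    ((1, 2, 3), 2), ((3, 2, 1), 2), ((1, 3, 2), 2), ((2, 1, 3), 2),
]
# Independent tests that are never shown during repair.
held_out_tests: List[Test] = [
    ((2, 3, 1), 2), ((5, 1, 9), 5), ((4, 4, 4), 4), ((1, 2, 2), 2),
]

for name, program in [("buggy", buggy_median),
                      ("overfit patch", overfit_patch),
                      ("correct patch", correct_patch)]:
    print(f"{name}: repair={pass_rate(program, repair_tests):.2f} "
          f"held-out={pass_rate(program, held_out_tests):.2f}")
```

In this toy setting the overfitting patch passes every repair test yet scores no better on the held-out suite than the original buggy program, which is the kind of gap that an evaluation using only the repair tests cannot detect.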

Authors

Edward K. Smith

Institute for Breathing and Sleep

Earl T. Barr

University College London

Claire Le Goues

Software Engineering Institute

Institutions

University College London

Carnegie Mellon University

University of Massachusetts Amherst

Cite this study

Smith et al. studied this question.

synapsesocial.com/papers/6a0ead71a14f152feaf9b1bf — DOI: https://doi.org/10.1145/2786805.2786825

Also consider

Synapse has enriched 5 closely related papers on similar research questions. Consider them for comparative context:

KLEE: unassisted and automatic generation of high-coverage tests for complex systems programs · 2008 · 2,681 citations
Countering Network Worms Through Automatic Patch Generation · 2005 · 135 citations
Genetic Programming: On the Programming of Computers by Means of Natural Selection · 1992 · 13,277 citations
DirectFix: looking for simple program repairs · 2015 · 169 citations
Automated Fixing of Programs with Contracts · 2014 · 147 citations
