1 Abstract
Automated Program Repair (APR) tools have demonstrated significant potential in generating patches for software bugs, yet they suffer from the critical overfitting problem, where generated patches pass existing test cases without addressing the underlying issues. Current patch correctness evaluation methods depend heavily on labeled data from specific APR tools, severely limiting their generalizability and requiring substantial manual verification effort. We present LLM4PatchCorrect, a novel language model-based system that automates patch correctness assessment without requiring tool-specific training data or fine-tuning. Our approach employs an innovative explain-execute-suggest framework that provides comprehensive patch classification, detailed explanations for incorrect patches, and actionable improvement suggestions. Through extensive experimental evaluation on the QuixBugs dataset, LLM4PatchCorrect achieves 87.3% accuracy in patch classification, generates high-quality explanations rated 4.2/5 by experts, and provides successful patch suggestions in 70% of cases. The integrated parallel processing architecture reduces analysis time by 50% compared to sequential processing, while the web-based interface ensures practical accessibility for real-world deployment.
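To make the overfitting problem and the three-stage flow concrete, the following is a minimal sketch and not the LLM4PatchCorrect implementation. All names (`explain`, `execute`, `suggest`, `assess_all`, `Assessment`) are hypothetical, the explanation stage is a placeholder where the real system would query a language model, and running patches against additional test cases is an assumption about what the execute stage could look like. Patches are assessed concurrently with a thread pool to illustrate the idea of parallel processing.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

TestCase = Tuple[tuple, object]  # (arguments, expected result)


@dataclass
class Assessment:
    patch_name: str
    explanation: str
    correct: bool
    suggestion: Optional[str]


def explain(patch_name: str) -> str:
    # Placeholder (assumption): a real system would ask a language model here.
    return f"(placeholder explanation for {patch_name})"


def execute(patch: Callable, tests: List[TestCase]) -> List[TestCase]:
    """Run the patch on every test case and return the ones it fails."""
    failed = []
    for args, expected in tests:
        try:
            ok = patch(*args) == expected
        except Exception:
            ok = False  # a crashing patch counts as a failure
        if not ok:
            failed.append((args, expected))
    return failed


def suggest(failed: List[TestCase]) -> Optional[str]:
    # Placeholder (assumption): point at the first failing input, if any.
    if not failed:
        return None
    return f"revisit handling of inputs {failed[0][0]}"


def assess(name: str, patch: Callable, tests: List[TestCase]) -> Assessment:
    """Explain, execute, then suggest for a single patch."""
    explanation = explain(name)
    failed = execute(patch, tests)
    return Assessment(name, explanation, not failed, suggest(failed))


def assess_all(patches: Dict[str, Callable], tests: List[TestCase],
               max_workers: int = 4) -> Dict[str, Assessment]:
    """Assess all patches concurrently instead of one after another."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(assess, name, fn, tests)
                   for name, fn in patches.items()}
        return {name: fut.result() for name, fut in futures.items()}


def gcd_correct(a, b):
    while b:
        a, b = b, a % b
    return a


def gcd_overfit(a, b):
    return 4  # passes a suite that only checks gcd(12, 8) == 4


# The second test case plays the role of a check the original suite lacked.
tests = [((12, 8), 4), ((35, 21), 7)]
results = assess_all({"correct": gcd_correct, "overfit": gcd_overfit}, tests)
```

Here `gcd_overfit` would pass a test suite containing only the first case, which is the overfitting situation described above; the extra case exposes it, and the sketch reports it as incorrect along with a suggestion naming the failing input.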
Gayathry T S, Rini T Paul, Ani Sunny (Wed,) studied this question.