April 3, 2026 | Open Access

Refactoring Object‐Oriented Software With ChatGPT: An Empirical Study

Key Points

This research aims to evaluate ChatGPT's effectiveness in automating code refactoring tasks.
Identified five common refactoring scenarios.
Developed 200 test cases derived from five Java open-source applications.
Applied test cases using ChatGPT and NetBeans for comparison.
Utilized prompt engineering strategies to formulate effective prompts for ChatGPT.
The refactored code generated by ChatGPT compiled successfully for 88% of the test cases.
ChatGPT correctly performed the intended refactoring in only 29% of the test cases.
NetBeans outperformed ChatGPT, achieving a 67% correctness rate.
Findings highlight the need for human oversight to improve the correctness of ChatGPT's refactored code.

Abstract

Generative AI (GenAI) is currently being used in many tasks to improve their quality. Because GenAI tools are highly capable in text‐based applications, they have the potential to automate tasks across the software engineering lifecycle. In this study, we empirically investigate ChatGPT’s ability to perform code refactoring tasks. Considering five widely employed refactoring scenarios, we propose test scenarios from which we derive 200 test cases across five Java open‐source applications. These test cases are applied using both ChatGPT and NetBeans, and the results are compared and evaluated. To enable ChatGPT to perform refactoring, we follow prompt engineering approaches to design effective prompts. The results show that the refactored code generated by ChatGPT compiled successfully for 88% of the test cases. However, ChatGPT correctly performed the intended refactoring in only 29% of the cases, compared with a 67% correctness rate for NetBeans. These findings indicate that although ChatGPT has some potential to contribute to code refactoring tasks, it is not yet ready to be used as a fully automated refactoring tool for large‐scale real‐world applications. Its outputs still require human oversight to improve the correctness of the refactored code.
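The two headline numbers measure different things: whether the refactored code compiles, and whether it actually performs the intended refactoring. As a minimal sketch of how such rates can be tallied, assuming each test case is recorded with two boolean outcomes, the Java snippet below (Java 16+ for records) computes both as proportions over the same set of test cases. The `RefactoringEvaluation` class, the `TestCaseResult` record and its field names are hypothetical, not the study's tooling, and the four test cases in `main` are toy data, not results from the paper.

```java
import java.util.List;

/**
 * Hypothetical sketch: tallies per-test-case outcomes into a compilation
 * rate and a correctness rate. Not the study's actual evaluation code.
 */
public class RefactoringEvaluation {

    /** One test case outcome; the field names are illustrative assumptions. */
    record TestCaseResult(String id, boolean compiles, boolean refactoringCorrect) {
        TestCaseResult {
            // Assumption of this sketch: a refactoring can only count as
            // correct if the refactored code also compiles.
            if (refactoringCorrect && !compiles) {
                throw new IllegalArgumentException(id + ": marked correct but does not compile");
            }
        }
    }

    /** Share of test cases whose refactored code compiled. */
    static double compilationRate(List<TestCaseResult> results) {
        if (results.isEmpty()) {
            throw new IllegalArgumentException("no test case results");
        }
        long compiled = results.stream().filter(TestCaseResult::compiles).count();
        return (double) compiled / results.size();
    }

    /** Share of test cases in which the intended refactoring was performed correctly. */
    static double correctnessRate(List<TestCaseResult> results) {
        if (results.isEmpty()) {
            throw new IllegalArgumentException("no test case results");
        }
        long correct = results.stream().filter(TestCaseResult::refactoringCorrect).count();
        return (double) correct / results.size();
    }

    public static void main(String[] args) {
        // Toy data only; these are not the study's test cases or results.
        List<TestCaseResult> toy = List.of(
                new TestCaseResult("tc1", true, true),
                new TestCaseResult("tc2", true, false),
                new TestCaseResult("tc3", true, false),
                new TestCaseResult("tc4", false, false));
        System.out.printf("compiled: %.0f%%, correct: %.0f%%%n",
                100 * compilationRate(toy), 100 * correctnessRate(toy));
    }
}
```

Running `main` prints `compiled: 75%, correct: 25%` for the toy data. Computing both rates over the same denominator makes the gap between them directly comparable, which is the kind of gap the study reports for ChatGPT (88% compiled versus 29% correct). The rule that a correct refactoring must also compile is an added assumption of this sketch, not a definition taken from the paper.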
