What question did this study set out to answer?

This study aims to evaluate the effectiveness of AI-assisted tools in code refactoring, focusing on ChatGPT, Gemini, and Codeium.

April 26, 2026 | Open Access

An empirical comparison of AI assisted software refactoring tools

Key Points

Detailed evaluation of ChatGPT, Gemini, and Codeium in code refactoring using both original and refactored code.
Utilization of SonarQube for performance analysis across different code datasets.
Assessment of the tools' ability to maintain functionality and correctness during refactoring.
On the smaller code dataset, Gemini and Codeium achieved average scores of 82% and 83%, respectively, while ChatGPT scored 59%.
In larger codebases, ChatGPT improved to 77.2%, surpassing Codeium in several refactoring attributes.
Gemini is identified as the best tool for quality attributes of refactoring, despite limitations in all tools.

Abstract

Recent advances in Artificial Intelligence (AI) have produced powerful coding assistants such as ChatGPT, Gemini, and Codeium. Existing research has explored the application of ChatGPT in several areas of software engineering, including code summarization, new code generation, and error detection, but its potential in code refactoring has been sparsely examined. Investigating the ability of ChatGPT to support code refactoring could streamline refactoring processes and increase developer productivity. The literature also reveals a significant gap regarding tools such as Codeium and Gemini for code refactoring assistance. This study conducts a detailed evaluation of ChatGPT, Gemini, and Codeium from a code refactoring point of view, focusing on maintaining functionality and ensuring correctness. SonarQube is used to generate reports on both the original code and the code refactored by each of the three tools. The three AI models exhibited distinct performance trends across datasets. On the smaller code dataset, Gemini and Codeium attained higher average scores (82% and 83%, respectively) than ChatGPT (59%), while on the dataset of larger codebases ChatGPT improved substantially to 77.2%, surpassing Codeium in several refactoring attributes. The results show that these AI-supported tools still have limitations; Gemini is so far the best tool for meeting most quality attributes of refactoring, whereas the results of ChatGPT and Codeium vary between short and lengthy code. This study helps software developers choose suitable AI-supported code refactoring tools according to their requirements. It also helps policy makers design policy guidelines for improving software code refactoring through AI-supported tools.
