Code smells, subtle indicators of poor design choices, pose significant challenges to software maintainability and readability, particularly in dynamic languages such as Python. Traditional detection methods, including rule-based heuristics and static machine learning classifiers, often suffer from limited adaptability, poor contextual awareness, and a lack of explainability. These limitations hinder their effectiveness in evolving codebases and real-world development environments. This study introduces an agentic retrieval-augmented generation (Agentic RAG) framework for code smell detection, marking the first application of agentic reasoning in this domain. By embedding autonomous agents into the retrieval and reasoning pipeline, the proposed system dynamically routes queries, selects optimal retrieval strategies, and synthesizes context-aware explanations using large language models (LLMs). Unlike static classifiers, the framework combines hybrid retrieval (sparse + dense) with structured prompting to detect and explain Long Method and Large Class smells with high interpretability. Experimental results demonstrate that Agentic RAG, particularly when paired with DeepSeek and chain-of-thought prompting, achieves superior performance, with 89.5% accuracy, a macro F1-score of 78.3%, and a weighted F1-score of 88.7%. To assess generalization, a second experiment extended the framework to 21 distinct code smell types across multiple programming languages, achieving 94.85% accuracy, a macro F1-score of 90.24%, and a weighted F1-score of 94.93% under stratified five-fold cross-validation, which supports the model’s robustness and scalability. Beyond academic benchmarks, this work lays the foundation for real-world integration into developer platforms, enabling real-time code review, contextual feedback, and actionable refactoring suggestions. By bridging LLMs with dynamic retrieval and agentic reasoning, this framework advances the frontier of intelligent software quality assurance.
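The abstract reports both macro and weighted F1-scores, and in the first experiment the macro value is noticeably lower than the weighted one. The sketch below shows how the two averages are computed and why they can diverge when classes are imbalanced. The labels and predictions are invented toy data, not the study's results, and the abstract does not state that imbalance explains the reported gap.

```python
def per_class_f1(y_true, y_pred):
    """Return {label: (f1, support)} for every label present in y_true."""
    scores = {}
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        scores[label] = (f1, tp + fn)  # support = number of true instances
    return scores


def macro_f1(scores):
    """Unweighted mean of per-class F1: every class counts equally."""
    return sum(f1 for f1, _ in scores.values()) / len(scores)


def weighted_f1(scores):
    """Mean of per-class F1 weighted by each class's support."""
    total = sum(support for _, support in scores.values())
    return sum(f1 * support for f1, support in scores.values()) / total


# Hypothetical toy data: 8 clean samples, 2 "long_method" samples.
y_true = ["clean"] * 8 + ["long_method"] * 2
# The classifier misses one of the two rare-class samples.
y_pred = ["clean"] * 8 + ["long_method", "clean"]

scores = per_class_f1(y_true, y_pred)
print(f"macro F1:    {macro_f1(scores):.3f}")     # 0.804
print(f"weighted F1: {weighted_f1(scores):.3f}")  # 0.886
```

In this toy case a single miss on the rare class pulls the macro average down much further than the weighted average, because the macro score gives the two-sample class the same weight as the eight-sample class.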
Aljohani et al. (Wed,) studied this question.