Spectrum-based fault localization (SBFL), a widely used family of software fault localization techniques, helps developers identify faulty program elements. However, conventional SBFL techniques rely solely on test coverage statistics and overlook intrinsic characteristics of the source code itself. To fill this gap, this study proposes an enhanced SBFL approach, code-naturalness-based fault localization (CNFL), which incorporates code naturalness, as evaluated by a large language model (LLM), into the SBFL pipeline. By weighting program statements according to their naturalness scores, CNFL prioritizes statements that deviate from typical coding patterns and thereby optimizes the coverage matrix for more effective fault localization. Comprehensive experiments are conducted on the Defects4J dataset with five representative SBFL formulas and five LLMs for naturalness evaluation. The results demonstrate that CNFL significantly outperforms conventional SBFL techniques. Specifically, it raises the Top-1 fault localization hit rate by up to 60.8% when applied to Jaccard and by up to 56.8% when applied to Ochiai, two classic SBFL formulas. CNFL also consistently surpasses both standalone LLM methods and representative fault localization approaches that primarily optimize the coverage matrix.
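To make the idea of naturalness weighting concrete, the following is a minimal sketch and not the CNFL implementation. It computes the standard Ochiai and Jaccard suspiciousness scores from a small coverage matrix, then rescales each score by a per-statement "unnaturalness" value. The rescaling rule (multiplying by one plus the normalized unnaturalness), the function names, and the toy numbers are all assumptions made for illustration. The abstract describes weighting that acts on the coverage matrix itself, whereas this sketch, for simplicity, reweights the final scores.

```python
import math


def spectrum(coverage, failed):
    """Return per-statement (ef, ep, nf) counts.

    coverage: list of rows, one per test, with 1 where the test executes a statement.
    failed:   list of booleans, True where the test failed.
    """
    total_failed = sum(failed)
    counts = []
    for s in range(len(coverage[0])):
        ef = sum(1 for row, f in zip(coverage, failed) if row[s] and f)
        ep = sum(1 for row, f in zip(coverage, failed) if row[s] and not f)
        counts.append((ef, ep, total_failed - ef))
    return counts


def ochiai(ef, ep, nf):
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0


def jaccard(ef, ep, nf):
    denom = ef + nf + ep
    return ef / denom if denom else 0.0


def weighted_ranking(coverage, failed, unnaturalness, formula=ochiai):
    """Rank statements by formula score scaled by (1 + normalized unnaturalness).

    The scaling rule is a hypothetical choice for this sketch only.
    """
    top = max(unnaturalness) or 1.0
    scores = [
        formula(*counts) * (1.0 + u / top)
        for counts, u in zip(spectrum(coverage, failed), unnaturalness)
    ]
    ranking = sorted(range(len(scores)), key=lambda s: -scores[s])
    return ranking, scores


# Toy data: 4 tests x 3 statements; tests 1 and 4 fail.
coverage = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
]
failed = [True, False, False, True]

# Hypothetical per-statement unnaturalness (e.g. an LLM cross-entropy); higher = less natural.
unnaturalness = [1.2, 3.6, 0.5]

ranking, scores = weighted_ranking(coverage, failed, unnaturalness)
print(ranking)
```

In this toy case, statements 0 and 1 have identical spectra and therefore tie under plain Ochiai. The assumed naturalness weight breaks the tie in favor of statement 1, the less natural one, which is the kind of reprioritization the abstract describes.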
Yao et al. studied this question.