What question did this study set out to answer?

This research aims to enhance spectrum-based fault localization techniques by incorporating code naturalness with large language models.

May 6, 2026 · Open Access

Software Fault Localization Approach with Coverage Matrix Optimization Boosted by LLM-Based Code Naturalness

Key points

Developed a code-naturalness-based fault localization approach (CNFL)
Evaluated naturalness using large language models
Conducted comprehensive experiments on Defects4J dataset
Compared against five conventional SBFL formulas, including Jaccard and Ochiai
CNFL outperforms conventional SBFL techniques
Achieved up to 60.8% improvement in Top-1 fault localization hit rate
Consistently exceeded results of standalone LLM methods
Surpassed representative fault localization approaches that primarily optimize the coverage matrix

Abstract

Spectrum-based fault localization (SBFL), a widely adopted family of software fault localization techniques, assists developers in identifying faulty program elements. However, conventional SBFL techniques rely solely on test coverage statistics and overlook intrinsic characteristics of the source code itself. To fill this gap, this study proposes an enhanced SBFL approach, code-naturalness-based fault localization (CNFL), which incorporates code naturalness evaluated by a large language model (LLM) into the SBFL pipeline. By weighting program statements according to their naturalness scores, CNFL prioritizes statements that deviate from typical coding patterns and thereby optimizes the coverage matrix for effective fault localization. Comprehensive experiments are conducted on the Defects4J dataset with five representative SBFL formulas and five LLMs for naturalness evaluation. The results demonstrate that CNFL significantly outperforms conventional SBFL techniques. Specifically, it boosts the Top-1 fault localization hit rate by up to 60.8% and 56.8% when applied to classic SBFL formulas such as Jaccard and Ochiai, respectively. Moreover, CNFL consistently surpasses both standalone LLM methods and representative fault localization approaches that primarily optimize the coverage matrix.
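To make the idea concrete, the following Python sketch computes the standard Ochiai and Jaccard suspiciousness scores from a coverage matrix, then repeats the computation on a matrix whose columns are scaled by per-statement weights. The abstract does not spell out CNFL's actual weighting scheme, so everything beyond the two textbook formulas is an assumption for illustration: the helper names (`entropy_to_weights`, `weight_coverage`), the use of LLM cross-entropy as the naturalness score, the min-max normalization, the `floor` parameter, and the toy data are all hypothetical.

```python
import math


def spectrum_counts(coverage, failed):
    """Return per-statement (ef, ep) sums and the number of failing tests.

    coverage[i][j] is how strongly test i covers statement j
    (0/1 for a plain matrix, fractional after weighting).
    """
    n_stmts = len(coverage[0])
    ef = [0.0] * n_stmts  # coverage mass from failing tests
    ep = [0.0] * n_stmts  # coverage mass from passing tests
    for row, is_failed in zip(coverage, failed):
        target = ef if is_failed else ep
        for j, c in enumerate(row):
            target[j] += c
    return ef, ep, sum(failed)


def ochiai(ef, ep, total_failed):
    # Standard Ochiai: ef / sqrt(total_failed * (ef + ep))
    denom = math.sqrt(total_failed * (ef + ep))
    return ef / denom if denom else 0.0


def jaccard(ef, ep, total_failed):
    # Standard Jaccard: ef / (total_failed + ep)
    denom = total_failed + ep
    return ef / denom if denom else 0.0


def rank(coverage, failed, formula):
    """Return statement indices ordered from most to least suspicious."""
    ef, ep, tf = spectrum_counts(coverage, failed)
    scores = [formula(f, p, tf) for f, p in zip(ef, ep)]
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    return order, scores


def entropy_to_weights(entropies, floor=0.1):
    """ASSUMPTION: map cross-entropy to [floor, 1] by min-max scaling,
    so less natural (higher-entropy) statements get larger weights."""
    lo, hi = min(entropies), max(entropies)
    if hi == lo:
        return [1.0] * len(entropies)
    return [floor + (1.0 - floor) * (e - lo) / (hi - lo) for e in entropies]


def weight_coverage(coverage, weights):
    """ASSUMPTION: scale each statement's column by its weight."""
    return [[c * w for c, w in zip(row, weights)] for row in coverage]


# Toy data: 4 tests (rows) x 3 statements (columns); only test 0 fails.
coverage = [
    [1, 1, 0],
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]
failed = [True, False, False, False]

# Made-up cross-entropy values standing in for LLM naturalness scores.
entropies = [3.0, 1.0, 2.0]

plain_order, plain_scores = rank(coverage, failed, ochiai)
weighted = weight_coverage(coverage, entropy_to_weights(entropies))
weighted_order, weighted_scores = rank(weighted, failed, ochiai)

print("plain Ochiai ranking:   ", plain_order)
print("weighted Ochiai ranking:", weighted_order)
```

In this toy setup, plain Ochiai ranks statement 1 first because fewer passing tests cover it, while the weighted matrix moves statement 0, the one with the highest made-up entropy, to the top. Swapping `ochiai` for `jaccard` in the two `rank` calls shows the same reordering. Real naturalness scores would come from an LLM's token probabilities over each statement rather than from hand-picked numbers.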
