Software fault localization aims to identify faulty elements in a program by analyzing program information and the execution data of test cases. It plays a crucial role in improving development efficiency, reducing debugging costs, and ensuring reliable software operation. In practice, however, programs are large and faulty statements make up only a small fraction of them, so a test suite typically contains fewer failing test cases than passing ones, which leads to an imbalance in defect knowledge. In addition, multiple faults in a program can interfere with and mask one another, introducing characterizing noise into the test suite and making fault localization even harder. To address these two problems, defect knowledge imbalance and characterizing noise, we propose a novel method called SA-BCL. SA-BCL comprises four components: a data processing component, a defect knowledge balancing component, a confident learning denoising component, and a program spectrum reducing component. The data processing component constructs the program spectrum from the program behaviors and test results recorded while the test cases execute. The defect knowledge balancing component mitigates the imbalance by using boundary identification and oversampling to augment the spectrum of failing test cases that lie at the boundary of the test suite. The confident learning denoising component applies confident learning to identify and eliminate characterizing noise in the program spectrum. Finally, the program spectrum reducing component iteratively computes the suspiciousness scores of program elements and reduces the spectrum to facilitate fault localization.
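The four components can be made concrete with a small Python sketch. It is only an illustration under stated assumptions, not SA-BCL's implementation. The program spectrum is assumed to be a coverage matrix (one row per test case, 1 if an element is executed) plus pass/fail labels. Balancing is approximated with a Borderline-SMOTE-style rule, and denoising with a simplified confident-learning check whose out-of-sample probabilities come from a k-nearest-neighbour vote. Suspiciousness uses the Ochiai formula, and reduction drops the failing tests already explained by the top-ranked element before re-ranking. All function names (`oversample_boundary`, `confident_noise`, `ochiai`, `localize`, and the helpers) and parameters such as `k` and `max_faults` are hypothetical.

```python
# Hypothetical sketch of an SA-BCL-like pipeline; not the authors' implementation.
import math
from typing import List

Row = List[float]  # coverage of one test: 1 if the element was executed, else 0


def distance(a: Row, b: Row) -> float:
    """L1 distance between two coverage rows (Hamming distance for binary rows)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def neighbours(cov: List[Row], i: int, k: int) -> List[int]:
    """Indices of the k rows closest to row i (ties broken by index), excluding i."""
    others = [j for j in range(len(cov)) if j != i]
    return sorted(others, key=lambda j: distance(cov[i], cov[j]))[:k]


def oversample_boundary(cov: List[Row], failed: List[bool], k: int = 3):
    """Borderline-SMOTE-style balancing (assumed stand-in); returns (rows, labels)."""
    new_rows = []
    for i, is_fail in enumerate(failed):
        if not is_fail:
            continue
        nbrs = neighbours(cov, i, k)
        n_pass = sum(1 for j in nbrs if not failed[j])
        # Boundary failing test: passing tests make up at least half of its
        # neighbourhood, yet at least one neighbour also fails.
        if k / 2 <= n_pass < k:
            j = next(j for j in nbrs if failed[j])
            # Midpoint instead of SMOTE's random interpolation, to stay deterministic.
            new_rows.append([(a + b) / 2 for a, b in zip(cov[i], cov[j])])
    return cov + new_rows, failed + [True] * len(new_rows)


def confident_noise(cov: List[Row], failed: List[bool], k: int = 3) -> List[int]:
    """Simplified confident-learning check: indices of likely mislabeled tests."""
    # Out-of-sample P(fail): failing fraction among the k nearest other tests.
    probs = []
    for i in range(len(cov)):
        p_fail = sum(failed[j] for j in neighbours(cov, i, k)) / k
        probs.append((1 - p_fail, p_fail))  # (P(pass), P(fail))
    # Per-class threshold: mean predicted probability of class c over tests labeled c.
    thresholds = []
    for c in (0, 1):
        own = [probs[i][c] for i in range(len(cov)) if int(failed[i]) == c]
        thresholds.append(sum(own) / len(own) if own else 1.0)
    noisy = []
    for i, p in enumerate(probs):
        confident = [c for c in (0, 1) if p[c] >= thresholds[c]]
        # Flag the test if its most confident class contradicts its recorded result.
        if confident and max(confident, key=lambda c: p[c]) != int(failed[i]):
            noisy.append(i)
    return noisy


def ochiai(cov: List[Row], failed: List[bool]) -> List[float]:
    """Ochiai suspiciousness (assumed formula): ef / sqrt(total_fail * (ef + ep))."""
    total_fail = sum(failed)
    scores = []
    for e in range(len(cov[0])):
        ef = sum(row[e] for row, f in zip(cov, failed) if f)
        ep = sum(row[e] for row, f in zip(cov, failed) if not f)
        denom = math.sqrt(total_fail * (ef + ep))
        scores.append(ef / denom if denom else 0.0)
    return scores


def localize(cov: List[Row], failed: List[bool], max_faults: int = 5) -> List[int]:
    """Iteratively rank elements and reduce the spectrum (hypothetical strategy)."""
    found = []
    while any(failed) and len(found) < max_faults:
        scores = ochiai(cov, failed)
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] <= 0:
            break
        found.append(best)
        # Drop failing tests that the chosen element already explains, then repeat.
        keep = [t for t in range(len(cov)) if not (failed[t] and cov[t][best] > 0)]
        cov = [cov[t] for t in keep]
        failed = [failed[t] for t in keep]
    return found


if __name__ == "__main__":
    # Toy spectrum: 6 tests x 4 elements; elements 1 and 3 act as two faults.
    cov = [[1, 1, 0, 0], [1, 1, 1, 0], [1, 0, 0, 1],
           [1, 0, 1, 0], [1, 0, 1, 0], [0, 0, 1, 0]]
    failed = [True, True, True, False, False, False]
    print(localize(cov, failed))  # [1, 3]
    bal_cov, _ = oversample_boundary(cov, failed)
    print(bal_cov[len(cov):])  # [[1.0, 1.0, 0.5, 0.0], [1.0, 0.5, 0.0, 0.5]]
    # Three identical rows fail and a fourth identical row passes: a likely label issue.
    noisy_cov = [[1, 1, 0]] * 4 + [[0, 0, 1]] * 3
    noisy_failed = [True, True, True, False, False, False, False]
    print(confident_noise(noisy_cov, noisy_failed))  # [3]
```

Each step runs on its own tiny input. With so few tests, the k-nearest-neighbour vote is a crude probability estimate, so a practical denoiser would more likely rely on a better-calibrated classifier with cross-validated predictions.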
We conduct a series of experiments on both the synthetic multi-fault dataset and the real multi-fault dataset, comparing SA-BCL with baseline approaches. The results show that SA-BCL outperforms the state-of-the-art methods, with improvements of up to 37.5% in average wasted effort, 29.2% in precision, and 22.5% in recall. In addition, SA-BCL's time cost stays within the same order of magnitude as that of the baseline methods, and most of the improvements are statistically significant, indicating that its advantage is robust.
Du et al. studied this question.