What question did this study set out to answer?

The aim is to enhance the accuracy of software defect prediction using domain adaptation and feature fusion techniques.

March 28, 2026 | Open Access

Cross-Project Software Defect Prediction Based on Domain Adaptation and Feature Fusion

Key Points

The aim is to enhance the accuracy of software defect prediction using domain adaptation and feature fusion techniques.
Developed a domain adaptation and feature fusion-based prediction method (DAFF-CPDP).
Utilized the TCA+ algorithm for effective domain adaptation.
Implemented an encoder layer for progressive feature fusion.
Evaluated the model using multiple Java projects and compared it with various baseline models.
DAFF-CPDP outperformed traditional machine learning and deep learning models.
Class-balanced datasets positively influence prediction effectiveness.
Smaller distribution differences between source and target datasets improve prediction accuracy.

Abstract

As software has become pervasive across society, software quality has become a central industry concern. Defect data are scarce in the early stages of a project, which undermines prediction accuracy and has motivated research into cross-project software defect prediction. Traditional hand-crafted metric features are hampered by data distribution discrepancies between the source and target projects, which reduce prediction effectiveness. Furthermore, a single type of feature cannot comprehensively characterize software. This paper proposes a cross-project software defect prediction method based on domain adaptation and feature fusion (DAFF-CPDP). The model employs the TCA+ algorithm for domain adaptation and an encoder layer for progressive feature fusion. Multiple Java projects were selected for evaluation. Comparisons with various baseline models show that the proposed model outperforms both traditional machine learning models based on hand-crafted features and deep learning models based on single or multiple features. The paper also analyzes the impact of different source projects on target projects, confirming that class-balanced datasets and datasets with smaller distribution differences are more conducive to prediction.
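
To make the domain adaptation step more concrete, the following is a minimal sketch of plain Transfer Component Analysis (TCA), the technique that TCA+ builds on. It is not the paper's implementation: the function name `tca_fit_transform`, the linear kernel, the regularization weight `mu`, and the number of components are all assumptions chosen for illustration, and the normalization-selection step that distinguishes TCA+ from TCA is omitted.

```python
import numpy as np


def tca_fit_transform(Xs, Xt, n_components=2, mu=1.0):
    """Illustrative linear-kernel TCA (hypothetical helper, not from the paper).

    Xs: source-project features, shape (ns, d)
    Xt: target-project features, shape (nt, d)
    Returns the source and target samples embedded in a shared space.
    """
    ns, nt = len(Xs), len(Xt)
    n = ns + nt
    X = np.vstack([Xs, Xt])

    # Linear kernel over the combined source + target samples (assumption).
    K = X @ X.T

    # MMD coefficient matrix: L = e e^T with e = [1/ns, ..., -1/nt, ...].
    e = np.concatenate([np.full(ns, 1.0 / ns), np.full(nt, -1.0 / nt)])
    L = np.outer(e, e)

    # Centering matrix, used to preserve data variance in the embedding.
    H = np.eye(n) - np.full((n, n), 1.0 / n)

    # Leading eigenvectors of (K L K + mu I)^{-1} K H K give the projection.
    A = K @ H @ K
    B = K @ L @ K + mu * np.eye(n)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(B, A))
    order = np.argsort(-eigvals.real)[:n_components]
    W = eigvecs[:, order].real

    # Project all samples, then split back into source and target parts.
    Z = K @ W
    return Z[:ns], Z[ns:]
```

Under these assumptions, a classifier would be trained on the embedded source samples and applied to the embedded target samples; how DAFF-CPDP combines the adapted features with its encoder layer is not specified in this summary.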
