Welding defect classification is essential for maintaining the integrity of oil and gas infrastructure, yet it is significantly hindered by severe class imbalance. This study introduces a framework that integrates Random Undersampling and threshold optimization to improve detection performance on imbalanced datasets. The approach first applies Random Undersampling to reduce the number of majority-class samples and rebalance the training set. It then performs post-training threshold optimization using multiple evaluation metrics, both with and without a constraint that requires the true positive rate to be greater than or equal to the true negative rate. Evaluations of various threshold selection strategies on the original and resampled datasets, including the default threshold, the class-prior threshold, and metric-based thresholds, show improved accuracy and a better balance between sensitivity and specificity. The proposed framework increases defect detection while reducing false positives, offering practical guidance for handling other imbalanced binary classification tasks.
Han et al. (Mon,) studied this question.
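The two stages described in the abstract, Random Undersampling followed by post-training threshold selection with an optional TPR >= TNR constraint, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the choice of G-mean as the example metric, the definition of the class-prior threshold as the fraction of positive labels, and the toy scores are all assumptions made for illustration; the abstract does not specify which metrics or classifier were used.

```python
import random


def random_undersample(X, y, seed=0):
    """Randomly drop majority-class samples until both classes are the same size."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in kept], [y[i] for i in kept]


def class_prior_threshold(y):
    """Assumed definition: the fraction of positive labels in y."""
    return sum(y) / len(y)


def rates(y_true, scores, threshold):
    """Return (TPR, TNR) when a sample is predicted positive if score >= threshold."""
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if label == 1:
            tp += predicted
            fn += 1 - predicted
        else:
            fp += predicted
            tn += 1 - predicted
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return tpr, tnr


def g_mean(tpr, tnr):
    """Example metric (an assumption): geometric mean of sensitivity and specificity."""
    return (tpr * tnr) ** 0.5


def best_threshold(y_true, scores, metric=g_mean, require_tpr_ge_tnr=False):
    """Scan observed scores as candidate thresholds and keep the best metric value.

    Returns (None, -1.0) if no candidate satisfies the constraint.
    """
    best_t, best_value = None, -1.0
    for t in sorted(set(scores)):
        tpr, tnr = rates(y_true, scores, t)
        if require_tpr_ge_tnr and tpr < tnr:
            continue  # skip thresholds that favor specificity over sensitivity
        value = metric(tpr, tnr)
        if value > best_value:
            best_t, best_value = t, value
    return best_t, best_value


# Toy validation labels and classifier scores (made up for illustration).
y_val = [0, 0, 0, 0, 1, 1, 1, 1, 1]
s_val = [0.1, 0.2, 0.3, 0.6, 0.35, 0.7, 0.8, 0.9, 0.95]

unconstrained = best_threshold(y_val, s_val)
constrained = best_threshold(y_val, s_val, require_tpr_ge_tnr=True)
```

On this toy data the unconstrained search selects a threshold of 0.7 (TPR 0.8, TNR 1.0), while the constrained search selects 0.35 (TPR 1.0, TNR 0.75). The constraint trades some specificity for sensitivity, which is the behavior one would want when missing a defect costs more than a false alarm.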