Success in supervised learning is constrained by the availability of an adequate labeled data sample for training. The burden of labeling every sample in the training dataset can be alleviated by allowing partial labeling, an approach known as semi-supervised learning. In this paper, we investigate the performance of semi-supervised learning in imbalanced classification problems. The class with limited data is augmented using a data subrogation method in order to lower the variance of the estimate. We analyze the effect of this data augmentation in several simulated and experimental scenarios of a challenging application: automatic credit card fraud detection. The relationships among different semi-supervision and sample augmentation ratios in this application are discussed in terms of receiver operating characteristic curves and business key performance indicators.
Salazar et al. (Sun,) studied this question.