This article presents a data-driven review of resampling approaches aimed at mitigating the class imbalance problem in machine learning, a widespread issue that limits classifier performance across numerous sectors. Initially, this research provides an extensive theoretical examination of the class imbalance problem, emphasizing its propensity to amplify existing data difficulty factors, including class overlap, small disjuncts, and noise, thus biasing the model towards the majority class. Acknowledging the significance of detecting and quantifying the synergistic effects between class imbalance and these data difficulty factors, this study surveys metrics formulated to quantify such phenomena in imbalanced domains. Subsequently, an exhaustive review of recent oversampling, undersampling, and hybrid sampling approaches is conducted. A major finding arising from this review is the discernible shift in resampling approaches towards enhanced adaptability. This is achieved through the identification of problematic regions and the subsequent implementation of customized resampling protocols. Concurrently, a methodological divergence is observed in both oversampling and undersampling strategies: certain oversampling methods target regions of higher classification complexity, which are crucial for effective model training, while others focus on areas of lower classification complexity to safely oversample the minority class. In contrast, undersampling approaches either predominantly remove majority samples from redundant regions or focus on class boundaries to reduce class overlap. However, despite this increased adaptability, no resampling method consistently demonstrated superior performance across all documented experiments. Consequently, we explore a promising strategy, namely the adoption of recommendation systems for resampling approaches. Lastly, the primary research challenges within this topic are discussed.
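As a concrete illustration of the boundary-focused undersampling family mentioned above, the following minimal sketch removes majority samples that form Tomek links (pairs of mutual nearest neighbours with different labels). Tomek-link removal is a classic technique chosen here only as an example; it is not claimed to be a method evaluated in the review. The function name `tomek_link_undersample`, the toy data, and the brute-force Euclidean neighbour search are all assumptions made for illustration.

```python
import math

def tomek_link_undersample(X, y, majority_label):
    """Drop majority-class points that take part in a Tomek link.

    Hypothetical helper for illustration: brute-force O(n^2) search,
    suitable only for small toy datasets.
    """
    def nearest(i):
        # Index of the closest other point (Euclidean distance).
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))

    nn = [nearest(i) for i in range(len(X))]
    drop = set()
    for i, j in enumerate(nn):
        # Tomek link: i and j are mutual nearest neighbours with different labels.
        if nn[j] == i and y[i] != y[j] and y[i] == majority_label:
            drop.add(i)
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]

# Toy data (assumed): label 0 is the majority class, label 1 the minority class.
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.4), (1.0, 1.0), (1.1, 1.0), (3.0, 3.0)]
y = [0, 0, 0, 0, 1, 1]
X_res, y_res = tomek_link_undersample(X, y, majority_label=0)
```

In this toy set, the majority point at (1.0, 1.0) sits right next to a minority point and is the only sample removed; majority points in the dense region far from the boundary are left untouched, which is the opposite choice to methods that prune redundant regions.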
Carvalho et al. studied this question.