Abstract Unbalanced classification poses a significant challenge in real-world applications such as medical diagnosis and fraud detection, where class label distributions are highly skewed. Standard classification methods often yield suboptimal results because they sacrifice minority-class accuracy to maximize overall performance. Common remedies either reweight observations by class or resample the data, oversampling minority classes or undersampling majority classes. While these balancing techniques are effective in improving classification accuracy, they often introduce biases that compromise posterior class probability estimation and model calibration. This paper highlights the trade-offs that arise when balancing techniques are applied without appropriate adjustments. We systematically investigate the distortions in probability estimation caused by these balancing techniques and propose a robust framework that corrects the resulting biases through probability adjustment. We further investigate high-dimensional unbalanced data, examining both the distortion induced by balancing techniques and the shrinkage effect of regularization on probability estimation. Finally, we evaluate our method through simulations and real data analysis, covering a range of balancing strategies including class weighting, random resampling, and generative models. Results demonstrate that the proposed framework significantly mitigates probability estimation bias while preserving the classification power of traditional balancing techniques, ensuring reliable posterior estimates.
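To make the idea of probability adjustment concrete, the following is a minimal sketch of the standard prior-shift (odds-ratio) correction for a binary classifier trained on rebalanced data. It is a well-known textbook correction shown for illustration only and is not necessarily the framework proposed in the paper. The function name `adjust_posterior` and its parameters are hypothetical, and the sketch assumes that resampling changed only the class prior, not the within-class feature distributions.

```python
def adjust_posterior(p_balanced, train_prior, true_prior):
    """Map a minority-class probability estimated on rebalanced data
    back to the original class prevalence.

    p_balanced  : P(y=1 | x) from the model fitted on rebalanced data
    train_prior : minority-class share in the rebalanced training set
    true_prior  : minority-class share in the original population
    (All names are illustrative assumptions, not from the paper.)
    """
    # Reweight each class's posterior mass by the ratio of true to training prior.
    minority_mass = p_balanced * true_prior / train_prior
    majority_mass = (1.0 - p_balanced) * (1.0 - true_prior) / (1.0 - train_prior)
    # Renormalize so the two class probabilities sum to one.
    return minority_mass / (minority_mass + majority_mass)


if __name__ == "__main__":
    # Model trained on a 50/50 resample; true minority prevalence is 10%.
    print(adjust_posterior(0.5, train_prior=0.5, true_prior=0.1))
```

Because the mapping is monotone in `p_balanced`, it changes the probability scale without changing how observations are ranked. Any gain in minority-class detection therefore still depends on choosing a decision threshold suited to the corrected scale.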
Lin et al. (Mon,) studied this question.