What question did this study set out to answer?

The study investigates the biases in probability estimation introduced by techniques for handling unbalanced classification and proposes a framework to correct them.

April 22, 2026 | Open Access

On Probability Estimation for Unbalanced Classification

Key Points

Systematic investigation of probability estimation distortions in unbalanced classification.
Evaluation of various balancing strategies through simulations and real data analysis.
Analysis of high-dimensional unbalanced data considering regularization effects.
The proposed framework significantly reduces probability estimation bias.
Classification power is preserved while using traditional balancing techniques.
Improved posterior estimates are achieved across multiple data scenarios.

Abstract

Unbalanced classification poses a significant challenge in real-world applications such as medical diagnosis and fraud detection, where class label distributions are highly skewed. Standard classification methods often yield suboptimal results because they sacrifice minority-class accuracy to maximize overall performance. Common techniques for unbalanced classification typically involve either reweighting points from different classes or resampling strategies, including oversampling of minority classes and undersampling of majority classes. While these balancing techniques are effective in improving classification accuracy, they often introduce biases that compromise posterior class probability estimation and model calibration. This paper highlights the trade-offs associated with balancing techniques when applied without appropriate adjustments. We systematically investigate the distortions in probability estimation caused by these balancing techniques and propose a robust framework to correct these biases through probability adjustment. We further investigate high-dimensional unbalanced data, examining both the distortion induced by balancing techniques and the shrinkage effect of regularization on probability estimation. Finally, our method is evaluated through simulations and real data analysis, covering a range of balancing strategies including class weighting, random resampling, and generative models. Results demonstrate that our proposed framework significantly mitigates probability estimation bias while preserving classification power when applying traditional balancing techniques, ensuring reliable posterior estimates.
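
To make the idea of probability adjustment concrete, the sketch below applies the standard Bayes prior-shift correction to a probability produced by a model trained on rebalanced data. The abstract does not spell out the paper's framework, so this is an illustration of one well-known correction and not the authors' method. The function name `adjust_posterior` and its parameters are hypothetical, and the sketch assumes a binary problem in which balancing changed only the class prior, not the class-conditional feature distributions.

```python
def adjust_posterior(p_balanced: float, train_prior: float, true_prior: float) -> float:
    """Map a positive-class probability from a rebalanced model back to the original prior.

    p_balanced  : probability output by the model fit on rebalanced data
    train_prior : positive-class fraction after balancing (e.g. 0.5)
    true_prior  : positive-class fraction in the original population
    All names here are illustrative assumptions, not taken from the paper.
    """
    for name, value in (("train_prior", train_prior), ("true_prior", true_prior)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must lie strictly between 0 and 1")
    if not 0.0 <= p_balanced <= 1.0:
        raise ValueError("p_balanced must lie in [0, 1]")

    # Reweight each class by the ratio of its true prior to its training prior,
    # then renormalize so the two class probabilities sum to one.
    pos = p_balanced * true_prior / train_prior
    neg = (1.0 - p_balanced) * (1.0 - true_prior) / (1.0 - train_prior)
    return pos / (pos + neg)


# A model trained on a 50/50 resample reports 0.5 for a case drawn from a
# population in which only 1% of cases are positive.
corrected = adjust_posterior(0.5, train_prior=0.5, true_prior=0.01)
```

Because this mapping is strictly increasing in `p_balanced`, it leaves the ranking of cases unchanged, which is one way a correction can repair calibration without altering how well the scores separate the classes.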
