Abstract: As machine learning (ML) systems become deeply embedded in contemporary decision-making processes, concerns regarding algorithmic bias have attracted increasing scholarly and societal attention. Automated models are now widely used in high-impact domains including recruitment, credit approval, education, and the criminal justice system, where biased outcomes may reinforce existing inequalities. Consequently, fairness has become a fundamental requirement rather than an optional design goal. Many fairness-aware ML approaches rely on descriptive metrics such as accuracy, precision, recall, or selection rates, but these measures alone cannot determine whether observed disparities between demographic groups are statistically meaningful or merely due to random variation. This paper proposes a simple yet rigorous statistical hypothesis testing framework for detecting algorithmic bias by formally comparing model outcomes across protected groups. The framework employs classical statistical tools, including the two-proportion z-test and the chi-square test of independence, to evaluate group-level differences in decision outcomes. A small synthetic dataset is used to demonstrate the proposed methodology in a transparent and interpretable manner. The results illustrate that statistically significant bias can be detected even when overall model performance appears balanced. The study emphasizes the importance of incorporating uncertainty and statistical significance into fairness assessments.
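To make the two tests named in the abstract concrete, the following is a minimal sketch in Python using only the standard library. It is not the paper's implementation: the function names and the group counts are hypothetical, chosen purely for illustration, and the chi-square statistic is computed without a continuity correction. Under the null hypothesis, both groups share the same positive-decision rate.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided two-proportion z-test with a pooled standard error.

    x1, x2: number of positive decisions in each group.
    n1, n2: group sizes.
    Returns (z statistic, two-sided p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def chi_square_2x2(x1, n1, x2, n2):
    """Chi-square test of independence on a 2x2 table (no continuity correction).

    Rows are groups; columns are positive / negative decisions.
    Returns (chi-square statistic, p-value for 1 degree of freedom).
    """
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    total = n1 + n2
    row_sums = [n1, n2]
    col_sums = [x1 + x2, total - (x1 + x2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts (not taken from the paper):
# group A: 60 of 100 selected; group B: 45 of 100 selected.
z, p_z = two_proportion_z_test(60, 100, 45, 100)
chi2, p_chi2 = chi_square_2x2(60, 100, 45, 100)

print(f"z = {z:.3f}, p = {p_z:.4f}")
print(f"chi2 = {chi2:.3f}, p = {p_chi2:.4f}")
```

For a 2x2 table without continuity correction, the chi-square statistic equals the square of the pooled z statistic, so the two tests yield the same p-value. The chi-square test becomes the more general tool when there are more than two groups or more than two outcome categories.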
Vidya Bhatlavande (Sat,) studied this question.