Algorithms are increasingly used to make or support decisions in tasks such as recruitment and selection, loan approval, fraud detection, classification, and prediction. This has raised important questions about how to ensure that the use of algorithms respects values such as fairness, absence of bias and discrimination, explainability, and redress. Despite decades of research, there is no single metric that can be used to measure fairness, absence of bias, or discrimination. For every implementation, one needs to evaluate multiple metrics and also make additional decisions about the subgroups and thresholds to use for the fairness metrics. In this paper we present an overview of the most important fairness metrics, provide updated, more thorough definitions, and introduce a new dataset of hiring candidates (n=225) to show the impact of the different definitions in practice. The dataset is suitable for educational and demonstration purposes since it is freely usable, has no privacy constraints and no data quality issues, and is sufficiently complex to explain the challenges of fairness evaluations. We use the dataset to show that the different fairness definitions do conflict and that it is necessary to consider many subgroups using multiple sensitive features. We also discuss the importance of the socio-technical context and domain knowledge in fairness.
Burda et al. (Sat,) studied this question.