• Impact of missing data imputation on fairness-aware machine learning analyzed
• First study on MAR/MNAR mechanisms in high-missing-rate fairness contexts
• Autoencoders’ role in fairness gaps in missing data imputation explored
• Decision diagram aids in choosing optimal imputation methods for specific objectives

Missing data is a common problem in real-world datasets and can be characterized as the lack of information on one or more variables in a dataset. The most frequent technique for handling this issue is imputation, which consists of replacing the missing values according to a predefined criterion. Since missing values are often imputed based on the known values in the dataset, existing data issues can be propagated during the imputation process. One such issue is fairness, a concept integral to responsible Artificial Intelligence practices. This work investigates the impact of the imputation process on system fairness by examining how imputation affects the fairness of predictions in Machine Learning models. It provides a comprehensive analysis covering thirteen unfair benchmark datasets and six state-of-the-art imputation strategies under synthetic Missing Not At Random and Missing At Random mechanisms in a multivariate scenario with missing rates of 10%, 20%, 40%, and 60%. Fairness was measured by the following metrics: Statistical Parity, Equalized Odds, Equality of Opportunity, Predictive Equality, and Equality of Positive and Negative Predicted Values. The results demonstrate that the missing mechanism, the classifier choice, and the imputation strategy decisively influence the fairness of the predictions obtained by the Machine Learning models.
Mangussi et al. studied this question.
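To make the fairness metrics named in the abstract concrete, the following is a minimal sketch of how they could be computed as gaps between two groups for a binary classifier. It is not the authors' implementation: the function names `group_rates` and `fairness_gaps`, the binary encoding of the protected attribute, the use of absolute differences, and the choice to report Equalized Odds as the larger of the true-positive-rate and false-positive-rate gaps are all assumptions made for illustration.

```python
import numpy as np


def group_rates(y_true, y_pred, mask):
    """Confusion-matrix rates for the samples selected by `mask`.

    Hypothetical helper. A rate is nan (with a NumPy warning) when its
    denominator is zero, e.g. a group without any positive labels.
    """
    yt, yp = y_true[mask], y_pred[mask]
    tp = np.sum((yt == 1) & (yp == 1))
    fp = np.sum((yt == 0) & (yp == 1))
    tn = np.sum((yt == 0) & (yp == 0))
    fn = np.sum((yt == 1) & (yp == 0))
    return {
        "positive_rate": (tp + fp) / len(yt),  # P(prediction = 1)
        "tpr": tp / (tp + fn),                 # true positive rate
        "fpr": fp / (fp + tn),                 # false positive rate
        "ppv": tp / (tp + fp),                 # positive predictive value
        "npv": tn / (tn + fn),                 # negative predictive value
    }


def fairness_gaps(y_true, y_pred, protected):
    """Absolute between-group gaps; 0 means parity on that criterion.

    Assumes binary labels, binary predictions, and a binary protected
    attribute encoded as 0/1 (an illustrative simplification).
    """
    y_true, y_pred, protected = map(np.asarray, (y_true, y_pred, protected))
    a = group_rates(y_true, y_pred, protected == 1)
    b = group_rates(y_true, y_pred, protected == 0)
    tpr_gap = abs(a["tpr"] - b["tpr"])
    fpr_gap = abs(a["fpr"] - b["fpr"])
    return {
        "statistical_parity": abs(a["positive_rate"] - b["positive_rate"]),
        "equality_of_opportunity": tpr_gap,
        "predictive_equality": fpr_gap,
        # Assumption: summarize Equalized Odds by the worse of the two gaps.
        "equalized_odds": max(tpr_gap, fpr_gap),
        "positive_predictive_value": abs(a["ppv"] - b["ppv"]),
        "negative_predictive_value": abs(a["npv"] - b["npv"]),
    }


if __name__ == "__main__":
    # Toy data: both groups receive positive predictions at the same rate,
    # but only the protected == 1 group suffers misclassifications.
    y_true = [1, 1, 0, 0, 1, 1, 0, 0]
    y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
    protected = [1, 1, 1, 1, 0, 0, 0, 0]
    for name, gap in fairness_gaps(y_true, y_pred, protected).items():
        print(f"{name}: {gap:.2f}")
```

In the toy data the Statistical Parity gap is zero while every error-based gap is 0.5, which illustrates why several metrics are reported side by side: a model can satisfy one criterion and still violate the others. In a study like this one, such gaps would be compared across imputation strategies, missing mechanisms, and missing rates.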