What question did this study set out to answer?

This research investigates how imputation methods for missing data impact fairness in machine learning predictions.

May 15, 2026 | Open Access

Exploring the Influence of Missing Data Imputation in Group Fairness Metrics

Key Points

Analyzed 13 benchmark datasets with missing rates of 10%, 20%, 40%, and 60% under Missing At Random (MAR) and Missing Not At Random (MNAR) mechanisms.
Evaluated 6 state-of-the-art imputation strategies and their effect on fairness metrics.
Utilized decision diagrams to guide optimal imputation method selection for fairness objectives.
Imputation strategy and missing data mechanism significantly affect fairness metrics.
Statistical parity and predictive equality showed varying levels of bias based on the imputation method used.
Different classifiers yielded distinct fairness outcomes depending on the imputation technique applied.
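
Two of the metrics named in these points, statistical parity and predictive equality, are commonly computed as between-group differences in the positive prediction rate and the false positive rate. The following is a minimal sketch of those common definitions, not the study's own code. It assumes binary labels and predictions and a binary protected attribute coded 1 for the privileged group and 0 for the unprivileged group. The function names, the sign convention (privileged minus unprivileged), and the toy data are illustrative assumptions.

```python
def rate(values):
    """Mean of a list of 0/1 values; 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def statistical_parity_difference(y_pred, group):
    """P(pred = 1 | group = 1) - P(pred = 1 | group = 0)."""
    priv = [p for p, g in zip(y_pred, group) if g == 1]
    unpriv = [p for p, g in zip(y_pred, group) if g == 0]
    return rate(priv) - rate(unpriv)


def predictive_equality_difference(y_true, y_pred, group):
    """Difference in false positive rates between the two groups."""
    def false_positive_rate(g_val):
        # Predictions for the true negatives of one group.
        preds_on_negatives = [
            p for t, p, g in zip(y_true, y_pred, group)
            if g == g_val and t == 0
        ]
        return rate(preds_on_negatives)

    return false_positive_rate(1) - false_positive_rate(0)


# Toy data (assumed for illustration only).
y_true = [1, 0, 1, 0, 1, 0, 0, 1]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
group = [1, 1, 1, 1, 0, 0, 0, 0]

spd = statistical_parity_difference(y_pred, group)
ped = predictive_equality_difference(y_true, y_pred, group)
print(f"statistical parity difference: {spd:.2f}")
print(f"predictive equality difference: {ped:.2f}")
```

A value of 0 on either metric would indicate parity between the groups under this convention. Comparing these values for models trained on differently imputed versions of the same dataset is one way to observe the kind of effect the key points describe.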

Abstract

• Impact of missing data imputation on fairness-aware machine learning analyzed
• First study on MAR/MNAR mechanisms in high-missing-rate fairness contexts
• Autoencoders’ role in fairness gaps in missing data imputation explored
• Decision diagram aids in choosing optimal imputation methods for specific objectives

Missing data is a common problem in real-world datasets and can be characterized as the lack of information on one or more variables in a dataset. The most frequent technique for handling this issue is imputation, which replaces the missing values according to a predefined criterion. Since missing values are often imputed based on the known values in the dataset, existing data issues can be propagated during the imputation process. One such issue is fairness, a concept integral to responsible Artificial Intelligence practices. This work investigates the impact of the imputation process on system fairness by examining how imputation affects the fairness of predictions in Machine Learning models. It provides a comprehensive analysis covering thirteen unfair benchmark datasets and six state-of-the-art imputation strategies under synthetic Missing Not At Random (MNAR) and Missing At Random (MAR) mechanisms in a multivariate scenario with missing rates of 10%, 20%, 40%, and 60%. Fairness was measured by the following metrics: Statistical Parity, Equalized Odds, Equality of Opportunity, Predictive Equality, and Equality of Positive and Negative Predicted Values. The results demonstrate that the missing mechanism, the classifier choice, and the imputation strategy decisively influence the fairness of the predictions obtained by the Machine Learning models.
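
To make the distinction between the two missing mechanisms concrete, the sketch below injects missing values into a toy table and then imputes them. It is an illustration under stated assumptions, not the study's procedure. Here missingness is driven deterministically by the smallest values of a "driver" column: an observed second feature for MAR, and the target feature itself for MNAR. Mean imputation is used as a simple stand-in and is not claimed to be one of the six strategies evaluated. The names `inject_missing` and `mean_impute` and the data are hypothetical.

```python
def inject_missing(rows, target, driver, rate):
    """Return a copy of rows with `target` set to None in the fraction
    `rate` of rows that have the smallest `driver` values."""
    n_missing = int(round(rate * len(rows)))
    order = sorted(range(len(rows)), key=lambda i: rows[i][driver])
    to_blank = set(order[:n_missing])
    return [
        {**row, target: None} if i in to_blank else dict(row)
        for i, row in enumerate(rows)
    ]


def mean_impute(rows, target):
    """Fill missing `target` values with the mean of the observed ones."""
    observed = [r[target] for r in rows if r[target] is not None]
    mean = sum(observed) / len(observed)
    return [
        {**r, target: mean if r[target] is None else r[target]}
        for r in rows
    ]


# Toy table (assumed for illustration only).
rows = [
    {"age": 25, "income": 50},
    {"age": 30, "income": 20},
    {"age": 35, "income": 40},
    {"age": 40, "income": 10},
    {"age": 45, "income": 30},
]

# MAR: whether income is missing depends on an observed feature (age).
mar = inject_missing(rows, target="income", driver="age", rate=0.4)
# MNAR: whether income is missing depends on the income value itself.
mnar = inject_missing(rows, target="income", driver="income", rate=0.4)

for name, data in (("MAR", mar), ("MNAR", mnar)):
    imputed = mean_impute(data, "income")
    print(name, [round(r["income"], 2) for r in imputed])
```

In this toy case the MNAR variant removes the two lowest incomes, so the imputed mean is pulled above the true mean of the column. If low values were concentrated in one protected group, that group's records would be distorted more than the other's, which is the kind of propagation the abstract describes.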
