May 15, 2026 · Open Access

Dataset effects outweigh algorithmic effects in determining fairness of healthcare machine learning

Key Points

The study aims to identify whether datasets or algorithms primarily affect fairness in healthcare machine learning, particularly across sex groups.
Conducted a systematic fairness evaluation across three healthcare domains (wearable physiology, cardiac risk prediction, and stroke assessment) using ten classifiers.
Applied three controlled sex-ratio sampling scenarios: 50/50, 90/10, and 10/90.
Used mixed-effects interaction modeling and variance contribution decomposition to analyze bias.
Dataset identity accounted for 63.4% of the variance in the absolute gender accuracy gap (|GAG|).
Algorithm choice alone explained only 9.7%, and dataset–algorithm interactions added 17.2%, indicating a limited independent effect of the algorithm on fairness.
Balanced sampling reduced but did not eliminate disparities, suggesting residual sex-associated structure in the features.
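The sampling scenarios and the gap metric can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. `sample_with_sex_ratio`, `gender_accuracy_gap`, and `SCENARIOS` are hypothetical names, and sex labels are assumed to be coded `"M"`/`"F"`. The first number in each ratio is assumed to be the male percentage. GAG is assumed to be male accuracy minus female accuracy, although the paper's sign convention may differ.

```python
import numpy as np

# Hypothetical mapping from scenario label to the fraction of male records.
# Assumption: "90/10" means 90% male, 10% female.
SCENARIOS = {"50/50": 0.5, "90/10": 0.9, "10/90": 0.1}


def sample_with_sex_ratio(sex, n, male_frac, rng):
    """Draw n row indices (without replacement) so that a fraction male_frac is male."""
    sex = np.asarray(sex)
    male_idx = np.flatnonzero(sex == "M")
    female_idx = np.flatnonzero(sex == "F")
    n_male = int(round(n * male_frac))
    n_female = n - n_male
    if n_male > male_idx.size or n_female > female_idx.size:
        raise ValueError("not enough records in one sex group for this ratio")
    chosen = np.concatenate([
        rng.choice(male_idx, size=n_male, replace=False),
        rng.choice(female_idx, size=n_female, replace=False),
    ])
    rng.shuffle(chosen)  # interleave the groups so row order carries no sex signal
    return chosen


def gender_accuracy_gap(y_true, y_pred, sex):
    """Assumed definition: accuracy on male records minus accuracy on female records."""
    y_true, y_pred, sex = (np.asarray(a) for a in (y_true, y_pred, sex))
    correct = y_true == y_pred
    return correct[sex == "M"].mean() - correct[sex == "F"].mean()


# Usage on synthetic labels (500 male, 500 female records; not study data).
rng = np.random.default_rng(0)
sex = np.array(["M"] * 500 + ["F"] * 500)
for name, frac in SCENARIOS.items():
    idx = sample_with_sex_ratio(sex, 200, frac, rng)
    print(name, "male share:", (sex[idx] == "M").mean())
```

In this sketch, sampling separately within each sex group fixes the ratio exactly rather than approximately. That exactness keeps the three scenarios directly comparable.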

Abstract

Ensuring fairness in clinical machine learning is a major concern, yet the dominant driver of unequal performance across sex groups remains unclear: is it the dataset or the algorithm? We conducted a systematic fairness evaluation across three healthcare domains: wearable physiology (MHEALTH), cardiac risk prediction (UCI Heart Disease), and stroke assessment. Each domain was evaluated with ten widely used classifiers and three controlled sex-ratio sampling scenarios (50/50, 90/10, 10/90) under an identical analytical pipeline. Gender accuracy gaps varied markedly across datasets and followed dataset-specific patterns that did not generalize across clinical domains. Mixed-effects interaction modelling showed that the same algorithm could display negligible bias in one dataset and substantial bias in another. Variance contribution decomposition of the absolute Gender Accuracy Gap (|GAG|) indicated that dataset identity accounted for most of the observed variability (63.4%), with a further 17.2% from dataset–algorithm interactions. Algorithm choice alone explained 9.7%, whereas sampling scenario contributed negligibly (0.2%). Balanced sampling reduced disparities but did not eliminate them, consistent with residual sex-associated signal or feature structure beyond representation imbalance. These findings demonstrate that fairness in healthcare machine learning is primarily dataset-dependent, motivating dataset- and context-specific auditing before clinical deployment.
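The percentage breakdown can be made concrete with a minimal Python sketch of a variance-share calculation. This is a named substitute, not the paper's method. The authors used mixed-effects interaction modelling, whereas this sketch uses a classical fixed-effects sums-of-squares (ANOVA) decomposition, which partitions variance exactly only for a balanced, fully crossed grid. It assumes one |GAG| value per dataset × algorithm × scenario cell. `variance_shares` is a hypothetical name, and the grid in the usage lines is random rather than study data.

```python
import numpy as np


def variance_shares(y):
    """Percent of the total sum of squares attributed to each effect.

    y: array of shape (n_datasets, n_algorithms, n_scenarios) holding one
    |GAG| value per cell of a balanced, fully crossed grid (assumption).
    """
    y = np.asarray(y, dtype=float)
    n_d, n_a, n_s = y.shape
    g = y.mean()
    # Marginal means for each single factor and each pair of factors.
    m_d, m_a, m_s = y.mean(axis=(1, 2)), y.mean(axis=(0, 2)), y.mean(axis=(0, 1))
    m_da, m_ds, m_as = y.mean(axis=2), y.mean(axis=1), y.mean(axis=0)
    ss = {
        "dataset": n_a * n_s * np.sum((m_d - g) ** 2),
        "algorithm": n_d * n_s * np.sum((m_a - g) ** 2),
        "scenario": n_d * n_a * np.sum((m_s - g) ** 2),
        # Two-way interactions: pair means minus both main effects.
        "dataset x algorithm": n_s * np.sum((m_da - m_d[:, None] - m_a[None, :] + g) ** 2),
        "dataset x scenario": n_a * np.sum((m_ds - m_d[:, None] - m_s[None, :] + g) ** 2),
        "algorithm x scenario": n_d * np.sum((m_as - m_a[:, None] - m_s[None, :] + g) ** 2),
    }
    ss_total = np.sum((y - g) ** 2)
    if ss_total == 0:
        raise ValueError("no variation in y; shares are undefined")
    # With one value per cell, the three-way interaction cannot be separated
    # from noise, so it is reported as the residual.
    ss["residual"] = ss_total - sum(ss.values())
    return {k: 100.0 * v / ss_total for k, v in ss.items()}


# Usage on a made-up 3 x 10 x 3 grid of |GAG| values (random, not study data).
rng = np.random.default_rng(0)
gaps = np.abs(rng.normal(scale=0.05, size=(3, 10, 3)))
for effect, pct in variance_shares(gaps).items():
    print(f"{effect:<22} {pct:5.1f}%")
```

Each share is that effect's sum of squares divided by the total sum of squares. In a balanced grid these components are orthogonal, so the shares add up to 100%.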

Cite This Study

Elgendi et al. (2026) studied this question.

synapsesocial.com/papers/6a06b971e7dec685947ac112
https://doi.org/10.1038/s41746-026-02723-1