What question did this study set out to answer?

This study investigates how bias propagates through the data science process and evaluates various mitigation strategies.

May 10, 2026 · Open Access

Bias Detection and Mitigation Techniques in Data Science Pipelines: An Empirical Evaluation

Deshinta Arrova Dewi, Ugochi Okengwu, Zakka Ugih Rizqi · Material (Belgium)

Key Points

This study investigates how bias propagates through the data science process and evaluates various mitigation strategies.
Tested a framework on benchmark datasets with sensitive attributes.
Implemented three predictive models: Logistic Regression, Random Forest, and Gradient Boosting.
Used metrics like Demographic Parity, Equal Opportunity, and Average Odds to assess fairness.
Data-level bias mitigation reduced disparity by 28%.
Fairness-regularized optimization decreased disparity by 35%.
Hybrid strategies reduced demographic disparity by 40-45% with minimal accuracy loss.
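
The three fairness metrics named above can be illustrated with a short sketch. The summary does not give the paper's exact formulas, so the version below uses the commonly cited definitions: the gap in positive-prediction rates (Demographic Parity), the gap in true-positive rates (Equal Opportunity), and the mean of the true-positive and false-positive rate gaps (Average Odds). The function names, the binary 0/1 encoding of the sensitive attribute, and the toy data are assumptions made for illustration only.

```python
import numpy as np

def rate(values, mask):
    """Mean of `values` over the rows selected by `mask` (NaN if none)."""
    return float(values[mask].mean()) if mask.any() else float("nan")

def fairness_gaps(y_true, y_pred, group):
    """Group 1 minus group 0 gaps for three common fairness metrics.

    Assumes binary labels, binary predictions, and a sensitive attribute
    encoded as 0/1 (an illustrative convention, not taken from the paper).
    """
    y_true, y_pred, group = (np.asarray(a) for a in (y_true, y_pred, group))
    g0, g1 = group == 0, group == 1

    # Demographic parity: difference in positive-prediction rates.
    dp_gap = rate(y_pred, g1) - rate(y_pred, g0)
    # Equal opportunity: difference in true-positive rates.
    tpr_gap = rate(y_pred, g1 & (y_true == 1)) - rate(y_pred, g0 & (y_true == 1))
    # False-positive rate gap, needed for average odds.
    fpr_gap = rate(y_pred, g1 & (y_true == 0)) - rate(y_pred, g0 & (y_true == 0))

    return {
        "demographic_parity": dp_gap,
        "equal_opportunity": tpr_gap,
        "average_odds": 0.5 * (tpr_gap + fpr_gap),
    }

# Toy data, invented for illustration.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]
gaps = fairness_gaps(y_true, y_pred, group)
```

A value of zero for a metric means the two groups are treated identically under that criterion. A percentage reduction in disparity would then be computed relative to the gap measured for an unmitigated baseline model.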

Abstract

Background: Failure to consider algorithmic bias can result in discriminatory outcomes in machine learning systems, particularly when these models operate in high-stakes decision-making environments. Although numerous bias mitigation techniques have been proposed, most studies treat fairness assessment as a post hoc evaluation. This gap highlights the need for a lifecycle-oriented framework to examine interconnected bias and fairness mechanisms.

Aims: This study aims to conduct an empirical investigation of bias propagation across the data science continuum within a structured bias-processing framework.

Methods: The proposed framework was tested on benchmark datasets containing sensitive attributes. Three predictive models were implemented: Logistic Regression, Random Forest, and Gradient Boosting. Fairness was evaluated using Demographic Parity, Equal Opportunity, and Average Odds metrics. Predictive modeling techniques were further employed to interpret fairness outcomes. Bias mitigation strategies were applied at both data and model levels, including fairness-regularized optimization and hybrid approaches. Sensitivity analysis was conducted to examine the trade-off between fairness constraints and model loss.

Result: The empirical findings indicate that most disparities originate from bias embedded in the data rather than from model architecture. Data-level bias mitigation reduced disparity by 28%. The fairness-regularized optimization approach reduced disparity by 35%. The hybrid mitigation strategy achieved a demographic disparity reduction of 40–45%, with an accuracy decrease of no more than 2%. Sensitivity analysis revealed non-linear tensions between fairness constraints and optimization loss, demonstrating that early-stage bias mitigation stabilizes fairness without significantly increasing performance trade-offs.

Conclusion: This study extends both theoretical and practical understanding of lifecycle bias propagation in machine learning systems. The findings emphasize the importance of addressing bias at early stages of the data science pipeline to achieve stable and sustainable fairness outcomes. By integrating fairness engineering throughout the lifecycle, the proposed framework contributes to more robust and ethically aligned AI systems.
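
The abstract does not say which data-level technique was used. As one possible illustration of early-stage mitigation, the sketch below implements reweighing, a widely used preprocessing approach that gives each (group, label) cell the weight P(group) · P(label) / P(group, label), so that group and label become statistically independent in the weighted training data. The choice of reweighing, the function name, and the toy data are assumptions, not details from the paper.

```python
import numpy as np

def reweighing_weights(y, group):
    """Per-sample weights that decouple the label from the sensitive attribute.

    Each sample in cell (g, label) gets P(g) * P(label) / P(g, label).
    This is an illustrative stand-in; the paper's data-level method
    is not specified in the abstract.
    """
    y, group = np.asarray(y), np.asarray(group)
    weights = np.ones(len(y), dtype=float)
    for g in np.unique(group):
        for label in np.unique(y):
            cell = (group == g) & (y == label)
            if cell.any():
                expected = (group == g).mean() * (y == label).mean()
                weights[cell] = expected / cell.mean()
    return weights

# Toy data, invented for illustration: group 0 has mostly positive labels,
# group 1 mostly negative ones.
y = np.array([1, 1, 1, 0, 1, 0, 0, 0])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
w = reweighing_weights(y, group)

# Weighted share of positive labels within each group after reweighing.
pos_rate_g0 = np.average(y[group == 0], weights=w[group == 0])
pos_rate_g1 = np.average(y[group == 1], weights=w[group == 1])
```

Weights of this kind can typically be passed to a classifier through a per-sample weight argument (for example, the `sample_weight` parameter that scikit-learn estimators accept in `fit`), which leaves the model architecture unchanged while correcting the imbalance in the data.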

Cite This Study

Dewi et al. studied this question.

synapsesocial.com/papers/6a002087c8f74e3340f9b67e
https://doi.org/10.58723/ijaaiml.v3i1.655
