April 27, 2026 · Open Access

Data Pipelines as Epistemic Agents: Toward Data Excellence, Fairness, and Accountability in AI

Meriem Zizouane, University of Hassan II Casablanca

Key Points

The aim is to investigate data pipelines as active agents in knowledge production within AI systems.
The study draws on philosophy of science and empirical machine learning perspectives.
It analyzes factors such as bias, documentation, environmental impacts, and governance.
A framework for responsible and accountable data pipelines is proposed.
Data pipeline quality significantly influences AI model outcomes and behaviors.
Addressing biases and enhancing documentation practices improves transparency and fairness in AI.
Stronger governance frameworks promote sustainability and accountability in AI systems.

Abstract

This paper explores the often-overlooked role of data pipelines in artificial intelligence systems, arguing that they are not merely technical infrastructures but active epistemic agents that shape knowledge production. It examines how decisions made during data collection, cleaning, validation, and transformation influence what information reaches machine learning models and therefore affect model behavior and outcomes. Drawing on perspectives from the philosophy of science and empirical machine learning practice, the study analyzes key factors such as bias, documentation practices, environmental impact, and organizational governance. It then proposes a framework for responsible data pipelines based on data quality, transparency, fairness, sustainability, and accountability. The paper highlights the need to treat data pipelines as central components of AI epistemology and calls for stronger governance and interdisciplinary oversight to ensure trustworthy and responsible AI systems.
