This paper explores the often-overlooked role of data pipelines in artificial intelligence systems, arguing that they are not merely technical infrastructures but active epistemic agents that shape knowledge production. It examines how decisions made during data collection, cleaning, validation, and transformation influence what information reaches machine learning models and therefore affect model behavior and outcomes. Drawing on perspectives from the philosophy of science and empirical machine learning practice, the study analyzes key factors such as bias, documentation practices, environmental impact, and organizational governance. It then proposes a framework for responsible data pipelines based on data quality, transparency, fairness, sustainability, and accountability. The paper highlights the need to treat data pipelines as central components of AI epistemology and calls for stronger governance and interdisciplinary oversight to ensure trustworthy and responsible AI systems.
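The abstract's central claim is that routine pipeline decisions change what information reaches a model. The sketch below makes this concrete under stated assumptions; it is not the author's framework. It uses hypothetical toy records (a group label and an income value that may be missing) and two common cleaning steps: dropping rows with missing values and dropping outliers above an arbitrary cutoff. Together they shift the group composition of the data the model would see. A per-step log of records kept is a minimal stand-in for the documentation and accountability the abstract calls for. The names `Pipeline`, `drop_missing`, `drop_outliers`, and the 200,000 cutoff are all illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical toy record: (group label, income); None marks a missing value.
Record = tuple[str, Optional[float]]


@dataclass
class Pipeline:
    """Applies named steps in order and logs (step, records in, records out)."""
    steps: list[tuple[str, Callable[[list[Record]], list[Record]]]]
    log: list[tuple[str, int, int]] = field(default_factory=list)

    def run(self, data: list[Record]) -> list[Record]:
        for name, step in self.steps:
            before = len(data)
            data = step(data)
            self.log.append((name, before, len(data)))
        return data


def group_shares(data: list[Record]) -> dict[str, float]:
    """Fraction of records belonging to each group."""
    counts = Counter(group for group, _ in data)
    total = sum(counts.values())
    return {g: counts[g] / total for g in sorted(counts)}


def drop_missing(data: list[Record]) -> list[Record]:
    """Cleaning decision 1: discard records whose income is missing."""
    return [r for r in data if r[1] is not None]


def drop_outliers(data: list[Record], limit: float = 200_000.0) -> list[Record]:
    """Cleaning decision 2 (arbitrary cutoff); assumes missing values are already gone."""
    return [r for r in data if r[1] <= limit]


# Hypothetical data: group B has more missing values than group A.
raw: list[Record] = [
    ("A", 40000.0), ("A", 52000.0), ("A", 61000.0),
    ("A", 45000.0), ("A", 38000.0), ("A", 250000.0),
    ("B", 30000.0), ("B", None), ("B", None), ("B", 35000.0),
]

pipeline = Pipeline(steps=[("drop_missing", drop_missing), ("drop_outliers", drop_outliers)])
clean = pipeline.run(raw)

print(group_shares(raw))    # {'A': 0.6, 'B': 0.4}
print(group_shares(clean))  # group B's share falls to 2/7
print(pipeline.log)         # [('drop_missing', 10, 8), ('drop_outliers', 8, 7)]
```

Neither step mentions groups, yet group B's share falls from 40% to about 29% because its records are more often incomplete. The step log records how many records each decision removed, which is one simple way to make that effect visible.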
Meriem Zizouane
University of Hassan II Casablanca
synapsesocial.com/papers/69eefe1efede9185760d4c65 — DOI: https://doi.org/10.5281/zenodo.19771722