Traceprop is an open-source Python library providing the first unified system for end-to-end data provenance in machine learning pipelines, tracing data from raw source files through preprocessing and model training to individual predictions. Existing data attribution methods (Koh and Liang, 2017; Park et al., 2023; Engstrom et al., 2024) identify which training samples influenced a prediction, but they operate in isolation from the data pipeline. Existing lineage tools (MLflow, DVC, TensorFlow MLMD) track artifact-level provenance, but they neither descend into the computation graph nor connect to gradient-level attribution. Traceprop fills this gap with a computation-level lineage layer that integrates natively with gradient-based attribution. A single Traceprop query answers: "This model made prediction X: which rows in which source files, through which preprocessing steps, most influenced that prediction, and can we reduce that influence without retraining?" We demonstrate: (1) sub-1% lineage overhead in production op-mode at 10⁶+ array elements (1.007× on macOS, 0.979× on Linux); (2) Traceprop-LL achieving a linear datamodeling score (LDS) of 0.622 ± 0.180 on tabular data (UCI Adult Income, logistic regression) in 0.22 s on CPU, and an LDS of 0.0168 on CIFAR-2/ResNet-9 versus TRAK's 0.0290 at 266× lower wall-clock cost (2.6 s CPU vs. 691 s GPU); (3) provenance-guided approximate unlearning that exceeds the retrain-from-scratch gold standard (forget-set loss 0.425 vs. 0.401 for the gold standard, whereas random unlearning closes only 14% of the gap) with a test accuracy drop of only 0.5 percentage points (0.915 vs. 0.920). Traceprop directly addresses the audit-trail obligations of EU AI Act Article 26 for high-risk AI systems, whose backstop enforcement date is 2 December 2027. The library is available at https://pypi.org/project/traceprop/
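To make the abstract's single query concrete, the following is a minimal pure-Python sketch under stated assumptions. It is not Traceprop's API or its Traceprop-LL estimator. Each preprocessing step records row-level lineage as a set of (source file, row index) origins. A small L2-regularized logistic regression is trained, and each training row is scored with a gradient dot product against the prediction being explained (a TracIn-style score at a single checkpoint, swapped in here for illustration). Those scores are then pushed back to source rows through the recorded lineage. Every function name, the two-file toy data, and the equal split of a merged row's score across its origins are hypothetical. The "reduce that influence without retraining" part of the query is not covered.

```python
import math
from collections import defaultdict

# Hypothetical sketch: none of these names are Traceprop's API.


def load_source(name, rows):
    """Tag each raw (features, label) row with its origin: (file name, row index)."""
    return [{"x": list(x), "y": y, "lineage": {(name, i)}} for i, (x, y) in enumerate(rows)]


def merge_duplicates(rows):
    """Many-to-one preprocessing step: identical (x, y) rows collapse into one
    training row whose lineage is the union of every contributing origin."""
    merged = {}
    for r in rows:
        key = (tuple(r["x"]), r["y"])
        if key in merged:
            merged[key]["lineage"] |= r["lineage"]
        else:
            merged[key] = {"x": list(r["x"]), "y": r["y"], "lineage": set(r["lineage"])}
    return list(merged.values())


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def grad(w, x, y):
    """Per-example log-loss gradient w.r.t. the weights (last weight is the bias)."""
    xb = x + [1.0]
    err = sigmoid(dot(w, xb)) - y
    return [err * xi for xi in xb]


def train_logreg(rows, lr=0.5, epochs=200, l2=0.05):
    """Full-batch gradient descent on mean log-loss + (l2 / 2) * ||w||^2."""
    w = [0.0] * (len(rows[0]["x"]) + 1)
    for _ in range(epochs):
        g = [l2 * wi for wi in w]
        for r in rows:
            gr = grad(w, r["x"], r["y"])
            g = [gi + di / len(rows) for gi, di in zip(g, gr)]
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


def explain_prediction(w, train_rows, x_test):
    """Score each training row by the dot product of its loss gradient with the
    test-point gradient at the predicted label (positive = supports the
    prediction), then attribute each score to source rows via lineage."""
    y_hat = 1 if sigmoid(dot(w, x_test + [1.0])) >= 0.5 else 0
    g_test = grad(w, x_test, y_hat)
    by_source = defaultdict(float)
    for r in train_rows:
        score = dot(g_test, grad(w, r["x"], r["y"]))
        for origin in r["lineage"]:
            # Assumption: a merged row's score is split equally across its origins.
            by_source[origin] += score / len(r["lineage"])
    ranked = sorted(by_source.items(), key=lambda kv: kv[1], reverse=True)
    return y_hat, ranked


# Usage on toy data: row 2 of a.csv and row 0 of b.csv are duplicates.
raw = (load_source("a.csv", [([0.0, 1.0], 0), ([0.5, 0.5], 0), ([2.0, 2.5], 1)])
       + load_source("b.csv", [([2.0, 2.5], 1), ([3.0, 2.0], 1), ([0.2, 0.1], 0)]))
train = merge_duplicates(raw)
w = train_logreg(train)
y_hat, ranked = explain_prediction(w, train, [2.5, 2.5])
for (src, row), score in ranked[:3]:
    print(f"{src} row {row}: {score:+.4f}")
```

The lineage here is kept at row granularity and attached to the data itself, so the attribution step never needs to know which preprocessing steps ran. A real system tracking lineage inside the computation graph would have to handle far more transformation types than this single deduplication step.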
Amit N. studied this question.