This paper presents an adaptive statistical framework for detecting data quality issues in real time in streaming pipelines, replacing hardcoded rule-based validation with distribution-aware monitoring. The framework combines Kolmogorov-Smirnov testing, Population Stability Index scoring, and Wasserstein distance measurement with an empirically validated recalibration heuristic. It was validated on 2,136,000 synthetic records spanning 90 days and 1,424 micro-batches. The results show a 98.4% reduction in false positives relative to a rule-based baseline, with no incorrect recalibrations over the full simulation period.
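The three drift measures named above can be illustrated with a minimal, standard-library-only sketch. It is not the paper's implementation: the function names, the quantile binning used for PSI, the `eps` floor, and the equal-batch-size restriction on the Wasserstein distance are all assumptions made for illustration. The batch size of 1,500 follows from the stated totals (2,136,000 records / 1,424 micro-batches). The recalibration heuristic is not described in the abstract and is not sketched here.

```python
import bisect
import math
import random


def ks_statistic(ref, cur):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the two empirical CDFs."""
    ref_sorted, cur_sorted = sorted(ref), sorted(cur)
    d = 0.0
    # The supremum of |F_ref - F_cur| is attained at one of the observed points.
    for x in ref_sorted + cur_sorted:
        f_ref = bisect.bisect_right(ref_sorted, x) / len(ref_sorted)
        f_cur = bisect.bisect_right(cur_sorted, x) / len(cur_sorted)
        d = max(d, abs(f_ref - f_cur))
    return d


def psi(ref, cur, bins=10, eps=1e-6):
    """Population Stability Index, binned on reference quantiles (an assumed binning choice)."""
    ref_sorted = sorted(ref)
    # Interior bin edges taken at evenly spaced reference quantiles.
    edges = [ref_sorted[(i * len(ref_sorted)) // bins] for i in range(1, bins)]

    def fractions(sample):
        counts = [0] * bins
        for x in sample:
            counts[bisect.bisect_right(edges, x)] += 1
        # Floor each fraction at eps so the logarithm is always defined.
        return [max(c / len(sample), eps) for c in counts]

    p, q = fractions(ref), fractions(cur)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def wasserstein_1d(ref, cur):
    """1-Wasserstein distance between two equal-size 1-D samples (assumption: equal sizes)."""
    if len(ref) != len(cur):
        raise ValueError("this sketch only handles equal-size batches")
    return sum(abs(a - b) for a, b in zip(sorted(ref), sorted(cur))) / len(ref)


# Hypothetical usage: compare a reference micro-batch with a mean-shifted one.
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(1500)]
shifted = [x + 0.5 for x in reference]
scores = {
    "ks": ks_statistic(reference, shifted),
    "psi": psi(reference, shifted),
    "wasserstein": wasserstein_1d(reference, shifted),
}
```

All three scores are zero when a batch is compared with itself and grow as the current batch departs from the reference. How the scores are thresholded and combined is a design decision the abstract does not specify.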
Thanigaivendhan Thiyagarajan
Thanigaivendhan Thiyagarajan (Thu,) studied this question.
synapsesocial.com/papers/69bf898bf665edcd009e94d5 — DOI: https://doi.org/10.5281/zenodo.19129208