Modern data engineering faces unprecedented challenges in managing increasingly complex, distributed data pipelines that process massive volumes of data across heterogeneous environments. Traditional approaches to pipeline monitoring and maintenance rely heavily on manual intervention, reactive troubleshooting, and rule-based automation that struggles to adapt to dynamic operational conditions. This paper introduces agentic data engineering, a paradigm shift from procedural execution to intent-based orchestration by autonomous intelligent agents. Agentic data engineering uses artificial intelligence to create self-managing data infrastructure capable of continuous monitoring, proactive anomaly detection, autonomous fault recovery, and adaptive optimization. Drawing on a comprehensive review of 30 recent research papers and industry implementations, this paper establishes theoretical foundations, presents detailed technical architectures, and examines practical applications across the financial services, healthcare, telecommunications, and cloud infrastructure domains. Key findings indicate that autonomous agent systems can reduce mean time to recovery by up to 70%, improve data quality through proactive maintenance, and significantly decrease the cognitive burden on data engineering teams. This work provides a technology-agnostic framework for implementing agentic data engineering systems and identifies critical research directions for advancing autonomous data infrastructure.
Thuy Thi Thu Tran; Quynh Nguyen
www.synapsesocial.com/papers/69ba44654e9516ffd37a6112 — DOI: https://doi.org/10.5281/zenodo.19054606