Modern ETL (Extract, Transform, Load) workflows must adapt to growing data volumes, diverse sources, and tight service-level agreements (SLAs). Embedding machine learning (ML) into data pipelines enables anomaly detection and dynamic optimization of data transformations and resource allocation, which can improve throughput, reliability, and cost efficiency. This article surveys key AI techniques for ETL optimization, presents a reference architecture for AI-powered data pipelines, discusses implementation considerations, and highlights future research directions.
Ujjawal Nayak studied this question.