The exponential growth of transactional and sales data stored in Azure Data Lake Storage (ADLS) makes it difficult to maintain query performance and storage efficiency for Delta tables in cloud-based Lakehouse architectures. Traditional optimization strategies, such as manually scheduled OPTIMIZE and VACUUM operations, are reactive and often fail to adapt to dynamic workload patterns. The result is file fragmentation, higher query latency, and increased storage costs. This paper presents an architecture for Predictive Optimization of Delta tables on ADLS within the Databricks Lakehouse platform. It uses Unity Catalog managed tables to automate and intelligently schedule compaction (OPTIMIZE) and vacuuming (VACUUM) operations. Experimental evaluation on real-world transactional and sales datasets shows an improvement of approximately 30% in query execution time over manual optimization baselines, along with a reduced small-file count and better storage efficiency.
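To make "automatically scheduling compaction and vacuuming" concrete, the sketch below shows one simple way a scheduler could decide, per table, whether OPTIMIZE or VACUUM is worth running. It is a hypothetical, rule-based stand-in and not the paper's predictive method or Databricks' internal Predictive Optimization logic: the `TableStats` fields, the `plan_maintenance` function, and all threshold values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    """Hypothetical per-table snapshot; field names are illustrative only."""
    num_files: int                  # data files in the current table version
    small_files: int                # files below some target size (assumed precomputed)
    unreferenced_bytes: int         # bytes in files no longer referenced by the current version
    total_bytes: int                # total bytes stored for the table
    hours_since_last_vacuum: float  # time since the last VACUUM run


def plan_maintenance(stats: TableStats,
                     small_file_ratio: float = 0.3,
                     stale_ratio: float = 0.1,
                     min_vacuum_interval_h: float = 24.0) -> list[str]:
    """Return the maintenance operations a threshold heuristic would schedule.

    Assumption: compaction pays off once small files make up a large share of
    the table, and vacuuming pays off once unreferenced data is a meaningful
    share of storage and the table has not been vacuumed recently.
    """
    actions: list[str] = []
    # Compaction: too many small files slow down file listing and scans.
    if stats.num_files and stats.small_files / stats.num_files >= small_file_ratio:
        actions.append("OPTIMIZE")
    # Vacuum: reclaim storage held by files the table no longer references.
    if (stats.total_bytes
            and stats.unreferenced_bytes / stats.total_bytes >= stale_ratio
            and stats.hours_since_last_vacuum >= min_vacuum_interval_h):
        actions.append("VACUUM")
    return actions


if __name__ == "__main__":
    # 45% small files and 25% unreferenced bytes, last vacuumed 30 hours ago.
    stats = TableStats(num_files=1000, small_files=450,
                       unreferenced_bytes=5 * 10**9, total_bytes=20 * 10**9,
                       hours_since_last_vacuum=30.0)
    print(plan_maintenance(stats))
```

Running the example prints `['OPTIMIZE', 'VACUUM']`. A predictive approach would replace these fixed thresholds with estimates learned from workload history. The rule-based version only shows where such a decision plugs in.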
Sagar Gowda
www.synapsesocial.com/papers/69aa710d531e4c4a9ff5b5d9 — DOI: https://doi.org/10.5281/zenodo.18839861