March 26, 2026 | Open Access

Machine Learning-Based Caching and Tiering in Modern Data Storage Systems: A Survey

Key Points

The research aims to review and assess machine learning-based caching and tiering strategies in modern data storage systems.
Conducted a comprehensive survey of the existing literature on ML-based caching and tiering policies.
Examined the theoretical foundations and practical implementations of these ML techniques.
Analysed the most common features, baselines for comparison, and evaluation metrics.
Identified emerging trends and future directions in data storage systems.
ML-based policies increase cache hit rates and reduce latency.
These techniques provide cost-effective solutions for data placement across storage media.
The study outlines the most common features of caching and tiering policies.

Abstract

Modern data storage systems contain numerous media devices (e.g., DRAM, NVRAM, SSDs, and HDDs) organized into multi-level caches and storage tiers to optimize the performance-to-cost ratio. Traditional caching policies for eviction, admission, and prefetching, as well as tiering policies for data placement and migration, offer simplicity but struggle to adapt to the dynamic and complex access patterns observed in modern data-intensive applications. As a result, machine learning (ML)-based policies have recently gained popularity due to their ability to predict future data accesses and proactively optimize caching and tiering operations. These ML techniques not only increase cache hit rates and reduce latency but also provide scalable, cost-effective solutions that intelligently place data across different storage media. This manuscript reviews the state-of-the-art ML-based caching and tiering approaches, examining their theoretical foundations and practical implementations. It also presents the most common features for each policy type, the most popular baselines for comparison, and the typical evaluation metrics. Finally, it discusses emerging trends and outlines potential directions for future research in data storage systems.
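To make the contrast between a traditional eviction policy and a prediction-driven one concrete, the following is a minimal sketch, not taken from the survey, that replays an access trace through a classic LRU cache and through a cache that evicts the resident item with the lowest predicted reuse score, reporting the hit rate for each. `DecayedFrequencyPredictor` is a hypothetical stand-in for a trained ML model: it merely scores keys by an exponentially decayed access count. The function names, the decay factor of 0.9, and the example trace are all illustrative assumptions.

```python
from collections import OrderedDict
from typing import Dict, Hashable, List, Set


def lru_hit_rate(trace: List[Hashable], capacity: int) -> float:
    """Replay a trace through an LRU cache and return the hit rate."""
    cache: OrderedDict = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used key
            cache[key] = None
    return hits / len(trace)


class DecayedFrequencyPredictor:
    """Hypothetical stand-in for a learned model: scores each key by an
    exponentially decayed access count (higher score = reuse more likely)."""

    def __init__(self, decay: float = 0.9) -> None:
        self.decay = decay
        self.scores: Dict[Hashable, float] = {}

    def observe(self, key: Hashable) -> None:
        # Age every existing score, then credit the key just accessed.
        for k in self.scores:
            self.scores[k] *= self.decay
        self.scores[key] = self.scores.get(key, 0.0) + 1.0

    def predict(self, key: Hashable) -> float:
        return self.scores.get(key, 0.0)


def predictive_hit_rate(trace: List[Hashable], capacity: int,
                        predictor: DecayedFrequencyPredictor) -> float:
    """Replay a trace, evicting the resident key with the lowest predicted score."""
    cache: Set[Hashable] = set()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
        else:
            if len(cache) >= capacity:
                victim = min(cache, key=predictor.predict)
                cache.remove(victim)
            cache.add(key)
        predictor.observe(key)  # update the model after serving the access
    return hits / len(trace)


if __name__ == "__main__":
    # Hot keys A and B, interrupted by a one-time scan of C, D, E.
    trace = ["A", "B"] * 3 + ["C", "D", "E"] + ["A", "B"] * 2
    print("LRU:", lru_hit_rate(trace, capacity=3))  # 6 hits out of 13
    print("Predictive:", predictive_hit_rate(trace, 3, DecayedFrequencyPredictor()))  # 8 hits out of 13
```

On this trace, LRU lets the one-time scan flush the hot keys, while the score-based policy evicts the scan items instead. Keeping the scorer separate from the cache logic is what would let a real learned model replace `predict` without changing the eviction loop.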