Modern data storage systems combine numerous media devices (e.g., DRAM, NVRAM, SSDs, and HDDs) organized into multi-level caches and storage tiers to optimize the performance-to-cost ratio. Traditional caching policies for eviction, admission, and prefetching, as well as tiering policies for data placement and migration, offer simplicity but struggle to adapt to the dynamic and complex access patterns of modern data-intensive applications. As a result, machine learning (ML)-based policies have recently gained popularity because they can make predictions and proactively optimize caching and tiering operations. These ML techniques can increase cache hit rates and reduce latency while also providing scalable, cost-effective solutions that intelligently place data across different storage media. This manuscript reviews state-of-the-art ML-based caching and tiering approaches, examining their theoretical foundations and practical implementations. It also presents the most common features for each policy type, the most popular baselines for comparison, and the typical evaluation metrics. Finally, it discusses emerging trends and outlines potential directions for future research in data storage systems.
Savva et al. (Tue,) studied this question.