Data deduplication, usually implemented with content-defined chunking (CDC), is today one of the key features of advanced storage systems that provide space for backup applications. Although simple and effective, CDC generates chunks whose sizes cluster around the expected chunk size, which is fixed globally for a given storage system and applies to all backups. This creates an opportunity for improvement, because the optimal chunk size for deduplication varies not only among backup datasets but also within a single dataset: long stretches of unchanged data favor larger chunks, whereas regions of change favor smaller ones.
Romański et al. studied this question.
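To make the mechanism concrete, the following is a minimal sketch of content-defined chunking in Python. It is an illustration under stated assumptions, not the method of any particular storage system: the gear-style rolling hash, the function name `cdc_chunks`, and the parameters `avg_bits`, `min_size`, and `max_size` are all hypothetical choices made for this example. A chunk boundary is declared wherever the low bits of the rolling hash are zero, so boundaries follow the content rather than byte offsets, and chunk sizes cluster around a single, globally chosen expected size.

```python
import random

# Hypothetical gear table: one pseudo-random 64-bit value per byte value.
# The fixed seed keeps chunk boundaries reproducible across runs.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1


def cdc_chunks(data, avg_bits=8, min_size=64, max_size=4096):
    """Split `data` into content-defined chunks.

    All parameters are illustrative assumptions. Past `min_size`, each
    position is a boundary with probability 1 / 2**avg_bits, so the
    expected chunk size is roughly min_size + 2**avg_bits bytes.
    """
    mask = (1 << avg_bits) - 1
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Gear-style rolling hash: the low `avg_bits` bits of `h`
        # depend only on the most recent `avg_bits` bytes.
        h = ((h << 1) + GEAR[byte]) & MASK64
        size = i - start + 1
        at_boundary = size >= min_size and (h & mask) == 0
        if at_boundary or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


if __name__ == "__main__":
    rng = random.Random(42)
    original = bytes(rng.getrandbits(8) for _ in range(20000))
    # Insert a few bytes in the middle, as a small edit between backups.
    edited = original[:5000] + b"INSERTED" + original[5000:]

    a = cdc_chunks(original)
    b = cdc_chunks(edited)
    shared = set(a) & set(b)
    print(f"{len(a)} chunks before, {len(b)} after, {len(shared)} shared")
```

Because the expected chunk size here is a single parameter applied to the whole input, the sketch also shows the limitation discussed above: the same granularity is used for long unchanged stretches and for heavily modified regions.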