March 19, 2024

Using Machine Learning to Predict Effective Compression Algorithms for Heterogeneous Datasets

Abstract

Heterogeneous datasets are prevalent in big-data domains. However, compressing such datasets with a single algorithm results in suboptimal compression ratios. This paper investigates how machine-learning techniques can help by predicting an effective compression algorithm for each file in a heterogeneous dataset. In particular, we show how to train a very simple model using nothing but the compression ratios of a few algorithms as features. We named this technique "MLcomp". Despite its simplicity, MLcomp is very effective, as our evaluation on nearly 9,000 files from a heterogeneous dataset and a library of over 100,000 compression algorithms demonstrates. Using MLcomp to pick one lossless algorithm from this library for each file yields an average compression ratio that is 97.8% of the best possible.
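The core idea (use the compression ratios of a few cheap algorithms as features, and predict which algorithm in a larger library will compress a file best) can be illustrated with a small sketch. This is not the authors' MLcomp implementation. The four-entry `LIBRARY` of Python standard-library compressors, the choice of `FEATURE_ALGOS`, the 1-nearest-neighbour model, the synthetic files, and the "fraction of best" metric (total achieved ratio divided by total best ratio) are all hypothetical stand-ins chosen only to make the example self-contained.

```python
import bz2
import lzma
import math
import random
import zlib

# HYPOTHETICAL stand-in for the paper's library of >100,000 algorithms:
# four stdlib compressors at fixed settings.
LIBRARY = {
    "zlib-1": lambda d: zlib.compress(d, 1),
    "zlib-9": lambda d: zlib.compress(d, 9),
    "bz2-9": lambda d: bz2.compress(d, 9),
    "lzma-6": lambda d: lzma.compress(d, preset=6),
}

# HYPOTHETICAL choice of the "few algorithms" whose ratios are the features.
FEATURE_ALGOS = ["zlib-1", "bz2-9"]


def ratio(name, data):
    """Compression ratio: original size divided by compressed size."""
    return len(data) / len(LIBRARY[name](data))


def features(data):
    """Feature vector: compression ratios of the feature algorithms only."""
    return [ratio(name, data) for name in FEATURE_ALGOS]


def best_algorithm(data):
    """Label: the library algorithm with the highest ratio on this file."""
    return max(LIBRARY, key=lambda name: ratio(name, data))


def train(files):
    """A 1-nearest-neighbour 'model' is just the stored (features, label) pairs."""
    return [(features(f), best_algorithm(f)) for f in files]


def predict(model, data):
    """Return the label of the training file whose features are closest."""
    x = features(data)
    _, label = min(model, key=lambda pair: math.dist(pair[0], x))
    return label


def fraction_of_best(model, files):
    """Total ratio achieved by the predicted algorithms over the best possible."""
    achieved = sum(ratio(predict(model, f), f) for f in files)
    best = sum(max(ratio(name, f) for name in LIBRARY) for f in files)
    return achieved / best


def make_files(seed, n):
    """Synthetic heterogeneous files: word-like text, random bytes, low-entropy bytes."""
    rng = random.Random(seed)
    words = [b"alpha", b"beta", b"gamma", b"delta"]
    files = []
    for i in range(n):
        if i % 3 == 0:
            files.append(b" ".join(rng.choice(words) for _ in range(2000)))
        elif i % 3 == 1:
            files.append(bytes(rng.getrandbits(8) for _ in range(8000)))
        else:
            files.append(bytes(rng.randrange(16) for _ in range(8000)))
    return files


model = train(make_files(seed=0, n=9))
score = fraction_of_best(model, make_files(seed=1, n=6))
print(f"average ratio is {score:.1%} of the best possible")
```

The point of the design is cost: at prediction time only the `FEATURE_ALGOS` are run on a new file, rather than every algorithm in the library, and the predicted algorithm can never beat the per-file best, so the score is at most 100%.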

Cite This Study

Burtchell et al. (2024) studied this question.

synapsesocial.com/papers/68e73757b6db6435876b08c1 https://doi.org/10.1109/dcc58796.2024.00026