Abstract

Motivation
Efficient data compression is crucial for reducing the storage and transmission costs associated with the vast volumes of raw nanopore sequencing data. Surpassing state-of-the-art compression performance has been challenging, and all recent progress in this direction either incurs a computational overhead or resorts to lossy compression schemes, which are not always desirable.

Results
In this paper we present PDZ, a lossless compression algorithm that outperforms VBZ, the current de facto standard, in both compression performance and computational efficiency. In our experimental evaluation, the compression ratio improves by 0.87% to 2.84% depending on the dataset, compression is 1.09 to 2.25 times faster depending on the hardware, and decompression is 1.01 to 1.52 times faster depending on the hardware. Compared to EX-ZD, a compression algorithm with similar compression performance, the speedup factor for both compression and decompression ranges from approximately 1.39× to 1.83×, depending on the hardware.

Availability and Implementation
PDZ is implemented in C++ as a new compression method within the POD5 format. The source code is available as a fork of the open-source NanoporeTech library at https://github.com/Rafael-Cast/Piecewise-Differential-Zstd-Coder-POD5-Demo. Supplementary information is available at the journal’s website.
Castelli et al. (Sat,) studied this question.