What question did this study set out to answer?

The study set out to provide a more efficient method for normalizing read depth in genome sequencing alignments, using a new coordinate-sorted sweep-line algorithm implemented in Rasusa.

April 12, 2026 · Open Access

Efficient downsampling of genome alignments with Rasusa

Key Points

Developed a coordinate-sorted sweep-line algorithm, implemented in the open-source software Rasusa.
Used seeded random priority assignment for unbiased, reproducible read selection.
Enforced a strict coverage cap at every genomic position.
Ran more than 1,400 times faster than legacy fetch-based methods, cutting processing time from hours to seconds.
Ran roughly four times faster than VariantBam.
Required only 8 MB of memory for long-read data.

Abstract

High-throughput sequencing datasets frequently exhibit extreme read depth variation, biasing downstream analysis. Normalising coverage to a specific depth cap is important, yet existing tools rely on computationally expensive fetch-based or non-deterministic greedy algorithms. Here, we present a new coordinate-sorted sweep-line algorithm implemented in the open-source software rasusa that enforces a strict coverage cap at every genomic position. By utilising seeded random priority assignment, we achieve unbiased, reproducible read selection. The algorithm reduces runtimes by over 1,400-fold compared to legacy fetch-based methods—slashing processing from hours to mere seconds—and operates roughly four times faster than VariantBam. Furthermore, it requires only 8 MB of memory for long-read data. This provides a highly efficient, scalable, and reproducible solution for sequencing coverage normalisation.
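The idea of a sweep line with seeded random priorities can be illustrated with a short Python sketch. This is not the Rasusa implementation, only one plausible reading of the description above, under several assumptions: reads are reduced to half-open intervals `(start, end)` on a single reference sequence, the input is already sorted by start coordinate, and whenever the cap is exceeded the overlapping read with the largest random priority value is dropped. The names `downsample` and `max_depth` are hypothetical and exist only for this illustration.

```python
import heapq
import random


def downsample(reads, cap, seed=0):
    """Return sorted indices of reads kept under a per-position depth cap.

    Assumptions (illustrative, not taken from Rasusa):
    - `reads` is a list of half-open (start, end) intervals sorted by start.
    - Each read gets a seeded random priority; when depth exceeds `cap`,
      the active read with the largest priority value is discarded.
    """
    rng = random.Random(seed)   # seeded, so selection is reproducible
    by_end = []                 # min-heap of (end, index), used to expire reads
    by_priority = []            # max-heap via negation: (-priority, index)
    active = set()              # kept reads overlapping the sweep position
    kept = set()

    for idx, (start, end) in enumerate(reads):
        # Advance the sweep line: retire reads that end at or before `start`.
        # Entries for already-evicted reads are popped here too; discarding
        # them from `active` is then a harmless no-op.
        while by_end and by_end[0][0] <= start:
            _, done = heapq.heappop(by_end)
            active.discard(done)

        priority = rng.random()
        heapq.heappush(by_end, (end, idx))
        heapq.heappush(by_priority, (-priority, idx))
        active.add(idx)
        kept.add(idx)

        # Enforce the cap: evict the worst-priority read that is still active.
        # Stale heap entries (expired reads) are skipped lazily.
        if len(active) > cap:
            while True:
                _, victim = heapq.heappop(by_priority)
                if victim in active:
                    active.discard(victim)
                    kept.discard(victim)
                    break

    return sorted(kept)


def max_depth(intervals):
    """Maximum number of half-open intervals covering any single position."""
    # At equal positions, ends (-1) sort before starts (+1), which matches
    # half-open semantics: a read ending at p does not overlap one starting at p.
    events = sorted([(s, 1) for s, _ in intervals] + [(e, -1) for _, e in intervals])
    depth = best = 0
    for _, delta in events:
        depth += delta
        best = max(best, depth)
    return best
```

Calling `downsample(reads, cap=2, seed=42)` on a start-sorted list of intervals returns the indices of the retained reads, and `max_depth` applied to those reads should never exceed the cap. Because coverage only increases at read starts and the cap is checked at every start, depth stays at or below the cap at every position in this sketch. Repeating the call with the same seed yields the same selection. A real tool would also need to handle multiple reference sequences and stream records to its output, which this sketch leaves out.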
