What question did this study set out to answer?

The aim is to develop a faster method for hashing spaced seeds to improve sequence analysis efficiency.

March 30, 2026

Fast Hashing of Spaced Seeds with DuoHash

Key Points

Introduced DuoHash, a framework for efficient spaced seed hashing
Utilized binary encoding and precomputed tables for hash computations
Tested on both forward and reverse DNA strands
DuoHash achieved speedups of up to 11x compared to existing algorithms
Demonstrated applicability for spaced k-mers counting tasks

Abstract

Many state-of-the-art tools for sequence analysis are based on alignment-free techniques to manage high-throughput processing. Several routine tasks such as querying, indexing, and similarity search are based on k-mer statistics. In order to accommodate errors or mutations, spaced seeds have been increasingly used instead of k-mers, enhancing sensitivity in various applications. However, spaced seed hashing is computationally intensive, introducing significant slowdown in the processing. This article addresses the challenge of efficient spaced seed hashing, which is functional for the computation of spaced k-mers counting. We present DuoHash, a framework that enables the efficient computation of hash functions for spaced seeds. DuoHash exploits an efficient spaced seed binary encoding and precomputed tables to speed up the computation of the hash value for both the forward and reverse strands of a DNA sequence. In our experiments, DuoHash substantially outperforms existing algorithms, achieving speedups of up to 11x on short reads with a spaced seed of medium density. Furthermore, we show the applicability of DuoHash to the problem of spaced k-mers counting. The code of DuoHash is available at https://github.com/CominLab/DuoHash/ .
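To make the task in the abstract concrete, here is a minimal sketch of what a spaced seed hash is. It is not DuoHash's algorithm: it recomputes every window from scratch, which is the naive baseline that faster methods aim to beat. The 2-bit encoding (A=0, C=1, G=2, T=3), the seed string of '1' (care) and '0' (don't care) positions, and the function names are assumptions chosen for illustration.

```python
from collections import Counter

# Assumed 2-bit encoding; with this choice the complement of a base is 3 - code.
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def spaced_seed_hashes(seq, seed):
    """Yield (position, forward_hash, reverse_hash) for each window of seq.

    seed is a string of '1' (care) and '0' (don't care) characters.
    The reverse hash is the hash of the window's reverse complement
    under the same seed.
    """
    care = [i for i, c in enumerate(seed) if c == "1"]
    span = len(seed)
    for pos in range(len(seq) - span + 1):
        fwd = 0
        rev = 0
        for i in care:
            # Forward strand: pack the base at each care position, 2 bits each.
            fwd = (fwd << 2) | ENCODE[seq[pos + i]]
            # Reverse strand: position i of the reverse-complemented window
            # is the complement of the base at mirrored position span - 1 - i.
            rev = (rev << 2) | (3 - ENCODE[seq[pos + span - 1 - i]])
        yield pos, fwd, rev


def count_spaced_kmers(seq, seed):
    """Count canonical spaced k-mers, taking min(forward, reverse) as the key."""
    return Counter(min(f, r) for _, f, r in spaced_seed_hashes(seq, seed))


if __name__ == "__main__":
    for pos, fwd, rev in spaced_seed_hashes("ACGTAC", "1101"):
        print(pos, fwd, rev)
```

Each window costs one step per care position in this version, for both strands. Consecutive windows share most of their bases, so much of that work is repeated, and that redundancy is where an encoding plus precomputed tables can save time.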


Cite This Study

Gemin et al. studied this question.

synapsesocial.com/papers/69ca1369883daed6ee09563b https://doi.org/10.1177/15578666261423555
