August 25, 2025

Engineering rank/select data structures for large-alphabet strings

Key points

Improves select speed by 80% over existing methods at a cost of 11% additional space.
Uses an alphabet-partitioned compressed data structure to support rank and select on large-alphabet strings.
Demonstrated on several applications, including inverted list intersections, run-length compressed strings, and distributed computation of rank and select.
For run-length compressed strings using the Burrows–Wheeler transform, requires 0.98–1.09 times the space of state-of-the-art RLFM-indexes while counting pattern occurrences 1.23–2.33 times faster, with better theoretical guarantees.

Abstract

Large-alphabet strings, prevalent in information retrieval and natural language processing, pose unique storage and processing challenges. This paper explores the efficient implementation of the alphabet-partition approach, introducing a compressed data structure that efficiently supports the operations rank and select. Our implementation significantly outperforms existing methods, improving the select operation speed by 80% with only 11% additional space. We demonstrate the utility of our structure in various applications, including inverted list intersections, run-length compressed strings, and distributed computation of rank and select. Notably, for run-length compressed strings using the Burrows–Wheeler transform, our data structure requires only 0.98–1.09 times the space of state-of-the-art RLFM-indexes to achieve 1.23–2.33 times faster pattern occurrence counting while also providing better theoretical guarantees.
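
For readers unfamiliar with the operations named in the abstract, the following is a minimal Python sketch of rank, select, and the general alphabet-partitioning idea. It is not the paper's implementation: it assumes the commonly described scheme in which the symbol with frequency rank r is assigned to class floor(log2 r), and it uses plain lists with linear scans where the paper uses compressed structures. The class name `AlphabetPartitioned` and the helpers `rank` and `select` are hypothetical names chosen for illustration.

```python
from collections import Counter


def rank(seq, c, i):
    """Number of occurrences of c in seq[0:i] (naive linear scan)."""
    return sum(1 for x in seq[:i] if x == c)


def select(seq, c, j):
    """0-based position of the j-th (j >= 1) occurrence of c in seq, or -1."""
    count = 0
    for pos, x in enumerate(seq):
        if x == c:
            count += 1
            if count == j:
                return pos
    return -1


class AlphabetPartitioned:
    """Illustrative alphabet partitioning over plain lists (assumed scheme)."""

    def __init__(self, s):
        freq = Counter(s)
        # Order symbols by decreasing frequency (ties broken by symbol value).
        ordered = sorted(freq, key=lambda c: (-freq[c], c))
        self.cls = {}  # symbol -> class id
        self.off = {}  # symbol -> offset of the symbol inside its class
        for r, c in enumerate(ordered, start=1):
            level = r.bit_length() - 1  # floor(log2 r)
            self.cls[c] = level
            self.off[c] = r - (1 << level)
        # K stores the class of each position; sub[l] stores the subsequence
        # of symbols belonging to class l, recoded as in-class offsets.
        self.K = [self.cls[c] for c in s]
        self.sub = {}
        for c in s:
            self.sub.setdefault(self.cls[c], []).append(self.off[c])

    def rank(self, c, i):
        if c not in self.cls:
            return 0
        level = self.cls[c]
        # Count class-level positions before i, then count c among them.
        return rank(self.sub[level], self.off[c], rank(self.K, level, i))

    def select(self, c, j):
        if c not in self.cls:
            return -1
        level = self.cls[c]
        p = select(self.sub[level], self.off[c], j)
        if p < 0:
            return -1
        # The p-th entry of sub[level] is the (p+1)-th class-level symbol in K.
        return select(self.K, level, p + 1)


s = list("abracadabra")
ap = AlphabetPartitioned(s)
print(ap.rank("a", 11))   # 5: "a" occurs five times in the whole string
print(ap.select("a", 3))  # 5: the third "a" is at 0-based position 5
```

The point of the decomposition is that each query on the original string becomes one query on the small-alphabet class sequence and one on a per-class subsequence, each of which can then be stored with a representation suited to its alphabet size.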

Cite This Study

Arroyuelo et al. (2025) studied this question.

synapsesocial.com/papers/68af5d63ad7bf08b1eae09fd https://doi.org/10.1093/comjnl/bxaf102
