Suppose that we are given a string s of length n over an alphabet 0, 1, …, nO (1) and δ is the string complexity of s, a known compression measure. We describe an index on s with O (δlog (n/δ) ) space, measured in O (log n) -bit machine words, which can search in s any string of length m in O (m + (occ + 1) log^ε n) time, where occ is the number of occurrences and ε > 0 is any fixed constant (the big-O in the space bound hides factor 1/ε). Crucially, the index can be built in O (n log n) expected time by one left-to-right pass on the string s in a streaming fashion with O (δlog (n/δ) ) construction space. The index does not use the Karp-Rabin fingerprints, and the randomization in the construction time can be eliminated by using deterministic dictionaries instead of hash tables (with a slowdown). The search time matches currently best results and the space is almost optimal (the known optimum is O (δlog n/ (δα) ), where α = log_σ n and σ is the alphabet size, and it coincides with O (δlog (n/δ) ) when δ = O (n/α²) ). This is the first index that can be constructed within such space and with such time guarantees. To avoid uninteresting marginal cases, all above bounds are stated for δ ≥ Ω (log log n).
Dmitry Kosolobov (Thu,) studied this question.