Key points are not available for this paper at this time.
We consider the problem of constructing codes capable of correcting long tandem duplication errors of variable length. We present a subquadratic-complexity algorithm that uses only one symbol of redundancy to encode q-ary length-n words into codewords, which can correct a single duplication of length at least K=4∙logqn+1. We enhance the error-correcting capability by introducing codes without efficient encoding, leading to an improved value of K=logqn+ϕ(n), where ϕ(n) is an arbitrary function such that ϕ(n)∞ as n∞. In the class of codes correcting a single long duplication with redundancy 1, the value K in our constructions is order-optimal. Finally, k-repeat-free codes, in which every codeword contains any k-tuple at most once, are shown to correct any number of independent long duplications, each of length at least K=2k, occurring simultaneously without any mutual interference.
Goshkoder et al. (Tue,) studied this question.