Key points are not available for this paper at this time.
In this work, we address the problem of approximate pattern matching with wildcards. Given a pattern P of length m containing D wildcards, a text T of length n, and an integer k, our objective is to identify all fragments of T within Hamming distance k from P. Our primary contribution is an algorithm with runtime O (n+ (D+k) (G+k) n/m) for this problem. Here, G D represents the number of maximal wildcard fragments in P. We derive this algorithm by elaborating in a non-trivial way on the ideas presented by Charalampopoulos et al. , FOCS'20 for pattern matching with mismatches (without wildcards). Our algorithm improves over the state of the art when D, G, and k are small relative to n. For instance, if m = n/2, k=G=n^2/5, and D=n^3/5, our algorithm operates in O (n) time, surpassing the (n^6/5) time requirement of all previously known algorithms. In the case of exact pattern matching with wildcards (k=0), we present a much simpler algorithm with runtime O (n+DG n/m) that clearly illustrates our main technical innovation: the utilisation of positions of P that do not belong to any fragment of P with a density of wildcards much larger than D/m as anchors for the sought (approximate) occurrences. Notably, our algorithm outperforms the best-known O (n m) -time FFT-based algorithms of Cole and Hariharan, STOC'02 and Clifford and Clifford, IPL'04 if DG = o (m m). We complement our algorithmic results with a structural characterization of the k-mismatch occurrences of P. We demonstrate that in a text of length O (m), these occurrences can be partitioned into O ( (D+k) (G+k) ) arithmetic progressions. Additionally, we construct an infinite family of examples with ( (D+k) k) arithmetic progressions of occurrences, leveraging a combinatorial result on progression-free sets Elkin, SODA'10.
Bathie et al. (Mon,) studied this question.