Near-Optimal Property Testers for Pattern Matching
By: Ce Jin, Tomasz Kociumaka
Potential Business Impact:
Finds patterns in text faster, even with errors.
The classic exact pattern matching problem, given two strings -- a pattern $P$ of length $m$ and a text $T$ of length $n$ -- asks whether $P$ occurs as a substring of $T$. A property tester for the problem needs to distinguish (with high probability) the following two cases for some threshold $k$: the YES case, where $P$ occurs as a substring of $T$, and the NO case, where $P$ has Hamming distance greater than $k$ from every substring of $T$, that is, $P$ has no $k$-mismatch occurrence in $T$. In this work, we provide adaptive and non-adaptive property testers for the exact pattern matching problem, jointly covering the whole spectrum of parameters. We further establish unconditional lower bounds demonstrating that the time and query complexities of our algorithms are optimal, up to $\mathrm{polylog}\, n$ factors hidden within the $\tilde O(\cdot)$ notation below. In the most studied regime of $n=m+\Theta(m)$, our non-adaptive property tester has the time complexity of $\tilde O(n/\sqrt{k})$, and a matching lower bound remains valid for the query complexity of adaptive algorithms. This improves both upon a folklore solution that attains the optimal query complexity but requires $\Omega(n)$ time, and upon the only previously known sublinear-time property tester, by Chan, Golan, Kociumaka, Kopelowitz, and Porat [STOC 2020], with time complexity $\tilde O(n/\sqrt[3]{k})$. The aforementioned results remain valid for $n=m+\Omega(m)$, where our optimal running time $\tilde O(\sqrt{nm/k}+n/k)$ improves upon the previously best time complexity of $\tilde O(\sqrt[3]{n^2m/k}+n/k)$. In the regime of $n=m+o(m)$, which has not been targeted in any previous work, we establish a surprising separation between adaptive and non-adaptive algorithms, whose optimal time and query complexities are $\tilde O(\sqrt{(n-m+1)m/k}+n/k)$ and $\tilde O(\min(n\sqrt{n-m+1}/k,\sqrt{nm/k}+n/k))$, respectively.
Similar Papers
Simple Quantum Algorithm for Approximate $k$-Mismatch Problem
Quantum Physics
Finds similar words in long texts faster.
Computational Complexity in Property Testing
Computational Complexity
Makes computers check data faster than before.
Quantum Pattern Matching with Wildcards
Data Structures and Algorithms
Finds hidden words faster, even with missing letters.