Linear Time Subsequence and Supersequence Regex Matching
By: Antoine Amarilli , Florin Manea , Tina Ringleb and more
Potential Business Impact:
Finds patterns in text much faster.
It is well-known that checking whether a given string $w$ matches a given regular expression $r$ can be done in quadratic time $O(|w|\cdot |r|)$ and that this cannot be improved to a truly subquadratic running time of $O((|w|\cdot |r|)^{1-\epsilon})$ assuming the strong exponential time hypothesis (SETH). We study a different matching paradigm where we ask instead whether $w$ has a subsequence that matches $r$, and show that regex matching in this sense can be solved in linear time $O(|w| + |r|)$. Further, the same holds if we ask for a supersequence. We show that the quantitative variants where we want to compute a longest or shortest subsequence or supersequence of $w$ that matches $r$ can be solved in $O(|w| \cdot |r|)$, i. e., asymptotically no worse than classical regex matching; and we show that $O(|w| + |r|)$ is conditionally not possible for these problems. We also investigate these questions with respect to other natural string relations like the infix, prefix, left-extension or extension relation instead of the subsequence and supersequence relation. We further study the complexity of the universal problem where we ask if all subsequences (or supersequences, infixes, prefixes, left-extensions or extensions) of an input string satisfy a given regular expression.
Similar Papers
Quantum Pattern Matching with Wildcards
Data Structures and Algorithms
Finds hidden words faster, even with missing letters.
$k$-Universality of Regular Languages Revisited
Formal Languages and Automata Theory
Finds shortest text containing all short word combinations.
Efficient Matching of Some Fundamental Regular Expressions with Backreferences
Formal Languages and Automata Theory
Makes computer text searches faster with tricky patterns.