Score: 1

From regular expressions to deterministic finite automata: $2^{\frac{n}{2}+\sqrt{n}(\log n)^{Θ(1)}}$ states are necessary and sufficient

Published: April 29, 2025 | arXiv ID: 2504.20555v1

By: Olga Martynova, Alexander Okhotin

Potential Business Impact:

Makes computer programs run much faster.

Business Areas:
A/B Testing Data and Analytics

It is proved that every regular expression of alphabetic width $n$, that is, with $n$ occurrences of symbols of the alphabet, can be transformed into a deterministic finite automaton (DFA) with $2^{\frac{n}{2}+(\frac{\log_2 e}{2\sqrt{2}}+o(1))\sqrt{n\ln n}}$ states recognizing the same language (the best upper bound up to date is $2^n$). At the same time, it is also shown that this bound is close to optimal, namely, that there exist regular expressions of alphabetic width $n$ over a two-symbol alphabet, such that every DFA for the same language has at least $2^{\frac{n}{2}+(\sqrt{2} + o(1))\sqrt{\frac{n}{\ln n}}}$ states (the previously known lower bound is $\frac{5}{4}2^{\frac{n}{2}}$). The same bounds are obtained for an intermediate problem of determinizing nondetermistic finite automata (NFA) with each state having all incoming transitions by the same symbol.

Page Count
20 pages

Category
Computer Science:
Formal Languages and Automata Theory