Better Neural Network Expressivity: Subdividing the Simplex
By: Egor Bakaev, Florestan Brunck, Christoph Hertrich, and more
Potential Business Impact:
Shows that neural networks need fewer layers than previously believed to compute the same functions.
This work studies the expressivity of ReLU neural networks with a focus on their depth. A sequence of previous works showed that $\lceil \log_2(n+1) \rceil$ hidden layers are sufficient to compute all continuous piecewise linear (CPWL) functions on $\mathbb{R}^n$. Hertrich, Basu, Di Summa, and Skutella (NeurIPS'21) conjectured that this result is optimal in the sense that there are CPWL functions on $\mathbb{R}^n$, like the maximum function, that require this depth. We disprove the conjecture and show that $\lceil\log_3(n-1)\rceil+1$ hidden layers are sufficient to compute all CPWL functions on $\mathbb{R}^n$. A key step in the proof is that ReLU neural networks with two hidden layers can exactly represent the maximum function of five inputs. More generally, we show that $\lceil\log_3(n-2)\rceil+1$ hidden layers are sufficient to compute the maximum of $n\geq 4$ numbers. Our constructions almost match the $\lceil\log_3(n)\rceil$ lower bound of Averkov, Hojny, and Merkert (ICLR'25) in the special case of ReLU networks with weights that are decimal fractions. The constructions have a geometric interpretation via polyhedral subdivisions of the simplex into "easier" polytopes.
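To make the depth discussion concrete, here is a minimal NumPy sketch of the classical baseline the abstract starts from: the exact one-hidden-layer identity $\max(a,b) = b + \mathrm{ReLU}(a-b)$, composed in a binary tree to compute the maximum of $n$ numbers with $\lceil\log_2(n)\rceil$ hidden layers. The function names (`relu`, `max2`, `max_n`) are illustrative; this is the standard construction, not the paper's improved $\lceil\log_3\rceil$-depth scheme based on simplex subdivisions.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def max2(a, b):
    # Exact one-hidden-layer ReLU identity:
    #   max(a, b) = b + ReLU(a - b),
    # where b itself is routed through the hidden layer as ReLU(b) - ReLU(-b).
    return relu(b) - relu(-b) + relu(a - b)

def max_n(xs):
    # Classical binary-tree construction: pair up the inputs and take max2
    # of each pair; repeating this uses ceil(log2(n)) hidden layers in total.
    xs = list(xs)
    while len(xs) > 1:
        nxt = [max2(xs[i], xs[i + 1]) for i in range(0, len(xs) - 1, 2)]
        if len(xs) % 2 == 1:        # carry an odd leftover element upward
            nxt.append(xs[-1])
        xs = nxt
    return xs[0]

# Usage: maximum of five numbers via the classical construction,
# which needs ceil(log2(5)) = 3 hidden layers; the paper shows 2 suffice.
print(max_n([1.3, -0.5, 4.2, 0.0, 2.7]))   # -> 4.2
```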
Similar Papers
On the Expressiveness of Rational ReLU Neural Networks With Bounded Depth
Machine Learning (CS)
Studies which functions networks with rational weights and limited depth can compute.
On the Depth of Monotone ReLU Neural Networks and ICNNs
Machine Learning (CS)
Studies how many layers monotone networks need to compute functions like the maximum.
Parameterized Hardness of Zonotope Containment and Neural Network Verification
Computational Complexity
Shows that checking neural networks for mistakes is computationally hard.