A Depth Hierarchy for Computing the Maximum in ReLU Networks via Extremal Graph Theory
By: Itay Safran
Potential Business Impact:
Shows how wide a neural network must be, at a given depth, to pick out the largest of many numbers exactly.
We consider the problem of exact computation of the maximum function over $d$ real inputs using ReLU neural networks. We prove a depth hierarchy, wherein width $\Omega\big(d^{1+\frac{1}{2^{k-2}-1}}\big)$ is necessary to represent the maximum for any depth $3\le k\le \log_2(\log_2(d))$. This is the first unconditional super-linear lower bound for this fundamental operator at depths $k\ge3$, and it holds even if the depth scales with $d$. Our proof technique is based on a combinatorial argument and associates the non-differentiable ridges of the maximum with cliques in a graph induced by the first hidden layer of the computing network, utilizing Turán's theorem from extremal graph theory to show that a sufficiently narrow network cannot capture the non-linearities of the maximum. This suggests that despite its simple nature, the maximum function possesses an inherent complexity that stems from the geometric structure of its non-differentiable hyperplanes, and provides a novel approach for proving lower bounds for deep neural networks.
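To make the bound concrete, here is a minimal Python sketch (not the authors' code; the function names and the choice $d = 2^{16}$ are illustrative assumptions) that tabulates the width exponent $1+\frac{1}{2^{k-2}-1}$ over the admissible depths $3\le k\le \log_2(\log_2(d))$, together with the edge threshold from Turán's theorem that the combinatorial argument relies on.

```python
# Illustrative sketch only: evaluates the width lower bound exponent from the
# abstract and the Turan edge bound used by the proof technique. The helper
# names and the value of d below are assumptions, not the paper's code.

import math


def width_lower_bound_exponent(k: int) -> float:
    """Exponent 1 + 1/(2^(k-2) - 1) in the depth-k width bound (requires k >= 3)."""
    assert k >= 3
    return 1.0 + 1.0 / (2 ** (k - 2) - 1)


def turan_max_edges(n: int, r: int) -> float:
    """Turan's theorem: a K_r-free graph on n vertices has at most
    (1 - 1/(r-1)) * n^2 / 2 edges; any graph with more edges contains K_r."""
    assert r >= 2
    return (1.0 - 1.0 / (r - 1)) * n * n / 2.0


if __name__ == "__main__":
    d = 1 << 16  # number of inputs; depths must satisfy 3 <= k <= log2(log2(d))
    k_max = int(math.log2(math.log2(d)))
    for k in range(3, k_max + 1):
        exp = width_lower_bound_exponent(k)
        print(f"depth k={k}: width must be Omega(d^{exp:.3f}) "
              f"~ {d ** exp:.3e} for d={d}")
    # Extremal-graph ingredient: exceeding this edge count on d vertices
    # forces a clique K_4.
    print(f"Turan bound for K_4-free graphs on d={d} vertices: "
          f"{turan_max_edges(d, 4):.0f} edges")
```

For $d = 2^{16}$ the exponent is $2$ at depth $k=3$ and $4/3$ at depth $k=4$, illustrating how the required width in the hierarchy decays toward linear as the depth grows.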
Similar Papers
Depth-Bounds for Neural Networks via the Braid Arrangement
Machine Learning (CS)
Bounds how deep a network must be to pick out the biggest number, using the braid arrangement.
On the Depth of Monotone ReLU Neural Networks and ICNNs
Machine Learning (CS)
Bounds how deep monotone ReLU networks and ICNNs must be to find the biggest number.