Shallow Neural Networks Learn Low-Degree Spherical Polynomials with Learnable Channel Attention
By: Yingzhen Yang
We study the problem of learning a low-degree spherical polynomial of degree $\ell_0 = \Theta(1) \ge 1$ defined on the unit sphere in $\mathbb{R}^d$ by training an over-parameterized two-layer neural network (NN) with channel attention. Our main result is a significantly improved sample complexity for learning such low-degree polynomials. We show that, for any regression risk $\varepsilon \in (0,1)$, a carefully designed two-layer NN with channel attention and finite width $m \ge \Theta\big(n^4 \log(2n/\delta)/d^{2\ell_0}\big)$ trained by vanilla gradient descent (GD) requires only a sample complexity of $n \asymp \Theta(d^{\ell_0}/\varepsilon)$ with probability $1-\delta$ for every $\delta \in (0,1)$, in contrast with the representative sample complexity $\Theta\big(d^{\ell_0} \max\{\varepsilon^{-2}, \log d\}\big)$, where $n$ is the training data size. Moreover, this sample complexity cannot be improved, since the trained network attains a sharp nonparametric regression risk of order $\Theta(d^{\ell_0}/n)$ with probability at least $1-\delta$. On the other hand, the minimax optimal rate for the regression risk with a kernel of rank $\Theta(d^{\ell_0})$ is $\Theta(d^{\ell_0}/n)$, so the rate achieved by the GD-trained network is minimax optimal. The training of the two-layer NN with channel attention consists of two stages. In Stage 1, a learnable channel-selection algorithm provably identifies the ground-truth channel number $\ell_0$ from the initial $L \ge \ell_0$ channels in the first-layer activation, with high probability. This learnable selection is achieved by an efficient one-step GD update on both layers, enabling feature learning for low-degree polynomial targets. In Stage 2, the second layer is trained by standard GD using the activation function with the selected channels.
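To make the two-stage procedure concrete, the following is a minimal sketch, not the authors' construction: it assumes the first-layer activation is a sum over $L$ candidate degree channels of attention-weighted powers $(w_r^\top x)^\ell$, that Stage 1 takes one GD step on both layers together with the attention scores and then keeps the channels with the largest attention magnitude (a stand-in for the paper's selection rule), and that Stage 2 runs plain GD on the second layer with the selected channels; the width, learning rate, and step count below are arbitrary placeholders rather than the theoretically required $m \ge \Theta\big(n^4 \log(2n/\delta)/d^{2\ell_0}\big)$.

```python
import numpy as np

def init_network(d, m, L, rng):
    """Initialize first-layer weights W, channel-attention scores a, and second-layer weights v."""
    W = rng.standard_normal((m, d)) / np.sqrt(d)   # first-layer weights (rows roughly unit-scale)
    a = np.full(L, 1.0 / L)                        # attention score per candidate channel (degree) 1..L
    v = rng.standard_normal(m) / np.sqrt(m)        # second-layer weights
    return W, a, v

def forward(X, W, a, v, channels):
    """f(x) = (1/sqrt(m)) * sum_r v_r * sum_{ell in channels} a_ell (w_r . x)^ell (assumed activation form)."""
    Z = X @ W.T                                                # (n, m) inner products
    phi = sum(a[ell - 1] * Z**ell for ell in channels)        # channel-attention activation, (n, m)
    return phi @ v / np.sqrt(W.shape[0]), Z, phi

def train_two_stage(X, y, L, lr=0.1, steps=200, keep=1, seed=0):
    n, d = X.shape
    m = 4 * n                     # placeholder width, not the paper's m >= Theta(n^4 log(2n/delta)/d^{2 ell_0})
    rng = np.random.default_rng(seed)
    W, a, v = init_network(d, m, L, rng)
    all_channels = list(range(1, L + 1))

    # ----- Stage 1: one GD step on both layers and the attention scores, then select channels -----
    pred, Z, phi = forward(X, W, a, v, all_channels)
    resid = pred - y                                           # residual of the squared loss (1/2n)||pred - y||^2
    grad_v = phi.T @ resid / (n * np.sqrt(m))
    grad_a = np.array([np.mean(resid * (Z**ell @ v)) / np.sqrt(m) for ell in all_channels])
    grad_W = np.zeros_like(W)
    for ell in all_channels:                                   # gradient through the (w_r . x)^ell terms
        coef = a[ell - 1] * ell * Z**(ell - 1) * v[None, :] / np.sqrt(m)   # (n, m)
        grad_W += (coef * resid[:, None]).T @ X / n
    W, v, a = W - lr * grad_W, v - lr * grad_v, a - lr * grad_a
    # Assumed selection rule: keep the `keep` channels with the largest attention magnitude.
    selected = [int(i) + 1 for i in np.argsort(-np.abs(a))[:keep]]

    # ----- Stage 2: plain GD on the second layer only, using the selected channels -----
    for _ in range(steps):
        pred, _, phi = forward(X, W, a, v, selected)
        v -= lr * phi.T @ (pred - y) / (n * np.sqrt(m))
    return W, a, v, selected
```

With a degree-$\ell_0$ polynomial target and $L \ge \ell_0$ candidate channels, the intent of the sketch is that Stage 1's single gradient step moves the attention scores so that the degree-$\ell_0$ channel dominates and is retained for Stage 2; the exact update and selection criterion in the paper may differ.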