Learning words in groups: fusion algebras, tensor ranks and grokking
By: Maor Shutman, Oren Louidor, Ran Tessler
Potential Business Impact:
A simple neural network can learn the multiplication rules of any finite group, and the paper explains how it finds efficient low-rank shortcuts to do so.
In this work, we demonstrate that a simple two-layer neural network with standard activation functions can learn an arbitrary word operation in any finite group, provided sufficient width is available, and that it exhibits grokking while doing so. To explain the mechanism by which this is achieved, we reframe the problem as that of learning a particular $3$-tensor, which we show is typically of low rank. A key insight is that low-rank implementations of this tensor can be obtained by decomposing it along triplets of basic self-conjugate representations of the group and leveraging the fusion structure to rule out many components. Focusing on a phenomenologically similar but more tractable surrogate model, we show that the network is able to find such low-rank implementations (or approximations thereof), thereby using limited width to approximate the word-tensor in a generalizable way. In the case of the simple multiplication word, we further elucidate the form of these low-rank implementations, showing that the network effectively implements efficient matrix multiplication in the sense of Strassen. Our work also sheds light on the mechanism by which a network reaches such a solution under gradient descent.
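To make the tensor framing concrete, here is a minimal worked illustration of what such a word-tensor and a low-rank implementation of it could look like; the notation ($T_w$, $u_r$, $v_r$, $z_r$, $R$) is ours and not taken from the paper. For a word $w$ in two letters over a finite group $G$ (e.g. the multiplication word $w(a,b)=ab$), the operation can be encoded as the $3$-tensor
$$T_w(a,b,c) \;=\; \mathbf{1}\{\,w(a,b)=c\,\}, \qquad a,b,c \in G,$$
and a rank-$R$ implementation is a decomposition
$$T_w(a,b,c) \;\approx\; \sum_{r=1}^{R} u_r(a)\, v_r(b)\, z_r(c).$$
Roughly speaking, a two-layer network whose hidden units realize the factors $u_r$, $v_r$, $z_r$ needs width on the order of $R$ to compute $w$ on all of $G$, so the lower the rank of $T_w$, the less width the network needs in order to generalize.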
Similar Papers
A Theoretical Framework for Discovering Groups and Unitary Representations via Tensor Factorization
Machine Learning (CS)
Finds hidden patterns in data using math.
Neural Networks Learn Generic Multi-Index Models Near Information-Theoretic Limit
Machine Learning (Stat)
Teaches computers to learn hidden patterns faster.
Rates and architectures for learning geometrically non-trivial operators
Machine Learning (CS)
Learns math problems from few examples.