Universality of physical neural networks with multivariate nonlinearity
By: Benjamin Savinson, David J. Norris, Siddhartha Mishra, and more
Potential Business Impact:
Enables energy-efficient AI hardware that computes with light.
The enormous energy demand of artificial intelligence is driving the development of alternative hardware for deep learning. Physical neural networks aim to exploit physical systems to perform machine learning more efficiently. In particular, optical systems can compute with light using negligible energy. While their computational capabilities were long limited by the linearity of optical materials, nonlinear computations have recently been demonstrated through modified input encoding. Despite this breakthrough, further progress is hindered by our inability to determine whether physical neural networks can learn arbitrary relationships between data, a key requirement for deep learning known as universality. Here we present a fundamental theorem that establishes a universality condition for physical neural networks. It provides a powerful mathematical criterion that imposes device constraints, detailing how inputs should be encoded in the tunable parameters of the physical system. Based on this result, we propose a scalable architecture using free-space optics that is provably universal and achieves high accuracy on image classification tasks. Further, by combining the theorem with temporal multiplexing, we present a route to potentially huge effective system sizes in highly practical but poorly scalable on-chip photonic devices. Our theorem and scaling methods apply beyond optical systems and inform the design of a wide class of universal, energy-efficient physical neural networks, justifying further efforts in their development.
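The central idea, that writing the data into the tunable parameters of an otherwise linear physical system yields a multivariate nonlinear input-output map, can be illustrated with a toy sketch. The code below is an assumed, simplified stand-in rather than the paper's device model or its universality theorem: a cascade of 2x2 interferometer-like elements whose phases encode the input features, followed by intensity detection. All function and parameter names are hypothetical.

```python
import numpy as np

# Toy sketch (assumed, not the paper's actual device or theorem): a cascade of
# linear optical elements whose phases encode the input data. Each element acts
# linearly on the optical field, yet the detected intensities end up being
# multivariate nonlinear functions of the inputs.

def element(theta, phi):
    """2x2 transfer matrix of a single tunable interferometer-like element."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta) * np.exp(1j * phi),
                      np.cos(theta) * np.exp(1j * phi)]])

def physical_layer(x, w):
    """Encode each input feature in the phases of one element (input encoding
    in the tunable parameters), propagate a fixed probe field through the
    cascade, and read out intensities at the detectors."""
    field = np.array([1.0, 0.0], dtype=complex)   # fixed probe/source field
    for xi, (a, b, c, d) in zip(x, w):            # one element per input feature
        field = element(a * xi + b, c * xi + d) @ field  # linear propagation
    return np.abs(field) ** 2                     # intensity (photodetector) readout

x = np.array([0.3, -0.7])                          # two input features
w = [(1.0, 0.2, 0.5, -0.1), (0.8, -0.3, 1.2, 0.4)] # trainable encoding parameters
print(physical_layer(x, w))  # jointly nonlinear in x despite linear optics
```

The point mirrored here is that light propagation through the system stays linear, but because the data enter through the tunable phases, the output intensities are products of trigonometric functions of all the inputs; the universality condition in the paper is the criterion for when such parameter-encoded maps can approximate arbitrary functions.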
Similar Papers
Nonlinear Computation with Linear Optics via Source-Position Encoding
Optics
Makes computers learn faster using light.
Physics-Constrained Adaptive Neural Networks Enable Real-Time Semiconductor Manufacturing Optimization with Minimal Training Data
Machine Learning (CS)
Makes computer chips faster and cheaper to design.
Low-rank surrogate modeling and stochastic zero-order optimization for training of neural networks with black-box layers
Machine Learning (CS)
Makes AI learn faster using light and math.