The Ky Fan Norms and Beyond: Dual Norms and Combinations for Matrix Optimization
By: Alexey Kravatskiy, Ivan Kozyrev, Nikolai Kozlov, and more
Potential Business Impact:
Makes AI learn better and faster.
In this article, we explore the use of matrix norms for optimizing functions of weight matrices, a problem central to training large language models. Moving beyond the spectral norm that underlies the Muon update, we leverage duals of the Ky Fan $k$-norms to introduce a family of Muon-like algorithms we name Fanions, which are closely related to Dion. By working with duals of convex combinations of the Ky Fan $k$-norms with either the Frobenius norm or the $\ell_\infty$ norm, we construct the families of F-Fanions and S-Fanions, respectively; their most prominent members are F-Muon and S-Muon. We complement our theoretical analysis with an extensive empirical study of these algorithms across a wide range of tasks and settings, demonstrating that F-Muon and S-Muon consistently match Muon's performance and outperform vanilla Muon on a synthetic linear least-squares problem.
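For intuition, the Ky Fan $k$-norm of a matrix $A$ is the sum of its $k$ largest singular values, $\|A\|_{(k)} = \sum_{i=1}^{k} \sigma_i(A)$; $k = 1$ gives the spectral norm and full $k$ the nuclear norm. Below is a minimal sketch, not the paper's reference implementation, of what a Fanion-style step could look like under the assumption that the update direction maximizes $\langle G, X \rangle$ over the unit ball of the dual Ky Fan $k$-norm, which (when $\sigma_k > \sigma_{k+1}$) is the top-$k$ singular outer product $U_k V_k^\top$ of the gradient $G$. The name `fanion_step` is hypothetical, and the paper's actual algorithms may add momentum, Newton-Schulz-style approximations, or different scaling.

```python
# Sketch of a Fanion-style update (hypothetical; assumes the descent
# direction is the top-k singular directions of the gradient, i.e. the
# maximizer of <G, X> over the dual Ky Fan k-norm unit ball).
import numpy as np

def fanion_step(W: np.ndarray, G: np.ndarray, lr: float, k: int) -> np.ndarray:
    """One steepest-descent step in the dual Ky Fan k-norm geometry.

    The direction U_k V_k^T is a subgradient of the Ky Fan k-norm
    ||G||_(k) = sum of the k largest singular values of G.
    """
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    direction = U[:, :k] @ Vt[:k, :]  # keep only the top-k singular directions
    return W - lr * direction

# Toy usage: one step on a random gradient.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))
G = rng.standard_normal((8, 4))
W_next = fanion_step(W, G, lr=0.1, k=2)
```

With $k = \min(m, n)$ the direction becomes the full polar factor $UV^\top$, recovering Muon's idealized orthogonalized update, while small $k$ yields low-rank updates in the spirit of Dion.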
Similar Papers
Muon Optimizes Under Spectral Norm Constraints
Machine Learning (CS)
Makes AI learn better by understanding its math.
Matrix-Free Two-to-Infinity and One-to-Two Norms Estimation
Machine Learning (CS)
Makes AI smarter and safer from attacks.
Quaternion Nuclear Norms Over Frobenius Norms Minimization for Robust Matrix Completion
Computer Vision and Pattern Recognition
Fixes broken or missing data in complex pictures.