mHC: Manifold-Constrained Hyper-Connections
By: Zhenda Xie, Yixuan Wei, Huanqi Cao, et al.
Recently, studies exemplified by Hyper-Connections (HC) have extended the ubiquitous residual connection paradigm established over the past decade by widening the residual stream and diversifying connectivity patterns. While this diversification yields substantial performance gains, it fundamentally compromises the identity mapping property intrinsic to the residual connection, causing severe training instability and restricted scalability, and it additionally incurs notable memory-access overhead. To address these challenges, we propose Manifold-Constrained Hyper-Connections (mHC), a general framework that projects the residual connection space of HC onto a specific manifold to restore the identity mapping property, while incorporating rigorous infrastructure optimization to ensure efficiency. Empirical experiments demonstrate that mHC is effective for training at scale, offering tangible performance improvements and superior scalability. We anticipate that mHC, as a flexible and practical extension of HC, will contribute to a deeper understanding of topological architecture design and suggest promising directions for the evolution of foundation models.
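To make the core idea concrete, the following is a toy sketch of a hyper-connection step with a manifold constraint on the mixing matrix. The abstract does not specify the manifold, so this sketch assumes a simple row-stochastic projection as one illustrative choice; the function names (`project_row_stochastic`, `hyper_connection_step`) and the single-lane residual update are hypothetical, not the paper's actual formulation.

```python
import numpy as np

def project_row_stochastic(H):
    # Hypothetical manifold projection: clamp the mixing matrix to be
    # non-negative and normalize each row to sum to 1. Under this
    # constraint the mixed stream is a convex combination of the input
    # copies, and H = I recovers the plain residual identity mapping.
    H = np.maximum(H, 0.0) + 1e-9
    return H / H.sum(axis=1, keepdims=True)

def hyper_connection_step(stream, H, layer):
    # stream: (n, d) array -- n parallel copies of the residual stream,
    # as in Hyper-Connections' widened residual design.
    # Mix the copies with the constrained matrix, then apply the layer
    # as a residual update on one lane (an illustrative simplification).
    mixed = project_row_stochastic(H) @ stream
    out = mixed.copy()
    out[0] = mixed[0] + layer(mixed[0])
    return out
```

With `H` at the identity and a layer that outputs zeros, the stream passes through unchanged, which is the identity mapping property the manifold constraint is meant to preserve.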