CRoPE: Efficient Parametrization of Rotary Positional Embedding
By: Beicheng Lou, Zifei Xu
Potential Business Impact:
Makes computer brains use fewer parts.
Rotary positional embedding has become the state-of-the-art approach to encoding position information in transformer-based models. While it is often expressed succinctly in complex-valued linear algebra, we note that the actual implementation of the $Q/K/V$-projections is not equivalent to a complex linear transformation. We argue that a complex linear transformation is a more natural parametrization and saves nearly 50\% of the parameters within the attention block. We show empirically that removing this redundancy has negligible impact on model performance, both in-sample and out-of-sample. Our modification achieves more efficient parameter usage, as well as a cleaner interpretation of the representation space.
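To make the parameter-count argument concrete, the following is a minimal sketch (not the authors' implementation) of the idea the abstract describes: a standard real $d \times d$ projection uses $d^2$ parameters, whereas viewing the $d$-dimensional input as $d/2$ complex numbers and applying a $(d/2) \times (d/2)$ complex projection uses $d^2/2$ real parameters, roughly a 50\% saving. All names (`W_real`, `W_cplx`, `project_complex`, `rope_angles`) are illustrative assumptions, not identifiers from the paper.

```python
import numpy as np

d_model = 8            # must be even; each adjacent pair of real dims is one complex dim
d_half = d_model // 2
rng = np.random.default_rng(0)

# Standard parametrization (assumed baseline): a real d x d projection -> d^2 real parameters.
W_real = rng.standard_normal((d_model, d_model))

# Hypothetical complex parametrization: a (d/2) x (d/2) complex projection.
# (d/2)^2 complex entries = d^2/2 real parameters, i.e. roughly 50% fewer.
W_cplx = rng.standard_normal((d_half, d_half)) + 1j * rng.standard_normal((d_half, d_half))

def rope_angles(pos, d_half, base=10000.0):
    """Per-pair rotation angles theta_i * pos, as in standard RoPE."""
    inv_freq = base ** (-np.arange(d_half) / d_half)
    return pos * inv_freq

def project_complex(x, W):
    """View adjacent real dims as complex numbers, then apply a complex linear map."""
    x_c = x[0::2] + 1j * x[1::2]   # R^d -> C^{d/2}
    return W @ x_c                 # complex linear projection

x = rng.standard_normal(d_model)
pos = 3

# RoPE acts as multiplication by e^{i * theta}; this rotation commutes with a
# complex linear projection, which motivates parametrizing Q/K this way.
q_c = project_complex(x, W_cplx) * np.exp(1j * rope_angles(pos, d_half))

print("real-projection parameters:", W_real.size)                    # d^2
print("complex-projection parameters (real count):", 2 * W_cplx.size)  # d^2 / 2
```

Under these assumptions, the complex form keeps the usual rotate-by-position behaviour of RoPE while halving the projection weights, which is the redundancy the abstract argues can be removed.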
Similar Papers
Selective Rotary Position Embedding
Computation and Language
Makes AI better at remembering and understanding long stories.
Context-aware Rotary Position Embedding
Computation and Language
Makes AI understand word order better.
VRoPE: Rotary Position Embedding for Video Large Language Models
Artificial Intelligence
Helps AI understand videos better.