Hierarchical Orthogonal Residual Spread for Precise Massive Editing in Large Language Models
By: Xiaojie Gu, Guangxu Chen, Yuheng Yang, and more
Potential Business Impact:
Fixes AI mistakes without breaking other skills.
Large language models (LLMs) exhibit exceptional performance across various domains, yet they face critical safety concerns. Model editing has emerged as an effective approach to mitigate these issues. Existing model editing methods often focus on optimizing an information matrix that blends new and old knowledge. While effective, these approaches can be computationally expensive and may cause conflicts between new edits and preserved knowledge. In contrast, we shift our attention to the Hierarchical Orthogonal Residual SprEad of the information matrix, which, from a different perspective, reduces noisy gradients and enables more stable edits. We demonstrate the effectiveness of our method, HORSE, through a clear theoretical comparison with several popular methods and extensive experiments conducted on two datasets across multiple LLMs. The results show that HORSE maintains precise massive editing across diverse scenarios. The code is available at https://github.com/XiaojieGu/HORSE.
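The abstract does not spell out the exact HORSE update rule, so the short sketch below is only an illustration of the general flavor it describes: spreading an editing residual over several layers while keeping each per-layer update orthogonal to key directions used by preserved knowledge. Every name, shape, and the geometric layer schedule here are hypothetical assumptions for illustration, not the authors' implementation.

# Illustrative sketch only (hypothetical names and shapes), not the HORSE algorithm itself:
# spread a single editing residual across layers, with each rank-one update projected
# away from the subspace spanned by keys of knowledge that should stay unchanged.
import numpy as np

rng = np.random.default_rng(0)

d = 64                                   # hidden size (hypothetical)
num_layers = 4                           # layers the residual is spread over
K_old = rng.standard_normal((d, 20))     # key directions of knowledge to preserve
k_new = rng.standard_normal(d)           # key of the fact being edited
residual = rng.standard_normal(d)        # total output change the edit requires

# Projector onto the orthogonal complement of the preserved key subspace,
# so updates do not move outputs for old keys.
P = np.eye(d) - K_old @ np.linalg.pinv(K_old)

# Hierarchical spread: split the residual into per-layer shares
# (a simple geometric schedule) and apply each share as a projected rank-one update.
shares = np.array([0.5 ** i for i in range(num_layers)])
shares /= shares.sum()

deltas = []
for share in shares:
    delta = np.outer(share * residual, P @ k_new) / (k_new @ P @ k_new)
    deltas.append(delta)

# Sanity check: old keys are (numerically) unaffected, the edited key receives the residual.
total = sum(deltas)
print("max change on old keys:", np.abs(total @ K_old).max())
print("norm of change on edited key:", np.linalg.norm(total @ k_new))

The projector P is what keeps outputs for the preserved keys fixed; the per-layer shares are just one way to distribute the residual hierarchically rather than concentrating the whole edit in a single layer.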
Similar Papers
Multi-objective Large Language Model Alignment with Hierarchical Experts
Computation and Language
Makes AI understand many different wishes at once.
Horseshoe Mixtures-of-Experts (HS-MoE)
Machine Learning (Stat)
Teaches computers to pick the best answers faster.
Rethinking the Residual Distribution of Locate-then-Editing Methods in Model Editing
Computation and Language
Fixes AI mistakes without retraining the whole thing.