Mitigating Posterior Salience Attenuation in Long-Context LLMs with Positional Contrastive Decoding
By: Zikai Xiao, Ziyang Wang, Wen Ma, and more
Potential Business Impact:
Makes AI remember more of long stories.
While Large Language Models (LLMs) support long contexts, they suffer performance degradation within the context window. Current solutions incur prohibitive training costs, leaving the statistical behavior of this degradation and cost-effective remedies underexplored. From the decoding perspective, we identify the Posterior Salience Attenuation (PSA) phenomenon, in which the salience ratio correlates with long-text performance degradation. Notably, despite the attenuation, gold tokens still occupy high-ranking positions in the decoding space. Motivated by this observation, we propose training-free Positional Contrastive Decoding (PCD), which contrasts the logits derived from long-aware attention with those from a designed local-aware attention, enabling the model to focus on the gains introduced by large-scale short-to-long training. Through an analysis of a long-term decay simulation, we demonstrate that PCD effectively alleviates attention score degradation. Experimental results show that PCD achieves state-of-the-art performance on long-context benchmarks.
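The abstract does not spell out the exact PCD formulation, so the following is a minimal sketch under stated assumptions: the long-aware logits are taken to be the model's ordinary full-context next-token logits, the local-aware logits come from a second pass whose attention is restricted to a recent window, and the combination rule, the alpha weight, and the top-k plausibility filter are illustrative choices borrowed from standard contrastive decoding rather than the paper's own definitions.

```python
import torch

def positional_contrastive_decoding(logits_long, logits_local, alpha=1.0, top_k=50):
    """Hedged sketch of PCD-style decoding (not the paper's exact method).

    logits_long  : (vocab,) next-token logits from full-context (long-aware) attention.
    logits_local : (vocab,) next-token logits from a local-window (local-aware) pass.
    alpha, top_k : illustrative hyperparameters, assumed rather than taken from the paper.
    """
    # Keep only plausible candidates from the long-aware pass, mirroring the
    # adaptive-plausibility constraint commonly used in contrastive decoding.
    _, topk_idx = torch.topk(logits_long, k=top_k)

    # Contrastive score: amplify tokens whose salience comes from long-range
    # context, i.e., where long-aware and local-aware logits disagree.
    contrast = logits_long + alpha * (logits_long - logits_local)

    scores = torch.full_like(logits_long, float("-inf"))
    scores[topk_idx] = contrast[topk_idx]
    return scores

# Toy usage with random logits (a real setup would run the model twice,
# once with full attention and once with a restricted local window).
vocab = 32000
logits_long = torch.randn(vocab)
logits_local = torch.randn(vocab)
next_token = positional_contrastive_decoding(logits_long, logits_local).argmax()
```

The key design point this sketch illustrates is that the correction is training-free: it reuses the same model for both passes and only changes how the two sets of logits are combined at decode time.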
Similar Papers
Lil: Less is Less When Applying Post-Training Sparse-Attention Algorithms in Long-Decode Stage
Computation and Language
Saves computer time by stopping early.
Dynamic Attention-Guided Context Decoding for Mitigating Context Faithfulness Hallucinations in Large Language Models
Computation and Language
Stops AI from making up answers.
Lag-Relative Sparse Attention In Long Context Training
Computation and Language
Helps computers remember more of long stories.