GraphKV: Breaking the Static Selection Paradigm with Graph-Based KV Cache Eviction
By: Xuelin Li, Xiangqi Jin, Linfeng Zhang
Potential Business Impact:
Helps AI remember more of long stories.
Efficient Key-Value (KV) cache management is essential for processing long text sequences in large language models (LLMs), where memory constraints often limit performance. Conventional KV eviction strategies, such as top-k selection based on attention scores, depend on static heuristics that fail to capture the evolving implicit dependencies among tokens during inference. To overcome this, we propose GraphKV, a graph-based framework that redefines token selection for KV cache compression. In GraphKV, tokens are modeled as nodes with importance scores, and edges represent their similarity relationships. Through a decay-signal-propagation mechanism, token importance is dynamically updated by propagating information across the graph, enabling adaptive retention of the most contextually significant tokens. GraphKV integrates seamlessly into existing KV cache eviction methods such as SnapKV and PyramidKV in a plug-and-play manner. Code will be released on GitHub.
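To make the decay-signal-propagation idea concrete, here is a minimal sketch of what one propagation-and-eviction step could look like. This is not the authors' released implementation: the function name `graphkv_propagate`, the use of cosine similarity between cached keys as the edge weight, and the `decay` and `top_k` parameters are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def graphkv_propagate(keys: torch.Tensor,
                      scores: torch.Tensor,
                      decay: float = 0.5,
                      top_k: int = 256) -> torch.Tensor:
    """Hypothetical sketch of GraphKV-style decay-signal propagation.

    keys:   (seq_len, head_dim) cached key vectors for one attention head
    scores: (seq_len,) initial token importance, e.g. from an existing
            eviction method's attention-based scoring (SnapKV, PyramidKV)
    Returns the sorted indices of the tokens to retain.
    """
    # Build the similarity graph: edge weights are cosine similarities
    # between cached key vectors (one plausible choice of "similarity").
    normed = F.normalize(keys, dim=-1)
    adjacency = normed @ normed.T          # (seq_len, seq_len)
    adjacency.fill_diagonal_(0.0)          # no self-edges

    # Propagate a decayed share of each node's score to its neighbors,
    # so tokens strongly connected to important tokens stay important.
    propagated = scores + decay * (adjacency @ scores)

    # Retain the top_k tokens by propagated importance; evict the rest.
    keep = torch.topk(propagated, k=min(top_k, scores.numel())).indices
    return keep.sort().values
```

Because the propagated scores only re-rank tokens, a step like this could in principle be dropped in after any existing scorer, which matches the plug-and-play claim in the abstract.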
Similar Papers
G-KV: Decoding-Time KV Cache Eviction with Global Attention
Computation and Language
Makes AI remember more without slowing down.
CompressKV: Semantic Retrieval Heads Know What Tokens are Not Important Before Generation
Computation and Language
Makes AI remember more without slowing down.
LagKV: Lag-Relative Information of the KV Cache Tells Which Tokens Are Important
Machine Learning (CS)
Makes AI remember more without getting slow.