Unlocking the Address Book: Dissecting the Sparse Semantic Structure of LLM Key-Value Caches via Sparse Autoencoders
By: Qingsen Ma, Dianyun Wang, Jiaming Lyu, and others
Potential Business Impact:
Makes the AI's internal memory understandable so it can be trimmed, saving computer memory.
The Key-Value (KV) cache is the primary memory bottleneck in long-context Large Language Models, yet it is typically treated as an opaque numerical tensor. In this work, we propose STA-Attention, a framework that uses Top-K Sparse Autoencoders (SAEs) to decompose the KV cache into interpretable "semantic atoms." Unlike standard L1-regularized SAEs, our Top-K approach eliminates shrinkage bias, preserving the precise dot-product geometry required for attention. Our analysis uncovers a fundamental Key-Value Asymmetry: while Key vectors serve as highly sparse routers dominated by a "Semantic Elbow," deep Value vectors carry dense content payloads requiring a larger budget. Based on this structure, we introduce a Dual-Budget Strategy that selectively preserves the most informative semantic components while filtering representational noise. Experiments on Yi-6B, Mistral-7B, Qwen2.5-32B, and other models show that our semantic reconstructions maintain perplexity and zero-shot performance comparable to the original models, effectively bridging the gap between mechanistic interpretability and faithful attention modeling.
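To make the idea concrete, below is a minimal NumPy sketch of the kind of Top-K SAE reconstruction the abstract describes: a cached Key or Value vector is encoded into latent "semantic atoms," only the k largest activations are kept (avoiding the shrinkage an L1 penalty would introduce), and the vector is rebuilt from those atoms. The function name, dimensions, weight initialization, and the example budgets (a small k for Keys as sparse routers, a larger k for Values as dense payloads) are illustrative assumptions, not the paper's actual architecture or hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def topk_sae_reconstruct(x, W_enc, b_enc, W_dec, b_dec, k):
    """Rebuild a cached Key/Value vector from its k largest SAE activations.

    x      : (d,) original Key or Value vector from the KV cache
    W_enc  : (d, m) encoder weights mapping to m latent "semantic atoms"
    b_enc  : (m,) encoder bias
    W_dec  : (m, d) decoder weights (rows are atom directions)
    b_dec  : (d,) decoder bias
    k      : per-vector budget of retained atoms
    """
    acts = np.maximum(x @ W_enc + b_enc, 0.0)   # ReLU latent activations
    idx = np.argsort(acts)[-k:]                 # indices of the k largest atoms
    sparse = np.zeros_like(acts)
    sparse[idx] = acts[idx]                     # hard Top-K: kept atoms are not shrunk
    return sparse @ W_dec + b_dec               # reconstruction used in place of x

# Toy example (hypothetical sizes): 64-dim head vectors, 512 atoms,
# dual budgets for Keys (sparse routers) vs. Values (denser payloads).
d, m = 64, 512
W_enc = rng.normal(scale=0.05, size=(d, m))
b_enc = np.zeros(m)
W_dec = rng.normal(scale=0.05, size=(m, d))
b_dec = np.zeros(d)

key_hat = topk_sae_reconstruct(rng.normal(size=d), W_enc, b_enc, W_dec, b_dec, k=8)
val_hat = topk_sae_reconstruct(rng.normal(size=d), W_enc, b_enc, W_dec, b_dec, k=32)
print(key_hat.shape, val_hat.shape)
```

The two calls at the end illustrate the Dual-Budget Strategy in spirit: a tighter atom budget for Keys and a larger one for Values, with the specific values 8 and 32 chosen purely for the example.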
Similar Papers
KV-CAR: KV Cache Compression using Autoencoders and KV Reuse in Large Language Models
Machine Learning (CS)
Lets AI remember more while using less computer memory.
PureKV: Plug-and-Play KV Cache Optimization with Spatial-Temporal Sparse Attention for Vision-Language Large Models
Multimedia
Makes AI understand videos much faster.