Score: 1

SR-KI: Scalable and Real-Time Knowledge Integration into LLMs via Supervised Attention

Published: November 9, 2025 | arXiv ID: 2511.06446v1

By: Bohan Yu, Wei Huang, Kang Liu

BigTech Affiliations: Baidu

Potential Business Impact:

Lets LLMs incorporate large, frequently updated knowledge bases without retraining or a separate external retrieval pipeline.

Business Areas:
Semantic Search, Internet Services

This paper proposes SR-KI, a novel approach for integrating real-time, large-scale structured knowledge bases (KBs) into large language models (LLMs). SR-KI first encodes KB entries into key-value pairs with a pretrained encoder and injects them into the LLM's KV cache. Building on this representation, the authors employ a two-stage training paradigm: first locating a dedicated retrieval layer within the LLM, and then applying an attention-based loss at that layer to explicitly supervise attention toward relevant KB entries. Unlike traditional retrieval-augmented generation methods, which rely heavily on the performance of external retrievers and multi-stage pipelines, SR-KI supports end-to-end inference by performing retrieval entirely within the model's latent space. This design enables efficient compression of the injected knowledge and facilitates dynamic knowledge updates. Comprehensive experiments demonstrate that SR-KI can integrate up to 40K KB entries into a 7B LLM on a single A100 40GB GPU with strong retrieval performance, maintaining over 98% Recall@10 on the best-performing task and exceeding 88% on average across all tasks. Results on question answering and KB ID generation further show that SR-KI maintains strong task performance while achieving up to 99.75% compression of the injected KBs.
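The core training signal described above can be illustrated with a toy sketch: given a query vector (standing in for the hidden state at the retrieval layer) and a set of KB key vectors, an attention distribution is computed over the entries, and a cross-entropy-style loss penalizes low attention on the relevant entry. This is a minimal dependency-free illustration, not the paper's implementation; the function name, the 2-d toy vectors, and the dot-product scoring are all assumptions for the sake of the example.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_supervision_loss(query, kb_keys, relevant_idx):
    """Toy attention-based retrieval loss (hypothetical simplification
    of SR-KI's supervision): score each KB key against the query by dot
    product, softmax into an attention distribution, and return the
    negative log-probability assigned to the relevant KB entry."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in kb_keys]
    probs = softmax(scores)
    return -math.log(probs[relevant_idx])

# Toy KB of three entries encoded as 2-d key vectors (made-up values).
kb_keys = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query = [0.9, 0.1]  # stand-in for the retrieval-layer hidden state

# Supervising toward entry 0 (which the query resembles) yields a
# smaller loss than supervising toward a mismatched entry.
loss_match = attention_supervision_loss(query, kb_keys, relevant_idx=0)
loss_mismatch = attention_supervision_loss(query, kb_keys, relevant_idx=1)
```

Minimizing this loss during training sharpens the retrieval layer's attention onto the correct KB entries, which is what lets the model retrieve in latent space without an external retriever.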

Country of Origin
🇨🇳 China

Page Count
16 pages

Category
Computer Science:
Computation and Language