CAT: Causal Attention Tuning For Injecting Fine-grained Causal Knowledge into Large Language Models
By: Kairong Han, Wenshuo Zhao, Ziyu Zhao, and more
Potential Business Impact:
Teaches computers to understand cause and effect.
Large Language Models (LLMs) have achieved remarkable success across various domains. However, a fundamental question remains: Can LLMs effectively utilize causal knowledge for prediction and generation? Through empirical studies, we find that LLMs trained directly on large-scale data often capture spurious correlations rather than true causal relationships, leading to suboptimal performance, especially in out-of-distribution (OOD) scenarios. To address this challenge, we propose Causal Attention Tuning (CAT), a novel approach that injects fine-grained causal knowledge into the attention mechanism. We propose an automated pipeline that leverages human priors to automatically generate token-level causal signals and introduce the Re-Attention mechanism to guide training, helping the model focus on causal structures while mitigating noise and biases in attention scores. Experimental results on our proposed Spurious Token Game (STG) benchmark and multiple downstream tasks demonstrate that our approach effectively leverages causal knowledge for prediction and remains robust in OOD scenarios. Implementation details can be found at https://github.com/Kairong-Han/CAT.
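The abstract describes guiding attention toward tokens marked as causal, but does not spell out the training objective. The sketch below is a minimal, hypothetical illustration of attention-score supervision in that spirit: a binary token-level causal signal is turned into a target distribution, and a KL penalty nudges attention mass toward the marked tokens. The uniform target, the KL form, and the function name causal_attention_loss are assumptions for illustration, not the paper's actual Re-Attention mechanism.

```python
# Hypothetical sketch of token-level attention supervision (not the paper's
# exact Re-Attention loss): penalize attention that ignores tokens flagged
# as causal by an external annotation pipeline.
import torch
import torch.nn.functional as F


def causal_attention_loss(attn_weights: torch.Tensor,
                          causal_mask: torch.Tensor,
                          eps: float = 1e-8) -> torch.Tensor:
    """
    attn_weights: (batch, heads, query_len, key_len) softmax attention scores.
    causal_mask:  (batch, key_len) binary token-level signal, 1 = causal token.
    Returns a scalar penalty that shrinks as attention concentrates on
    tokens marked causal.
    """
    # Assumption: spread the target probability uniformly over causal tokens.
    target = causal_mask.float()
    target = target / (target.sum(dim=-1, keepdim=True) + eps)       # (batch, key_len)
    target = target[:, None, None, :].expand_as(attn_weights)        # broadcast to heads/queries

    # KL(target || attention), computed from log attention probabilities.
    return F.kl_div((attn_weights + eps).log(), target, reduction="batchmean")


if __name__ == "__main__":
    # Toy tensors just to show the expected shapes.
    B, H, L = 2, 4, 6
    attn = torch.softmax(torch.randn(B, H, L, L), dim=-1)
    mask = torch.randint(0, 2, (B, L))
    print(causal_attention_loss(attn, mask))
```

In practice such a term would be added to the standard language-modeling loss with a weighting coefficient; the abstract does not specify how the paper balances the two objectives.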
Similar Papers
Learning to Focus: Causal Attention Distillation via Gradient-Guided Token Pruning
Computation and Language
Helps AI focus on important information, not distractions.
Attention and Compression is all you need for Controllably Efficient Language Models
Machine Learning (CS)
Lets computers remember more with less effort.