Score: 1

SAP: Syntactic Attention Pruning for Transformer-based Language Models

Published: December 22, 2025 | arXiv ID: 2512.19125v1

By: Tzu-Yun Lee, Ding-Yong Hong, Jan-Jan Wu

Potential Business Impact:

Shrinks Transformer language models by pruning attention heads, so they run faster and cheaper without retraining.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

This paper introduces Syntactic Attention Pruning (SAP), a novel method for effectively pruning attention heads in Transformer models. Unlike conventional approaches that rely solely on mathematical analysis of model weights and activations, SAP incorporates both the syntactic structure and attention patterns of sentences to guide the pruning process. By leveraging these linguistic features, SAP not only achieves performance comparable to state-of-the-art methods but also enhances the interpretability of model behavior. To further improve robustness, the authors propose Candidate Filtering (CF), a mechanism that prioritizes heads based on their contribution to model performance, mitigating degradation during pruning. Experimental results indicate that SAP effectively preserves critical heads with a high density of strong attention values, outperforming existing head pruning strategies in retrain-free settings. These findings position SAP as a promising foundation for a new direction in model compression research, offering high flexibility for pruning across Transformer-based language models.
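To make the idea concrete, here is a minimal sketch of syntax-guided head pruning, under the assumption that a head's importance is measured by how densely its strong attention values land on syntactically related token pairs, with a protected-head list standing in for Candidate Filtering. This is not the authors' implementation; the function names, threshold, and keep ratio are illustrative.

```python
# Illustrative sketch only: score attention heads by how much of their strong
# attention falls on syntactically linked token pairs, then prune low scorers.
import torch

def syntactic_head_scores(attn, dep_mask, strong_thresh=0.1):
    """attn: [num_heads, seq_len, seq_len] attention weights for one sentence.
    dep_mask: [seq_len, seq_len] boolean mask, True where two tokens are
    syntactically related (e.g., joined by a dependency arc).
    Returns one score per head: the fraction of that head's strong attention
    values that fall on syntactic pairs."""
    strong = attn > strong_thresh                       # strong attention values
    on_syntax = strong & dep_mask.unsqueeze(0)          # strong AND syntactically linked
    return on_syntax.sum(dim=(1, 2)).float() / strong.sum(dim=(1, 2)).clamp(min=1).float()

def prune_heads(scores, keep_ratio=0.5, protected=None):
    """Keep the top-scoring heads. `protected` is a stand-in for Candidate
    Filtering: heads that are never pruned because removing them degrades
    task performance."""
    num_keep = max(1, int(len(scores) * keep_ratio))
    order = torch.argsort(scores, descending=True).tolist()
    kept = set(order[:num_keep]) | set(protected or [])
    return sorted(kept)

# Toy usage with random tensors standing in for a real model's attention maps
# and a dependency parse.
torch.manual_seed(0)
attn = torch.rand(12, 8, 8)              # 12 heads, sequence length 8
dep_mask = torch.rand(8, 8) > 0.7        # pretend dependency-parse adjacency
scores = syntactic_head_scores(attn, dep_mask)
print("kept heads:", prune_heads(scores, keep_ratio=0.5, protected=[0]))
```

In practice the attention maps would come from a trained Transformer and the dependency mask from a parser; the point of the sketch is simply that the pruning criterion uses linguistic structure rather than weight magnitudes alone.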

Page Count
9 pages

Category
Computer Science:
Computation and Language