MACKO: Sparse Matrix-Vector Multiplication for Low Sparsity
By: Vladimír Macko, Vladimír Boža
Potential Business Impact:
Makes AI models use less memory and run faster.
Sparse Matrix-Vector Multiplication (SpMV) is a fundamental operation in the inference of sparse Large Language Models (LLMs). Because existing SpMV methods perform poorly at the low, unstructured sparsity (30-90%) commonly observed in pruned LLMs, unstructured pruning has so far delivered only limited memory reduction and speedup. We propose MACKO-SpMV, a GPU-optimized format and kernel co-designed to reduce storage overhead while preserving compatibility with the GPU's execution model. This enables efficient SpMV for unstructured sparsity without specialized hardware units (e.g., tensor cores) or format-specific precomputation. Empirical results show that at 50% sparsity, MACKO is the first approach to combine a significant 1.5x memory reduction with a 1.2-1.5x speedup over the dense representation. It also outperforms other SpMV baselines: 2.8-13.0x over cuSPARSE, 1.9-2.6x over Sputnik, and 2.2-2.5x over DASP. Applied to Llama2-7B pruned with Wanda to 50% sparsity, it delivers a 1.5x memory reduction and 1.5x faster inference at fp16 precision. Thanks to MACKO, unstructured pruning at 50% sparsity is now justified in real-world LLM workloads.
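For readers unfamiliar with the operation itself: SpMV computes y = A·x, where A stores only its nonzero entries. A minimal sketch of a conventional baseline, the CSR (compressed sparse row) format used by libraries such as cuSPARSE — not the MACKO format, whose layout is not detailed in this abstract:

```python
# CSR SpMV sketch: y = A @ x for a sparse matrix A.
# Illustrative baseline only; MACKO's actual format differs.

def spmv_csr(values, col_idx, row_ptr, x):
    """values/col_idx hold the nonzeros; row_ptr[i]:row_ptr[i+1] spans row i."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y

# 2x3 matrix [[1, 0, 2], [0, 3, 0]] at roughly 50% sparsity:
values  = [1.0, 2.0, 3.0]
col_idx = [0, 2, 1]
row_ptr = [0, 2, 3]
x = [1.0, 1.0, 1.0]
print(spmv_csr(values, col_idx, row_ptr, x))  # -> [3.0, 3.0]
```

At low sparsity, the per-nonzero index overhead of such formats is what erodes the memory and speed gains the paper targets.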
Similar Papers
Toward Efficient SpMV in Sparse LLMs via Block Extraction and Compressed Storage
Distributed, Parallel, and Cluster Computing
Makes AI models run much faster and smaller.
Verification Challenges in Sparse Matrix Vector Multiplication in High Performance Computing: Part I
Logic in Computer Science
Speeds up computer math for science.
LOw-cOst yet High-Performant Sparse Matrix-Matrix Multiplication on Arm SME Architectures
Distributed, Parallel, and Cluster Computing
Makes computer math problems run much faster.