PIM-LLM: A High-Throughput Hybrid PIM Architecture for 1-bit LLMs

Published: March 31, 2025 | arXiv ID: 2504.01994v1

By: Jinendra Malekar, Peyton Chandarana, Md Hasibul Amin, and others

Potential Business Impact:

Makes AI chat faster and more power-efficient.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

In this paper, we propose PIM-LLM, a hybrid architecture developed to accelerate 1-bit large language models (LLMs). PIM-LLM leverages analog processing-in-memory (PIM) architectures and digital systolic arrays to accelerate low-precision matrix multiplication (MatMul) operations in projection layers and high-precision MatMul operations in attention heads of 1-bit LLMs, respectively. Our design achieves up to roughly 80x improvement in tokens per second and a 70% increase in tokens per joule compared to conventional hardware accelerators. Additionally, PIM-LLM outperforms previous PIM-based LLM accelerators, setting a new benchmark with at least 2x and 5x improvement in GOPS and GOPS/W, respectively.
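The hybrid split described above can be sketched in a few lines. The sketch below is illustrative, not the paper's implementation: function names and shapes are assumptions, analog PIM and systolic arrays are both modeled as plain NumPy MatMuls, and weights are restricted to {-1, 0, +1} on the assumption that "1-bit LLM" refers to BitNet-style ternary quantization. The point is the routing: weight MatMuls in projection layers take the low-precision PIM path, while activation-activation MatMuls in attention heads take the high-precision systolic path.

```python
import numpy as np

def pim_matmul(x, w_ternary):
    # Analog PIM crossbar path (modeled here as an ordinary MatMul):
    # low-precision ternary weights in projection layers.
    return x @ w_ternary

def systolic_matmul(a, b):
    # Digital systolic-array path: high-precision MatMul between
    # two activation tensors (no quantized weights involved).
    return a @ b

def attention_block(x, wq, wk, wv):
    # Projection layers use quantized weights -> routed to PIM.
    q = pim_matmul(x, wq)
    k = pim_matmul(x, wk)
    v = pim_matmul(x, wv)
    # Q.K^T and scores.V multiply two activations -> routed to systolic array.
    scores = systolic_matmul(q, k.T) / np.sqrt(q.shape[-1])
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    return systolic_matmul(probs, v)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
# Hypothetical ternary weights in {-1, 0, +1}.
wq, wk, wv = (rng.integers(-1, 2, size=(8, 8)).astype(float) for _ in range(3))
out = attention_block(x, wq, wk, wv)
print(out.shape)  # (4, 8)
```

In this division of labor, the energy-hungry full-precision work is confined to the comparatively small attention MatMuls, which is consistent with the throughput and efficiency gains the abstract reports.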

Country of Origin
πŸ‡ΊπŸ‡Έ United States

Page Count
8 pages

Category
Computer Science:
Hardware Architecture