PIM-LLM: A High-Throughput Hybrid PIM Architecture for 1-bit LLMs

Published: March 31, 2025 | arXiv ID: 2504.01994v1

By: Jinendra Malekar, Peyton Chandarana, Md Hasibul Amin, and others

Potential Business Impact:

Makes AI chat faster and more power-efficient.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

In this paper, we propose PIM-LLM, a hybrid architecture developed to accelerate 1-bit large language models (LLMs). PIM-LLM leverages analog processing-in-memory (PIM) architectures and digital systolic arrays to accelerate low-precision matrix multiplication (MatMul) operations in projection layers and high-precision MatMul operations in attention heads of 1-bit LLMs, respectively. Our design achieves up to roughly 80x improvement in tokens per second and a 70% increase in tokens per joule compared to conventional hardware accelerators. Additionally, PIM-LLM outperforms previous PIM-based LLM accelerators, setting a new benchmark with at least 2x and 5x improvement in GOPS and GOPS/W, respectively.
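The hybrid split described above can be sketched in a few lines. The sketch below is illustrative, not the paper's implementation: function names and shapes are assumptions, analog PIM and systolic arrays are both modeled as plain NumPy MatMuls, and weights are restricted to {-1, 0, +1} on the assumption that "1-bit LLM" refers to BitNet-style ternary quantization. The point is the routing: weight MatMuls in projection layers take the low-precision PIM path, while activation-activation MatMuls in attention heads take the high-precision systolic path.

```python
import numpy as np

def pim_matmul(x, w_ternary):
    # Analog PIM crossbar path (modeled here as an ordinary MatMul):
    # low-precision ternary weights in projection layers.
    return x @ w_ternary

def systolic_matmul(a, b):
    # Digital systolic-array path: high-precision MatMul between
    # two activation tensors (no quantized weights involved).
    return a @ b

def attention_block(x, wq, wk, wv):
    # Projection layers use quantized weights -> routed to PIM.
    q = pim_matmul(x, wq)
    k = pim_matmul(x, wk)
    v = pim_matmul(x, wv)
    # Q.K^T and scores.V multiply two activations -> routed to systolic array.
    scores = systolic_matmul(q, k.T) / np.sqrt(q.shape[-1])
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    return systolic_matmul(probs, v)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
# Hypothetical ternary weights in {-1, 0, +1}.
wq, wk, wv = (rng.integers(-1, 2, size=(8, 8)).astype(float) for _ in range(3))
out = attention_block(x, wq, wk, wv)
print(out.shape)  # (4, 8)
```

In this division of labor, the energy-hungry full-precision work is confined to the comparatively small attention MatMuls, which is consistent with the throughput and efficiency gains the abstract reports.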

Country of Origin
πŸ‡ΊπŸ‡Έ United States

Page Count
8 pages

Category
Computer Science:
Hardware Architecture