NL-DPE: An Analog In-memory Non-Linear Dot Product Engine for Efficient CNN and LLM Inference
By: Lei Zhao, Luca Buonanno, Archit Gajjar, and more
Potential Business Impact:
Runs AI inference much faster while using less power.
Resistive Random Access Memory (RRAM)-based in-memory computing (IMC) accelerators offer significant performance and energy advantages for deep neural networks (DNNs), but face three major limitations: (1) they support only static dot-product operations and cannot accelerate the arbitrary non-linear functions or data-dependent multiplications essential to modern LLMs; (2) they demand large, power-hungry analog-to-digital converter (ADC) circuits; and (3) mapping model weights to device conductances introduces errors from cell nonidealities. These challenges hinder scalable and accurate IMC acceleration as models grow. We propose NL-DPE, a Non-Linear Dot Product Engine that overcomes these barriers. NL-DPE augments crosspoint arrays with RRAM-based Analog Content Addressable Memory (ACAM) to execute arbitrary non-linear functions and data-dependent matrix multiplications in the analog domain by transforming them into decision trees, eliminating ADCs entirely. To address device noise, NL-DPE uses software-based Noise-Aware Fine-tuning (NAF), requiring no in-device calibration. Experiments show that NL-DPE delivers 28X higher energy efficiency and a 249X speedup over a GPU baseline, and 22X higher energy efficiency and a 245X speedup over existing IMC accelerators, while maintaining high accuracy.
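The abstract's central idea, recasting a non-linear function as a decision tree whose leaves an ACAM can match in the analog domain, can be pictured with a small piecewise-constant sketch. Below is a minimal NumPy illustration, not the paper's actual compilation flow: the GELU target, the input range, the leaf count, and the helper name acam_eval are all assumptions chosen for demonstration. Each interval stands in for one ACAM row that "matches" when the input falls inside its stored range and returns a pre-stored output level, so no ADC is needed to digitize an intermediate result.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU (Hendrycks & Gimpel)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

# Build a piecewise-constant "decision tree" over an assumed input range.
# Each (low, high) interval plays the role of one ACAM row: the row matches
# when the analog input lies inside its stored range, and the matched row
# selects a pre-stored output level.
LOW, HIGH, N_LEAVES = -4.0, 4.0, 64            # assumed range and resolution
edges = np.linspace(LOW, HIGH, N_LEAVES + 1)   # interval boundaries
centers = 0.5 * (edges[:-1] + edges[1:])
leaf_values = gelu(centers)                    # output stored per "row"

def acam_eval(x):
    # Emulate ACAM range matching: find the interval containing each x.
    idx = np.clip(np.searchsorted(edges, x) - 1, 0, N_LEAVES - 1)
    return leaf_values[idx]

x = np.linspace(LOW, HIGH, 10_000)
err = np.abs(acam_eval(x) - gelu(x))
print(f"max abs error with {N_LEAVES} leaves: {err.max():.4f}")
```

More leaves (equivalently, more ACAM rows or a deeper tree) tighten the approximation at the cost of area; per the abstract, the NAF step would then compensate for device noise in the stored levels purely in software, with no in-device calibration.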
Similar Papers
A 33.6-136.2 TOPS/W Nonlinear Analog Computing-In-Memory Macro for Multi-bit LSTM Accelerator in 65 nm CMOS
Hardware Architecture
Makes AI learn faster and use less power.
A Time- and Energy-Efficient CNN with Dense Connections on Memristor-Based Chips
Hardware Architecture
Makes AI chips faster and more power-efficient.
A Scalable FPGA Architecture With Adaptive Memory Utilization for GEMM-Based Operations
Hardware Architecture
Makes AI learn faster and use less power.