A 33.6-136.2 TOPS/W Nonlinear Analog Computing-In-Memory Macro for Multi-bit LSTM Accelerator in 65 nm CMOS

Published: December 6, 2025 | arXiv ID: 2512.06362v1

By: Junyi Yang, Xinyu Luo, Ye Ke and more

Potential Business Impact:

Makes AI learn faster and use less power.

Business Areas:

Natural Language Processing Artificial Intelligence, Data and Analytics, Software

The energy efficiency of analog computing-in-memory (ACIM) accelerators for recurrent neural networks, particularly long short-term memory (LSTM) networks, is limited by the high proportion of nonlinear (NL) operations, which are typically executed digitally. To address this, we propose an LSTM accelerator incorporating an ACIM macro with a reconfigurable (1-5 bit) nonlinear in-memory (NLIM) analog-to-digital converter (ADC) that computes NL activations directly in the analog domain. The design uses: 1) a dual 9T bitcell with decoupled read/write paths for signed inputs and ternary weight operations; 2) a read-word-line underdrive Cascode (RUDC) technique achieving 2.8X higher read-bitline dynamic range than single-transistor designs (1.4X better than a conventional Cascode structure, with 7X lower current variation); 3) a dual-supply 6T-SRAM array for efficient multi-bit weight operations, reducing both bitcell count (7.8X) and latency (4X) for 5-bit weight operations. We experimentally demonstrate a 5-bit NLIM ADC for approximating NL activations in LSTM cells, achieving an average error of <1 LSB. Simulation confirms the robustness of the NLIM ADC against temperature variations thanks to the replica bias strategy. Our design achieves 92.0% on-chip inference accuracy for a 12-class keyword-spotting task while demonstrating 2.2X higher system-level normalized energy efficiency and 1.6X better normalized area efficiency than state-of-the-art works. The results combine physical measurements of a macro unit, which accounts for the majority of LSTM operations (99% of linear and 80% of nonlinear operations), with simulations of the remaining components, including additional LSTM and fully connected layers.
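To make the idea of a nonlinear ADC more concrete, here is a minimal behavioral sketch in Python. It is not the circuit described in the abstract: it assumes an idealized converter whose decision thresholds are placed at the inverse-tanh of uniformly spaced output levels, so that the digital code approximates tanh of the analog input directly. The function names (`tanh_thresholds`, `nl_adc`, `code_to_value`) are hypothetical, and the choice of tanh is an illustrative assumption.

```python
import math
from bisect import bisect_right


def tanh_thresholds(bits):
    """Input-domain thresholds for an idealized nonlinear ADC (assumed model).

    Boundaries are uniform in the tanh output range (-1, 1) and mapped back
    to the input domain with atanh, so they are non-uniform in x.
    """
    levels = 2 ** bits
    lsb = 2.0 / levels
    return [math.atanh(-1.0 + k * lsb) for k in range(1, levels)]


def nl_adc(x, thresholds):
    """Digital code = number of thresholds at or below the input x."""
    return bisect_right(thresholds, x)


def code_to_value(code, bits):
    """Reconstruct the activation value at the midpoint of the code's bin."""
    lsb = 2.0 / (2 ** bits)
    return -1.0 + (code + 0.5) * lsb


if __name__ == "__main__":
    bits = 5  # the abstract reports a reconfigurable 1-5 bit range
    th = tanh_thresholds(bits)
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        code = nl_adc(x, th)
        approx = code_to_value(code, bits)
        print(f"x={x:+.2f} code={code:2d} approx={approx:+.4f} tanh={math.tanh(x):+.4f}")
```

In this idealized model the reconstruction error stays within half an LSB of the true tanh value, since every input falls into a bin whose output-domain width is exactly one LSB. A measured converter will deviate from this because of device variation and noise, which the model ignores.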

A Novel 8T SRAM-Based In-Memory Computing Architecture for MAC-Derived Logical Functions

Hardware Architecture

Makes computers do math and logic faster.

29 Nov 2025

89%

NL-DPE: An Analog In-memory Non-Linear Dot Product Engine for Efficient CNN and LLM Inference

Hardware Architecture

Makes AI learn and think much faster, using less power.

17 Nov 2025

89%

LIMCA: LLM for Automating Analog In-Memory Computing Architecture Design Exploration

Hardware Architecture

Computers design themselves to learn faster.

17 Mar 2025

Country of Origin

🇭🇰 Hong Kong

Page Count

13 pages
