
LIMO: Low-Power In-Memory-Annealer and Matrix-Multiplication Primitive for Edge Computing

Published: December 29, 2025 | arXiv ID: 2512.23212v1

By: Amod Holla, Sumedh Chatterjee, Sutanu Sen and more

Potential Business Impact:

Finds best routes faster for big problems.

Business Areas:

RISC Hardware

Combinatorial optimization (CO) underpins applications in science and engineering, ranging from logistics to electronic design automation. A classic example is the NP-complete Traveling Salesman Problem (TSP). Finding exact solutions for large-scale TSP instances remains computationally intractable; on von Neumann architectures, such solvers are constrained by the memory wall, incurring compute-memory traffic that grows with instance size. Metaheuristics, such as simulated annealing implemented on compute-in-memory (CiM) architectures, offer a way to mitigate the von Neumann bottleneck. This is accomplished by performing in-memory optimization cycles to rapidly find approximate solutions for TSP instances. Yet this approach suffers from degrading solution quality as instance size increases, owing to inefficient state-space exploration. To address this, we present LIMO, a mixed-signal computational macro that implements an in-memory annealing algorithm with reduced search-space complexity. The annealing process is aided by the stochastic switching of spin-transfer-torque magnetic-tunnel-junctions (STT-MTJs) to escape local minima. For large instances, our macro co-design is complemented by a refinement-based divide-and-conquer algorithm amenable to parallel optimization in a spatial architecture. Consequently, our system comprising several LIMO macros achieves superior solution quality and faster time-to-solution on instances up to 85,900 cities compared to prior hardware annealers. The modularity of our annealing peripherals allows the LIMO macro to be reused for other applications, such as vector-matrix multiplications (VMMs). This enables our architecture to support neural network inference. As an illustration, we show image classification and face detection with software-comparable accuracy, while achieving lower latency and energy consumption than baseline CiM architectures.
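As a rough software illustration of the simulated-annealing metaheuristic the abstract refers to, the sketch below anneals a small Euclidean TSP instance. It is not LIMO's in-memory algorithm: the function names (`tour_length`, `anneal_tsp`), the segment-reversal (2-opt style) move, the geometric cooling schedule, and all parameter values are assumptions chosen for illustration, and the pseudo-random number generator only stands in for the role that stochastic STT-MTJ switching plays in hardware, namely occasionally accepting a worse tour to escape local minima.

```python
import math
import random


def tour_length(tour, pts):
    """Total length of the closed tour visiting pts in the given order."""
    n = len(tour)
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % n]]) for i in range(n))


def anneal_tsp(pts, steps=20000, t_start=1.0, t_end=1e-3, seed=0):
    """Simulated annealing over tours (illustrative parameters, not from the paper)."""
    rng = random.Random(seed)
    n = len(pts)
    tour = list(range(n))
    rng.shuffle(tour)
    cur = tour_length(tour, pts)
    best_tour, best = tour[:], cur

    for k in range(steps):
        # Geometric cooling from t_start down towards t_end (assumed schedule).
        t = t_start * (t_end / t_start) ** (k / steps)

        # Candidate move: reverse a randomly chosen segment of the tour.
        i, j = sorted(rng.sample(range(n), 2))
        cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
        delta = tour_length(cand, pts) - cur

        # Always accept improvements; accept worse tours with
        # probability exp(-delta / t) so the search can leave local minima.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            tour, cur = cand, cur + delta
            if cur < best:
                best_tour, best = tour[:], cur

    # Recompute the length of the best tour to avoid accumulated rounding error.
    return best_tour, tour_length(best_tour, pts)


if __name__ == "__main__":
    rng = random.Random(1)
    cities = [(rng.random(), rng.random()) for _ in range(30)]
    order, length = anneal_tsp(cities)
    print(f"tour over {len(order)} cities, length {length:.3f}")
```

Each step here recomputes the full tour length, which is simple but costs O(n) per move; that kind of repeated compute-memory traffic is what the abstract says in-memory annealing aims to reduce.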

Device-Algorithm Co-Design of Ferroelectric Compute-in-Memory In-Situ Annealer for Combinatorial Optimization Problems

Emerging Technologies

Solves hard problems much faster and uses less power.

30 Apr 2025 1

87%

Solving Boolean satisfiability problems with resistive content addressable memories

Emerging Technologies

Solves hard problems much faster and uses less power.

13 Jan 2025 1

87%

A digital SRAM-based compute-in-memory macro for weight-stationary dynamic matrix multiplication in Transformer attention score computation

Hardware Architecture

Makes AI faster and use less power.

15 Nov 2025 0


Country of Origin

🇮🇳 🇺🇸 United States, India

Page Count

34 pages
