FERMI-ML: A Flexible and Resource-Efficient Memory-In-Situ SRAM Macro for TinyML Acceleration
By: Mukul Lokhande, Akash Sankhe, S. V. Jaya Chand, and more
Potential Business Impact:
Lets tiny devices run AI faster while using less power.
The growing demand for low-power, area-efficient TinyML inference on AIoT devices necessitates memory architectures that minimise data movement while sustaining high computational efficiency. This paper presents FERMI-ML, a Flexible and Resource-Efficient Memory-In-Situ (MIS) SRAM macro designed for TinyML acceleration. The proposed 9T XNOR-based RX9T bit-cell integrates a 5T storage cell with a 4T XNOR compute unit, enabling variable-precision MAC and CAM operations within the same array. A 22-transistor (C22T) compressor-tree-based accumulator provides logarithmic-depth accumulation for 1-64-bit MAC computation with reduced delay and power compared to conventional adder trees. The 4 KB macro achieves dual functionality, supporting both in-situ computation and CAM-based lookup operations at Posit-4 or FP-4 precision. Post-layout results in 65 nm CMOS show 350 MHz operation at 0.9 V, delivering a throughput of 1.93 TOPS and an energy efficiency of 364 TOPS/W, while maintaining a Quality-of-Result (QoR) above 97.5% on InceptionV4 and ResNet-18. FERMI-ML thus demonstrates a compact, reconfigurable, and energy-aware digital Memory-In-Situ macro capable of supporting mixed-precision TinyML workloads.
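As a rough illustration of the arithmetic an XNOR-based compute cell exploits: with weights and activations encoded as {+1, -1} and stored as bitmaps (1 for +1, 0 for -1), a dot product reduces to a population count of the bitwise XNOR. This is the standard binary-network identity, sketched below in Python; it is not FERMI-ML's circuit, and all names are illustrative.

def xnor_mac(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two n-element {+1, -1} vectors stored as n-bit bitmaps."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask          # XNOR: 1 where signs match
    matches = bin(agree).count("1")            # popcount of agreements
    return 2 * matches - n                     # matches minus mismatches

# a = [+1, -1, +1, +1] -> 0b1011, b = [+1, +1, -1, +1] -> 0b1101
assert xnor_mac(0b1011, 0b1101, 4) == 0        # (+1) + (-1) + (-1) + (+1)

The compressor-tree accumulator can likewise be sketched in carry-save form: each 3:2 compressor layer applies the full-adder identity a + b + c = (a ^ b ^ c) + (majority(a, b, c) << 1), turning three operands into two, so many partial sums reduce in logarithmic depth before a single final carry-propagate add. This is a behavioural sketch of the generic technique, not the C22T gate-level design.

def compress_3_2(a: int, b: int, c: int) -> tuple[int, int]:
    """One 3:2 compressor: three operands in, (sum, carry) pair out."""
    s = a ^ b ^ c                               # bitwise sum, carries deferred
    carry = ((a & b) | (b & c) | (a & c)) << 1  # majority bits, shifted up
    return s, carry

def compressor_tree_sum(values: list[int]) -> int:
    """Reduce partial sums layer by layer; depth grows logarithmically."""
    vals = list(values)
    while len(vals) > 2:
        nxt = []
        while len(vals) >= 3:
            s, c = compress_3_2(vals.pop(), vals.pop(), vals.pop())
            nxt.extend((s, c))
        nxt.extend(vals)                        # 0-2 leftovers pass through
        vals = nxt
    return sum(vals)                            # final carry-propagate add

assert compressor_tree_sum([3, 5, 7, 9, 11]) == 35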
Similar Papers
A Novel 8T SRAM-Based In-Memory Computing Architecture for MAC-Derived Logical Functions
Hardware Architecture
Makes computers do math and logic faster.
A Reconfigurable Time-Domain In-Memory Computing Macro using FeFET-Based CAM with Multilevel Delay Calibration in 28 nm CMOS
Emerging Technologies
Makes computers compute faster and use less power.
NVM-in-Cache: Repurposing Commodity 6T SRAM Cache into NVM Analog Processing-in-Memory Engine using a Novel Compute-on-Powerline Scheme
Hardware Architecture
Makes computer chips do math inside their memory.