Unicorn-CIM: Uncovering the Vulnerability and Improving the Resilience of High-Precision Compute-in-Memory
By: Qiufeng Li, Yiwen Liang, Weidong Cao
Potential Business Impact:
Makes AI chips more reliable for complex tasks.
Compute-in-memory (CIM) architecture has been widely explored to address the von Neumann bottleneck in accelerating deep neural networks (DNNs). However, its reliability remains largely understudied, particularly in the emerging domain of floating-point (FP) CIM, which is crucial for speeding up high-precision inference and on-device training. This paper introduces Unicorn-CIM, a framework to uncover the vulnerability and improve the resilience of high-precision CIM, built on a static random-access memory (SRAM)-based FP CIM architecture. By developing a fault-injection method and characterizing multiple DNNs extensively, Unicorn-CIM reveals how soft errors manifest in FP operations and affect overall model performance. Specifically, we find that high-precision DNNs are extremely sensitive to errors in the exponent part of FP numbers. Building on this insight, Unicorn-CIM develops an efficient algorithm-hardware co-design method that optimizes the model's exponent distribution through fine-tuning and incorporates a lightweight error-correcting code (ECC) scheme to safeguard high-precision DNNs on FP CIM. Comprehensive experiments show that our approach adds a logic overhead of just 8.98% on the exponent processing path while providing robust error protection and maintaining model accuracy. This work paves the way for developing more reliable and efficient CIM hardware.
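As a rough illustration of why exponent bits are so sensitive, the sketch below flips single bits in the IEEE 754 single-precision encoding of a value. It is not the paper's fault-injection framework: the helper name `flip_bit` and the sample weight 0.5 are hypothetical, and the choice of FP32 is an assumption made only for this example.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of value's IEEE 754 binary32 encoding.

    Bit 31 is the sign, bits 30..23 the exponent, bits 22..0 the mantissa.
    (Hypothetical helper for illustration; not from the paper.)
    """
    # Reinterpret the float's 4 bytes as an unsigned 32-bit integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    # XOR toggles exactly the selected bit, then reinterpret as a float.
    (flipped,) = struct.unpack("<f", struct.pack("<I", bits ^ (1 << bit)))
    return flipped


weight = 0.5  # assumed sample weight, encoded as 0x3F000000

mantissa_fault = flip_bit(weight, 22)  # most significant mantissa bit
exponent_fault = flip_bit(weight, 30)  # most significant exponent bit

print(mantissa_fault)  # 0.75: a small, bounded perturbation
print(exponent_fault)  # 2**127, roughly 1.7e38: the value explodes
```

In this example a single upset in the mantissa changes the weight by 50%, while the same upset in the top exponent bit scales it by 2^128, which is consistent with the abstract's observation that the exponent field dominates the error impact.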
Similar Papers
SafeCiM: Investigating Resilience of Hybrid Floating-Point Compute-in-Memory Deep Learning Accelerators
Hardware Architecture
Makes AI chips more reliable against errors.
Acore-CIM: build accurate and reliable mixed-signal CIM cores with RISC-V controlled self-calibration
Hardware Architecture
Makes AI faster and more accurate on chips.
CIMinus: Empowering Sparse DNN Workloads Modeling and Exploration on SRAM-based CIM Architectures
Hardware Architecture
Helps computers learn faster by using less energy.