Score: 2

SafeCiM: Investigating Resilience of Hybrid Floating-Point Compute-in-Memory Deep Learning Accelerators

Published: November 23, 2025 | arXiv ID: 2512.00059v1

By: Swastik Bhattacharya, Sanjay Das, Anand Menon and more

BigTech Affiliations: Intel

Potential Business Impact:

Makes AI chips more reliable against errors.

Business Areas:

Intelligent Systems, Artificial Intelligence, Data and Analytics, Science and Engineering

Deep Neural Networks (DNNs) continue to grow in complexity, with Large Language Models (LLMs) incorporating vast numbers of parameters. Handling these parameters efficiently in traditional accelerators is limited by data-transmission bottlenecks, motivating Compute-in-Memory (CiM) architectures that integrate computation within or near memory to reduce data movement. Recent work has explored CiM designs using Floating-Point (FP) and Integer (INT) operations. FP computations typically deliver higher output quality due to their wider dynamic range and precision, which benefits precision-sensitive Generative AI applications such as LLMs and drives advancements in FP-CiM accelerators. However, the vulnerability of FP-CiM to hardware faults remains underexplored, posing a major reliability concern in mission-critical settings. To address this gap, we systematically analyze hardware fault effects in FP-CiM by introducing bit-flip faults at key computational stages, including digital multipliers, CiM memory cells, and digital adder trees. Experiments with Convolutional Neural Networks (CNNs) such as AlexNet and state-of-the-art LLMs including LLaMA-3.2-1B and Qwen-0.3B-Base reveal how faults at each stage affect inference accuracy. Notably, a single adder fault can reduce LLM accuracy to 0%. Based on these insights, we propose a fault-resilient design, SafeCiM, that mitigates fault impact far better than a naive FP-CiM with a pre-alignment stage. For example, with 4096 MAC units, SafeCiM reduces accuracy degradation by up to 49x for a single adder fault compared to the baseline FP-CiM architecture.
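To illustrate why a single bit flip in a floating-point datapath can be so damaging, the following is a minimal sketch of software fault injection on IEEE 754 single-precision values. It is not the paper's framework: the function names `flip_bit_fp32` and `dot_with_adder_fault` are hypothetical, and the sequential accumulator is an assumed stand-in for a hardware adder tree.

```python
import struct


def flip_bit_fp32(value: float, bit: int) -> float:
    """Flip one bit of the float32 encoding of `value`.

    Bit 0 is the mantissa LSB, bits 23-30 are the exponent, bit 31 is the sign.
    The input is rounded to float32 before the flip.
    """
    if not 0 <= bit <= 31:
        raise ValueError("bit must be in [0, 31]")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    # XOR toggles the chosen bit, then reinterpret back as float32.
    (faulty,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return faulty


def dot_with_adder_fault(weights, activations, fault_index, bit):
    """Dot product with one corrupted partial sum (hypothetical fault model).

    After accumulating product number `fault_index`, a single bit of the
    running sum is flipped, mimicking a transient fault in an adder stage.
    """
    acc = 0.0
    for i, (w, a) in enumerate(zip(weights, activations)):
        acc += w * a
        if i == fault_index:
            acc = flip_bit_fp32(acc, bit)
    return acc


if __name__ == "__main__":
    clean = dot_with_adder_fault([0.5, 0.25], [1.0, 1.0], fault_index=-1, bit=0)
    faulty = dot_with_adder_fault([0.5, 0.25], [1.0, 1.0], fault_index=0, bit=30)
    print("clean:", clean)
    print("faulty (exponent MSB flipped):", faulty)
```

In this toy setting, flipping the most significant exponent bit of a partial sum of 0.5 turns it into roughly 1.7e38, which swamps every later term. This gives some intuition for why exponent-bit faults in an accumulation stage can be far more harmful than mantissa-bit faults.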

Weight Transformations in Bit-Sliced Crossbar Arrays for Fault Tolerant Computing-in-Memory: Design Techniques and Evaluation Framework

Hardware Architecture

Fixes computer chips that make AI mistakes.

20 Dec 2025 1

90%

Unicorn-CIM: Uncovering the Vulnerability and Improving the Resilience of High-Precision Compute-in-Memory

Hardware Architecture

Makes AI chips more reliable for complex tasks.

2 Jun 2025 1

87%

Special Session: Sustainable Deployment of Deep Neural Networks on Non-Volatile Compute-in-Memory Accelerators

Hardware Architecture

Makes AI chips work better and last longer.

17 Aug 2025 1


Country of Origin

🇺🇸 United States

Page Count

13 pages
