msf-CNN: Patch-based Multi-Stage Fusion with Convolutional Neural Networks for TinyML
By: Zhaolan Huang, Emmanuel Baccelli
Potential Business Impact:
Makes tiny computers run AI using half the memory.
AI spans from large language models to tiny models running on microcontrollers (MCUs). Extremely memory-efficient model architectures are essential to fit within an MCU's tiny memory budget, e.g., 128 kB of RAM. At the same time, inference latency must remain small to meet real-time constraints. One approach to tackling this is patch-based fusion, which optimizes data flows across neural network layers. In this paper, we introduce msf-CNN, a novel technique that efficiently finds optimal fusion settings for convolutional neural networks (CNNs) by walking through the fusion solution space represented as a directed acyclic graph. Compared to previous work on CNN fusion for MCUs, msf-CNN identifies a wider set of solutions. We publish an implementation of msf-CNN that runs on various microcontrollers (ARM Cortex-M, RISC-V, ESP32). We show that msf-CNN can achieve inference using 50% less RAM than the prior art (MCUNetV2 and StreamNet). We thus demonstrate how msf-CNN offers additional flexibility to system designers.
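To make the search idea concrete, below is a minimal sketch (not the authors' implementation) of walking a fusion solution space modeled as a DAG: nodes are cut points between layers, an edge (i, j) means layers i+1..j are fused into one patch-based stage, and a bottleneck shortest-path walk picks the fusion setting that minimizes peak RAM. The cost model, PATCH_FRACTION, and all buffer sizes are hypothetical assumptions for illustration, not the paper's actual cost model.

```python
import heapq

PATCH_FRACTION = 0.25  # assumed RAM saving inside a fused stage (hypothetical)

def stage_cost(buffers, i, j):
    """Hypothetical peak-RAM estimate for one fused stage between
    cut points i and j: hold the stage's input and output buffers
    in full, but only a patch-sized fraction of each intermediate
    buffer, since fused layers process data patch by patch."""
    inner = sum(buffers[i + 1:j]) * PATCH_FRACTION
    return buffers[i] + buffers[j] + inner

def min_peak_ram(buffers, max_fuse=4):
    """Walk the DAG of cut points 0..n with a bottleneck variant of
    Dijkstra: the cost of a path is the maximum per-stage RAM along
    it, i.e., the network's peak RAM under that fusion setting."""
    n = len(buffers) - 1
    best = {0: 0}
    heap = [(0, 0)]  # (peak RAM so far, cut point)
    while heap:
        peak, i = heapq.heappop(heap)
        if i == n:
            return peak  # first pop of the sink is optimal
        if peak > best.get(i, float("inf")):
            continue  # stale queue entry
        for j in range(i + 1, min(i + max_fuse, n) + 1):
            new_peak = max(peak, stage_cost(buffers, i, j))
            if new_peak < best.get(j, float("inf")):
                best[j] = new_peak
                heapq.heappush(heap, (new_peak, j))
    return None

# Example: activation buffer sizes (kB) between layers of a small CNN.
buffers = [48, 96, 96, 64, 32, 16]
print(min_peak_ram(buffers))  # peak RAM with the best fusion setting
print(max(buffers[k] + buffers[k + 1] for k in range(len(buffers) - 1)))  # no fusion
```

A bottleneck (min-max) walk is used here rather than a sum-of-weights shortest path because peak RAM is the maximum over stages, not their total; extending this sketch toward the paper's setting would add latency to the edge costs as a second objective.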
Similar Papers
On-Sensor Convolutional Neural Networks with Early-Exits
Machine Learning (CS)
Makes smart sensors use less power.
Efficient CNN Inference on Ultra-Low-Power MCUs via Saturation-Aware Convolution
Systems and Control
Saves power by skipping unneeded computer math.
Lightweight Software Kernels and Hardware Extensions for Efficient Sparse Deep Neural Networks on Microcontrollers
Machine Learning (CS)
Makes small computers run smart programs faster.