Score: 1

Taming Offload Overheads in a Massively Parallel Open-Source RISC-V MPSoC: Analysis and Optimization

Published: May 9, 2025 | arXiv ID: 2505.05911v1

By: Luca Colagrande, Luca Benini

Potential Business Impact:

Makes computer chips work much faster.

Business Areas:
RISC Hardware

Heterogeneous multi-core architectures combine on a single chip a few large, general-purpose host cores, optimized for single-thread performance, with (many) clusters of small, specialized, energy-efficient accelerator cores for data-parallel processing. Offloading a computation to the many-core acceleration fabric implies synchronization and communication overheads which can hamper overall performance and efficiency, particularly for small and fine-grained parallel tasks. In this work, we present a detailed, cycle-accurate quantitative analysis of the offload overheads on Occamy, an open-source massively parallel RISC-V based heterogeneous MPSoC. We study how the overheads scale with the number of accelerator cores. We explore an approach to drastically reduce these overheads by co-designing the hardware and the offload routines. Notably, we demonstrate that by incorporating multicast capabilities into the Network-on-Chip of a large (200+ cores) accelerator fabric we can improve offloaded application runtimes by as much as 2.3x, restoring more than 70% of the ideally attainable speedups. Finally, we propose a quantitative model to estimate the runtime of selected applications accounting for the offload overheads, with an error consistently below 15%.

Country of Origin
🇨🇭 Switzerland

Repos / Data Links

Page Count
13 pages

Category
Computer Science:
Distributed, Parallel, and Cluster Computing