
A Dynamic Allocation Scheme for Adaptive Shared-Memory Mapping on Kilo-core RV Clusters for Attention-Based Model Deployment

Published: August 2, 2025 | arXiv ID: 2508.01180v1

By: Bowen Wang, Marco Bertuletti, Yichao Zhang and more

Potential Business Impact:

Lets computers learn faster by managing data better.

Attention-based models demand flexible hardware to manage diverse kernels with varying arithmetic intensities and memory access patterns. Large clusters with shared L1 memory, a common architectural pattern, struggle to fully utilize their processing elements (PEs) when scaled up, because throughput drops in the hierarchical PE-to-L1 intra-cluster interconnect. This paper presents the Dynamic Allocation Scheme (DAS), a runtime-programmable address remapping hardware unit coupled with a unified memory allocator, designed to minimize contention among PEs accessing the multi-banked L1. We evaluated DAS on an aggressively scaled-up 1024-PE RISC-V cluster with a Non-Uniform Memory Access (NUMA) PE-to-L1 interconnect to demonstrate its potential for improving data locality in large parallel machine learning workloads. For a Vision Transformer (ViT)-L/16 model, each encoder layer executes in 5.67 ms, achieving a 1.94x speedup over the fixed word-level interleaved baseline with 0.81 PE utilization. Implemented in 12 nm FinFET technology, DAS incurs <0.1% area overhead.
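
To make the contrast between fixed word-level interleaving and a remapped layout concrete, here is a minimal Python sketch. It is not the DAS hardware or its allocator: the bank count, the tile grouping, the 32-bit word size, and the `remapped_bank` function are all illustrative assumptions, and the sketch models only which bank an address selects, not the row within the bank.

```python
WORD_BYTES = 4        # assumption: 32-bit words
NUM_BANKS = 16        # assumption: toy bank count, not the paper's configuration
BANKS_PER_TILE = 4    # assumption: hypothetical group of banks local to one PE tile


def interleaved_bank(addr: int) -> int:
    """Fixed word-level interleaving: consecutive words map to consecutive banks."""
    return (addr // WORD_BYTES) % NUM_BANKS


def remapped_bank(addr: int, region_bytes: int) -> int:
    """Hypothetical remap: each region of region_bytes is interleaved
    only across the banks of a single tile, so a buffer placed in that
    region stays close to the PEs of that tile."""
    word = addr // WORD_BYTES
    region = addr // region_bytes
    tile = region % (NUM_BANKS // BANKS_PER_TILE)
    return tile * BANKS_PER_TILE + word % BANKS_PER_TILE


def banks_touched(mapper, base: int, nbytes: int) -> list:
    """Sorted list of banks hit by a contiguous buffer under a given mapping."""
    return sorted({mapper(a) for a in range(base, base + nbytes, WORD_BYTES)})


if __name__ == "__main__":
    # A 64-byte buffer spreads over every bank under word-level interleaving...
    print(banks_touched(interleaved_bank, 0, 64))
    # ...but stays within one tile's banks under the hypothetical remap.
    print(banks_touched(lambda a: remapped_bank(a, 64), 0, 64))
```

In this toy model, a PE that works on its own 64-byte buffer would reach remote banks on most accesses under interleaving, while the remapped layout keeps every access inside one tile. This is the kind of locality gain the abstract attributes to runtime address remapping, shown here under the stated assumptions only.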

DCO: Dynamic Cache Orchestration for LLM Accelerators through Predictive Management

Hardware Architecture

Makes AI faster by sharing computer memory.

8 Dec 2025 0

87%

HyperDAS: Towards Automating Mechanistic Interpretability with Hypernetworks

Computation and Language

Helps understand how computer brains think.

13 Mar 2025 2

86%

Flexible Vector Integration in Embedded RISC-V SoCs for End to End CNN Inference Acceleration

Distributed, Parallel, and Cluster Computing

Makes smart devices run AI faster and use less power.

19 Jul 2025 0


Country of Origin

🇨🇭 Switzerland

Repos / Data Links

github.com

Page Count

8 pages
