AraOS: Analyzing the Impact of Virtual Memory Management on Vector Unit Performance
By: Matteo Perotti , Vincenzo Maisto , Moritz Imfeld and more
Potential Business Impact:
Makes computers faster for smart tasks.
Vector processor architectures offer an efficient solution for accelerating data-parallel workloads (e.g., ML, AI), reducing instruction count, and enhancing processing efficiency. This is evidenced by the increasing adoption of vector ISAs, such as Arm's SVE/SVE2 and RISC-V's RVV, not only in high-performance computers but also in embedded systems. The open-source nature of RVV has particularly encouraged the development of numerous vector processor designs across industry and academia. However, despite the growing number of open-source RVV processors, there is a lack of published data on their performance in a complex application environment hosted by a full-fledged operating system (Linux). In this work, we add OS support to the open-source bare-metal Ara2 vector processor (AraOS) by sharing the MMU of CVA6, the scalar core used for instruction dispatch to Ara2, and integrate AraOS into the open-source Cheshire SoC platform. We evaluate the performance overhead of virtual-to-physical address translation by benchmarking matrix multiplication kernels across several problem sizes and translation lookaside buffer (TLB) configurations in CVA6's shared MMU, providing insights into vector performance in a full-system environment with virtual memory. With at least 16 TLB entries, the virtual memory overhead remains below 3.5%. Finally, we benchmark a 2-lane AraOS instance with the open-source RiVEC benchmark suite for RVV architectures, with peak average speedups of 3.2x against scalar-only execution.
Similar Papers
CVA6-VMRT: A Modular Approach Towards Time-Predictable Virtual Memory in a 64-bit Application Class RISC-V Processor
Hardware Architecture
Makes self-driving cars more reliable and faster.
Efficient Implementation of RISC-V Vector Permutation Instructions
Hardware Architecture
Speeds up computer math and secret codes.
CVA6S+: A Superscalar RISC-V Core with High-Throughput Memory Architecture
Hardware Architecture
Makes computer chips run much faster and better.