HERMES: High-Performance RISC-V Memory Hierarchy for ML Workloads
By: Pranav Suryadevara
Potential Business Impact:
Speeds up machine learning by improving how computers store and move data in memory.
The growth of machine learning (ML) workloads has underscored the importance of efficient memory hierarchies that address bandwidth, latency, and scalability challenges. HERMES focuses on optimizing memory subsystems for RISC-V architectures to meet the computational needs of ML models such as CNNs, RNNs, and Transformers. The project explores state-of-the-art techniques including advanced prefetching, tensor-aware caching, and hybrid memory models. The cornerstone of HERMES is the integration of shared L3 caches with fine-grained coherence protocols, equipped with specialized pathways to deep-learning accelerators such as Gemmini. Simulation tools such as gem5 and DRAMSim2 were used to evaluate baseline performance and scalability under representative ML workloads. The findings highlight key design choices and anticipated challenges, paving the way for low-latency, scalable memory operations for ML applications.
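Since the abstract names advanced prefetching and tensor-aware caching as core techniques, the following is a minimal, self-contained Python sketch of one such mechanism: a per-PC stride prefetcher that locks onto the regular row stride of a tensor traversal. All names and parameters here (TensorAwarePrefetcher, degree, the 64-byte line size) are illustrative assumptions for exposition, not HERMES code or a gem5/DRAMSim2 API.

```python
# Sketch of a stride-based, tensor-aware prefetcher (illustrative only).
# Idea: ML kernels walk tensors with highly regular strides; once the same
# stride is observed twice for a given load PC, prefetch `degree` lines
# ahead along that stride.

CACHE_LINE = 64  # bytes; a common line-size assumption

class TensorAwarePrefetcher:
    def __init__(self, degree=4):
        self.degree = degree    # how many accesses ahead to prefetch
        self.last_addr = {}     # pc -> last address seen
        self.last_stride = {}   # pc -> last observed stride

    def access(self, pc, addr):
        """Observe a demand access; return addresses to prefetch."""
        prefetches = []
        if pc in self.last_addr:
            stride = addr - self.last_addr[pc]
            # Two matching non-zero deltas => a steady tensor walk.
            if stride != 0 and stride == self.last_stride.get(pc):
                prefetches = [addr + i * stride
                              for i in range(1, self.degree + 1)]
            self.last_stride[pc] = stride
        self.last_addr[pc] = addr
        return prefetches

if __name__ == "__main__":
    pf = TensorAwarePrefetcher(degree=2)
    base, row_stride = 0x1000, 256   # e.g., a 64-element float32 row
    for i in range(4):
        addr = base + i * row_stride
        print(hex(addr), [hex(a) for a in pf.access(pc=0x400123, addr=addr)])
```

In a full gem5 evaluation this logic would live in a prefetcher model attached to the simulated cache hierarchy; the sketch only illustrates the stride-detection idea behind it.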