A Scalable FPGA Architecture With Adaptive Memory Utilization for GEMM-Based Operations
By: Anastasios Petropoulos, Theodore Antonakopoulos
Potential Business Impact:
Makes AI run faster and use less power.
Deep neural network (DNN) inference relies increasingly on specialized hardware for high computational efficiency. This work introduces a field-programmable gate array (FPGA)-based, dynamically configurable accelerator featuring systolic arrays, high-bandwidth memory, and UltraRAMs. We present two processing unit (PU) configurations with different computing capabilities that share the same interfaces and peripheral blocks. By instantiating multiple PUs and employing a heuristic weight transfer schedule, the architecture achieves higher throughput efficiency than prior works. Moreover, we outline how the architecture can be extended to emulate analog in-memory computing (AIMC) devices, aiding next-generation heterogeneous AIMC chip designs and the investigation of device-level noise behavior. Overall, this brief presents a versatile DNN inference acceleration architecture adaptable to various models and future FPGA designs.
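The abstract describes partitioning GEMM workloads across multiple processing units and extending the design to emulate AIMC device behavior. As a rough software illustration only, the NumPy sketch below splits a GEMM column-wise across a hypothetical number of PUs and optionally perturbs each weight tile with Gaussian noise as a stand-in for AIMC device-level variability. The function names, tiling scheme, and noise model are assumptions for illustration; the paper's actual hardware dataflow and weight transfer schedule are not specified at this level of detail in the abstract.

```python
"""Illustrative sketch: multi-PU GEMM tiling with optional AIMC-style weight noise.

This is NOT the paper's implementation; it only mimics, in NumPy, the high-level
idea of dispatching GEMM tiles to parallel processing units (PUs) and injecting
device noise when emulating analog in-memory computing (AIMC) hardware.
"""

import numpy as np


def pu_gemm_tile(activations, weight_tile, noise_std=0.0, rng=None):
    """Compute one PU's output tile: activations @ weight_tile.

    If noise_std > 0, the weight tile is perturbed with additive Gaussian noise
    to emulate AIMC device-level variability (an assumed noise model).
    """
    if noise_std > 0.0:
        rng = rng if rng is not None else np.random.default_rng()
        weight_tile = weight_tile + rng.normal(0.0, noise_std, size=weight_tile.shape)
    return activations @ weight_tile


def multi_pu_gemm(activations, weights, num_pus=4, noise_std=0.0):
    """Split the weight matrix column-wise into num_pus tiles, compute each tile
    on a (simulated) PU, and concatenate the per-PU results."""
    rng = np.random.default_rng(0)
    tiles = np.array_split(weights, num_pus, axis=1)
    outputs = [pu_gemm_tile(activations, t, noise_std, rng) for t in tiles]
    return np.concatenate(outputs, axis=1)


if __name__ == "__main__":
    A = np.random.randn(8, 64).astype(np.float32)    # input activations
    W = np.random.randn(64, 128).astype(np.float32)  # layer weights
    exact = A @ W
    noisy = multi_pu_gemm(A, W, num_pus=4, noise_std=0.01)
    print("max abs deviation vs. exact GEMM:", np.abs(noisy - exact).max())
```

With noise_std set to zero the tiled result matches the plain GEMM, while a small positive value shows how device-level noise propagates into the layer output, which is the kind of behavior the proposed AIMC emulation mode is intended to help investigate.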
Similar Papers
Instruction-Based Coordination of Heterogeneous Processing Units for Acceleration of DNN Inference
Hardware Architecture
Speeds up AI by making computer chips work together.
Memory-Guided Unified Hardware Accelerator for Mixed-Precision Scientific Computing
Hardware Architecture
Makes computers faster at science and AI tasks.
A Reconfigurable Framework for AI-FPGA Agent Integration and Acceleration
Hardware Architecture
Makes AI run faster and use less power.