RealProbe: An Automated and Lightweight Performance Profiler for In-FPGA Execution of High-Level Synthesis Designs
By: Jiho Kim, Cong Hao
Potential Business Impact:
Finds slow parts in computer chips automatically.
High-level synthesis (HLS) accelerates FPGA design by rapidly generating diverse implementations using optimization directives. However, even with cycle-accurate C/RTL co-simulation, the reported clock cycles often differ significantly from actual FPGA performance. This discrepancy hampers accurate bottleneck identification, leading to suboptimal design choices. Existing in-FPGA profiling tools, such as the Integrated Logic Analyzer (ILA), require tedious inspection of HLS-generated RTL and manual signal monitoring, reducing productivity. To address these challenges, we introduce RealProbe, the first fully automated, lightweight in-FPGA profiling tool for HLS designs. With a single directive--#pragma HLS RealProbe--the tool automatically generates all necessary code to profile cycle counts across the full function hierarchy, including submodules and loops. RealProbe extracts, records, and visualizes cycle counts with high precision, providing actionable insights into on-board performance. RealProbe is non-intrusive, implemented as independent logic to ensure minimal impact on kernel functionality or timing. It also supports automated design space exploration (DSE), optimizing resource allocation based on FPGA constraints and module complexity. By leveraging incremental synthesis and implementation, DSE runs independently of the original HLS kernel. Evaluated across 28 diverse test cases, including a large-scale design, RealProbe achieves 100% accuracy in capturing cycle counts with minimal logic overhead-just 16.98% LUTs, 43.15% FFs, and 0% BRAM usage. The tool, with full documentation and examples, is available on GitHub at https://github.com/sharc-lab/RealProbe .
Similar Papers
SPRING: Systematic Profiling of Randomly Interconnected Neural Networks Generated by HLS
Hardware Architecture
Tracks computer chip performance automatically.
ResBench: Benchmarking LLM-Generated FPGA Designs with Resource Awareness
Hardware Architecture
Helps computers design better chips faster.
TimelyHLS: LLM-Based Timing-Aware and Architecture-Specific FPGA HLS Optimization
Cryptography and Security
Makes computer chips faster and smaller automatically.