Bombyx: OpenCilk Compilation for FPGA Hardware Acceleration
By: Mohamed Shahawy, Julien de Castelnau, Paolo Ienne
Potential Business Impact:
Compiles task-parallel OpenCilk programs, originally written for CPUs, into hardware accelerators for FPGAs.
Task-level parallelism (TLP) is a widely used approach in software where independent tasks are dynamically created and scheduled at runtime. Recent systems have explored architectural support for TLP on field-programmable gate arrays (FPGAs), often leveraging high-level synthesis (HLS) to create processing elements (PEs). In this paper, we present Bombyx, a compiler toolchain that lowers OpenCilk programs into a Cilk-1-inspired intermediate representation, enabling efficient mapping of CPU-oriented TLP applications to spatial architectures on FPGAs. Unlike OpenCilk's implicit task model, which requires costly context switching in hardware, Cilk-1 adopts explicit continuation passing, a model that better aligns with the streaming nature of FPGAs. Bombyx supports multiple compilation targets: an OpenCilk-compatible runtime that executes Cilk-1-style code using the OpenCilk backend, and a synthesizable PE generator designed for HLS tools such as Vitis HLS. Additionally, we introduce a decoupled access-execute optimization that enables automatic generation of high-performance PEs, improving memory-compute overlap and overall throughput.
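To make the contrast between the two task models concrete, the following is a minimal sketch of explicit continuation passing in the Cilk-1 style, written in plain C for a recursive Fibonacci. It is not Bombyx's intermediate representation or generated code. The names `Closure`, `Cont`, `spawn`, `spawn_next`, `send_argument`, and `run_fib` are illustrative assumptions that borrow Cilk-1 vocabulary, and the single-threaded ready stack stands in for a real scheduler. The point it shows is that no task ever blocks: where an OpenCilk program would suspend at a sync, the task instead creates a successor closure and returns, and results arrive later as arguments.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Closure Closure;

/* A continuation names one argument slot of a waiting closure. */
typedef struct { Closure *target; int slot; } Cont;

struct Closure {
    void (*fn)(Closure *self); /* non-blocking task body */
    Cont k;                    /* where this task sends its result */
    int args[2];               /* argument slots */
    int missing;               /* join counter: slots still unfilled */
    Closure *next;             /* link in the ready stack */
};

/* Hypothetical scheduler state: a LIFO stack of ready closures. */
static Closure *ready = NULL;
static int final_result = 0;

static void make_ready(Closure *c) { c->next = ready; ready = c; }

static Closure *alloc_closure(void (*fn)(Closure *), Cont k, int missing) {
    Closure *c = malloc(sizeof *c);
    assert(c != NULL);
    c->fn = fn; c->k = k; c->missing = missing;
    c->args[0] = c->args[1] = 0; c->next = NULL;
    return c;
}

/* spawn: create a child whose single argument is already known. */
static void spawn(void (*fn)(Closure *), Cont k, int arg0) {
    Closure *c = alloc_closure(fn, k, 0);
    c->args[0] = arg0;
    make_ready(c);
}

/* spawn_next: create a successor that waits for `missing` arguments. */
static Closure *spawn_next(void (*fn)(Closure *), Cont k, int missing) {
    return alloc_closure(fn, k, missing);
}

/* send_argument: fill a slot; the target becomes ready when none are missing. */
static void send_argument(Cont k, int value) {
    k.target->args[k.slot] = value;
    if (--k.target->missing == 0) make_ready(k.target);
}

static void task_done(Closure *self) { final_result = self->args[0]; }

static void task_sum(Closure *self) {
    send_argument(self->k, self->args[0] + self->args[1]);
}

static void task_fib(Closure *self) {
    int n = self->args[0];
    if (n < 2) { send_argument(self->k, n); return; }
    /* Instead of waiting at a sync, hand the rest of the work to a successor. */
    Closure *join = spawn_next(task_sum, self->k, 2);
    spawn(task_fib, (Cont){join, 0}, n - 1);
    spawn(task_fib, (Cont){join, 1}, n - 2);
}

/* Driver: run closures to completion; each is freed after it executes. */
int run_fib(int n) {
    Cont none = {NULL, 0};
    Closure *done = spawn_next(task_done, none, 1);
    spawn(task_fib, (Cont){done, 0}, n);
    while (ready != NULL) {
        Closure *c = ready;
        ready = c->next;
        c->fn(c);
        free(c);
    }
    return final_result;
}
```

Calling `run_fib(10)` from a `main` function returns 55. Because every task body runs to completion without suspending, no per-task stack or context needs to be saved, which is the property the abstract identifies as a better fit for streaming hardware.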