An FPGA Compiler for On-the-Fly Adaptive CNN Deployment and Reconfiguration
By: Alaa Mazouz, Duc Han Le, Van-Tam Nguyen
Potential Business Impact:
Makes AI programs run faster and use less power.
We introduce ForgeMorph, a full-stack compiler for adaptive CNN deployment on FPGAs, combining design-time optimization with runtime reconfigurability. At compile time, the NeuroForge engine performs constraint-driven design space exploration, generating RTL mappings that are Pareto-optimal with respect to user-defined latency and resource budgets. Unlike existing FPGA compilers, which rely on static scheduling and manual tuning, NeuroForge leverages analytical performance models and multi-objective genetic algorithms to efficiently search large configuration spaces and propose highly optimized hardware implementations. At runtime, the NeuroMorph module enables dynamic reconfiguration of network width and depth without requiring redeployment. This is made possible by a novel training strategy, DistillCycle, which jointly trains the full model and its subnetworks using hierarchical knowledge distillation. As a result, each execution path maintains accuracy even under aggressive resource and power constraints. We demonstrate ForgeMorph on the Zynq-7100 using custom and benchmark models including MobileNetV2, ResNet-50, SqueezeNet, and YOLOv5. The system achieves up to 50x latency reduction and 32% lower power consumption at runtime, while matching or exceeding the efficiency of state-of-the-art compilers. ForgeMorph offers a unified solution for deployment scenarios that demand flexibility, performance, and hardware efficiency.
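The abstract's core compile-time idea is keeping only RTL mappings that are Pareto-optimal in latency and resource usage. A minimal sketch of that non-dominated filtering step is shown below; the function names and the (latency, LUT) objective pair are illustrative assumptions, not taken from the paper, and a real NSGA-II-style search would wrap this inside a genetic loop with crossover and mutation.

```python
# Minimal sketch of Pareto filtering for design space exploration,
# assuming each candidate mapping is scored on two objectives to
# minimize: (latency_ms, LUT_usage). Names are hypothetical.

def dominates(a, b):
    """a dominates b if a is no worse on every objective and
    strictly better on at least one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep only candidates not dominated by any other candidate."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o is not c)]

# Hypothetical (latency_ms, LUT_usage) scores for four candidate mappings:
configs = [(10.0, 5000), (8.0, 7000), (12.0, 4000), (9.0, 9000)]
print(pareto_front(configs))  # (9.0, 9000) is dominated by (8.0, 7000)
```

In a full genetic DSE loop, this filter would select the survivors of each generation, so the search converges toward the latency/resource trade-off curve the user's budget constrains.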
Similar Papers
Morphling: Fast, Fused, and Flexible GNN Training at Scale
Machine Learning (CS)
Makes computers learn from connected data much faster.