Score: 0

NysX: An Accurate and Energy-Efficient FPGA Accelerator for Hyperdimensional Graph Classification at the Edge

Published: December 8, 2025 | arXiv ID: 2512.08089v1

By: Jebacyril Arockiaraj, Dhruv Parikh, Viktor Prasanna

Potential Business Impact:

Helps computers learn patterns from data faster.

Business Areas:
GPU Hardware

Real-time, energy-efficient inference on edge devices is essential for graph classification across a range of applications. Hyperdimensional Computing (HDC) is a brain-inspired computing paradigm that encodes input features into low-precision, high-dimensional vectors with simple element-wise operations, making it well-suited for resource-constrained edge platforms. Recent work enhances HDC accuracy for graph classification via Nyström kernel approximations. Edge acceleration of such methods faces several challenges: (i) redundancy among (landmark) samples selected via uniform sampling, (ii) storing the Nyström projection matrix under limited on-chip memory, (iii) expensive, contention-prone codebook lookups, and (iv) load imbalance due to irregular sparsity in SpMV. To address these challenges, we propose NysX, the first end-to-end FPGA accelerator for Nyström-based HDC graph classification at the edge. NysX integrates four key optimizations: (i) a hybrid landmark selection strategy combining uniform sampling with determinantal point processes (DPPs) to reduce redundancy while improving accuracy; (ii) a streaming architecture for Nyström projection matrix maximizing external memory bandwidth utilization; (iii) a minimal-perfect-hash lookup engine enabling $O(1)$ key-to-index mapping with low on-chip memory overhead; and (iv) sparsity-aware SpMV engines with static load balancing. Together, these innovations enable real-time, energy-efficient inference on resource-constrained platforms. Implemented on an AMD Zynq UltraScale+ (ZCU104) FPGA, NysX achieves $6.85\times$ ($4.32\times$) speedup and $169\times$ ($314\times$) energy efficiency gains over optimized CPU (GPU) baselines, while improving classification accuracy by $3.4\%$ on average across TUDataset benchmarks, a widely used standard for graph classification.

Country of Origin
🇺🇸 United States

Page Count
12 pages

Category
Computer Science:
Hardware Architecture