WaveFormer: Frequency-Time Decoupled Vision Modeling with Wave Equation
By: Zishan Shu, Juntong Wu, Wei Yan, and more
Vision modeling has advanced rapidly with Transformers, whose attention mechanisms capture visual dependencies but lack a principled account of how semantic information propagates spatially. We revisit this problem from a wave-based perspective: feature maps are treated as spatial signals whose evolution over an internal propagation time (aligned with network depth) is governed by an underdamped wave equation. In this formulation, spatial frequency, from low-frequency global layout to high-frequency edges and textures, is modeled explicitly, and its interaction with propagation time is controlled rather than implicitly fixed. We derive a closed-form, frequency-time decoupled solution and implement it as the Wave Propagation Operator (WPO), a lightweight module that models global interactions in O(N log N) time, far below attention's quadratic cost. Building on WPO, we propose a family of WaveFormer models as drop-in replacements for standard ViTs and CNNs, achieving competitive accuracy across image classification, object detection, and semantic segmentation, while delivering up to 1.6x higher throughput and 30% fewer FLOPs than attention-based alternatives. Furthermore, our results demonstrate that wave propagation introduces a complementary modeling bias to heat-based methods, effectively capturing both global coherence and high-frequency details essential for rich visual semantics. Code is available at: https://github.com/ZishanShu/WaveFormer.
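To make the abstract's core idea concrete, here is a minimal sketch of a frequency-time decoupled wave propagator in the spirit of WPO. This is not the authors' implementation: the function name, the zero-initial-velocity assumption, and the specific damping parameterization are all illustrative choices. It solves the underdamped wave equation u_tt + 2*gamma*u_t = c^2 * Laplacian(u) mode-by-mode in the Fourier domain, which is what yields the O(N log N) cost via the FFT.

```python
import numpy as np

def wave_propagation_operator(x, t=1.0, c=1.0, gamma=0.1):
    """Hypothetical sketch of a frequency-time decoupled wave propagator.

    Treats the 2D feature map `x` as a spatial signal and evolves it for an
    internal propagation time `t` under the underdamped wave equation
        u_tt + 2*gamma*u_t = c^2 * Laplacian(u),
    solved per frequency mode in the Fourier domain (O(N log N) via FFT).
    """
    H, W = x.shape
    ky = np.fft.fftfreq(H) * 2 * np.pi
    kx = np.fft.fftfreq(W) * 2 * np.pi
    k2 = ky[:, None] ** 2 + kx[None, :] ** 2  # squared spatial frequency |k|^2

    # Closed-form per-mode solution with zero initial velocity:
    #   u_k(t) = e^{-gamma t} (cos(w_d t) + (gamma / w_d) sin(w_d t)) u_k(0),
    # where w_d = sqrt(c^2 |k|^2 - gamma^2). The +0j keeps the square root
    # complex-safe for low-frequency (overdamped) modes.
    w_d = np.sqrt((c ** 2) * k2 - gamma ** 2 + 0j)
    with np.errstate(divide="ignore", invalid="ignore"):
        kernel = np.exp(-gamma * t) * (
            np.cos(w_d * t)
            + np.where(w_d != 0, gamma * np.sin(w_d * t) / w_d, gamma * t)
        )
    X = np.fft.fft2(x)
    return np.real(np.fft.ifft2(X * kernel))
```

Note how the transfer function acts differently per spatial frequency: high-frequency modes oscillate (cos/sin terms) while being damped, whereas the DC component is preserved, matching the abstract's claim that the frequency-time interaction is controlled explicitly rather than fixed implicitly.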