GRAM-MAMBA: Holistic Feature Alignment for Wireless Perception with Adaptive Low-Rank Compensation
By: Weiqi Yang, Xu Zhou, Jingfu Guan, and more
Potential Business Impact:
Helps smart devices keep sensing accurately even when some sensor data is missing.
Multi-modal fusion is crucial for Internet of Things (IoT) perception, widely deployed in smart homes, intelligent transport, industrial automation, and healthcare. However, existing systems often face challenges: high model complexity hinders deployment in resource-constrained environments, unidirectional modal alignment neglects inter-modal relationships, and robustness suffers when sensor data is missing. These issues impede efficient and robust multimodal perception in real-world IoT settings. To overcome these limitations, we propose GRAM-MAMBA. This framework uses the linear-complexity Mamba model for efficient sensor time-series processing, combined with an optimized GRAM matrix strategy for pairwise alignment among modalities, addressing the shortcomings of traditional single-modality alignment. Inspired by Low-Rank Adaptation (LoRA), we introduce an adaptive low-rank layer compensation strategy to handle missing modalities post-training. This strategy freezes the pre-trained model core and the irrelevant adaptive layers, fine-tuning only those tied to the available modalities and the fusion process. Extensive experiments validate GRAM-MAMBA's effectiveness. On the SPAWC2021 indoor positioning dataset, the pre-trained model achieves lower positioning error than baselines, and adapting to missing modalities yields a 24.5% performance boost while training less than 0.2% of the parameters. On the USC-HAD human activity recognition dataset, it achieves 93.55% F1 and 93.81% Overall Accuracy (OA), outperforming prior work, and the update strategy increases F1 by 23% while training less than 0.3% of the parameters. These results highlight GRAM-MAMBA's potential for efficient and robust multimodal perception in resource-constrained environments.
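The abstract describes pairwise alignment among modalities via GRAM matrices but does not give the exact formulation. A minimal sketch of the general idea, assuming PyTorch and a simple Frobenius-style penalty between per-modality similarity (Gram) matrices, might look like this; the loss name and structure here are illustrative, not the paper's method:

```python
# Sketch of pairwise Gram-matrix alignment across modality embeddings.
# Assumption: each modality encoder yields a (batch, dim) embedding tensor.
import itertools
import torch
import torch.nn.functional as F

def gram_alignment_loss(embeddings: list[torch.Tensor]) -> torch.Tensor:
    """embeddings: one (batch, dim) tensor per modality."""
    # Normalize rows so each Gram matrix holds cosine similarities
    # between samples within a single modality.
    grams = [F.normalize(z, dim=-1) @ F.normalize(z, dim=-1).T
             for z in embeddings]
    # Penalize disagreement between every pair of modalities'
    # similarity structures (pairwise, not one-vs-anchor).
    n_pairs = len(grams) * (len(grams) - 1) // 2
    loss = sum(F.mse_loss(gi, gj)
               for gi, gj in itertools.combinations(grams, 2))
    return loss / max(1, n_pairs)

# Usage: three modalities, batch of 8, embedding dim 64.
zs = [torch.randn(8, 64) for _ in range(3)]
print(gram_alignment_loss(zs))
```

Because every modality pair contributes a term, no single modality serves as the anchor, which is the contrast the abstract draws with unidirectional alignment.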
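Likewise, the LoRA-inspired compensation step can be sketched as a frozen base layer plus a trainable low-rank update, with a helper that unfreezes only the adapters tied to still-available modalities and the fusion stage. The module names (e.g. "imu", "fusion") and the rank/alpha defaults below are hypothetical illustrations, not the paper's API:

```python
# Sketch of LoRA-style low-rank compensation: freeze the pre-trained core,
# fine-tune only adapters for available modalities and the fusion process.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze pre-trained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        # Zero-init lora_b so the update starts at zero and the layer
        # initially reproduces the pre-trained behavior exactly.
        nn.init.zeros_(self.lora_b.weight)
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

def freeze_except_adapters(model: nn.Module, active: set[str]) -> None:
    """Train only LoRA adapter parameters whose top-level owning submodule
    is in `active`, e.g. {"imu", "fusion"} when the camera stream is lost."""
    for name, param in model.named_parameters():
        is_adapter = "lora_" in name
        owner = name.split(".")[0]
        param.requires_grad_(is_adapter and owner in active)
```

Since only the rank-`r` adapter matrices receive gradients, the trainable fraction stays tiny, consistent with the abstract's under-0.2% and under-0.3% parameter budgets.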
Similar Papers
Modality Alignment with Multi-scale Bilateral Attention for Multimodal Recommendation
Information Retrieval
Helps online stores show you better stuff.
GPA-RAM: Grasp-Pretraining Augmented Robotic Attention Mamba for Spatial Task Learning
Robotics
Helps robots grab and move things better.
MMMamba: A Versatile Cross-Modal In Context Fusion Framework for Pan-Sharpening and Zero-Shot Image Enhancement
CV and Pattern Recognition
Makes blurry satellite pictures sharp and clear.