Enhancing Wireless Networks for IoT with Large Vision Models: Foundations and Applications
By: Yunting Xu, Jiacheng Wang, Ruichen Zhang, and more
Potential Business Impact:
Helps drones see and talk to each other better.
Large vision models (LVMs) have emerged as a foundational paradigm in visual intelligence, achieving state-of-the-art performance across diverse visual tasks. Recent advances in LVMs have facilitated their integration into Internet of Things (IoT) scenarios, offering superior generalization and adaptability for vision-assisted network optimization. In this paper, we first investigate the functionalities and core architectures of LVMs, highlighting their capabilities across classification, segmentation, generation, and multimodal visual processing. We then explore a variety of LVM applications in wireless communications, covering representative tasks across the physical layer, network layer, and application layer. Furthermore, given the substantial model size of LVMs and the challenges of model retraining in wireless domains, we propose a progressive fine-tuning framework that incrementally adapts pretrained LVMs for the joint optimization of multiple IoT tasks. A case study in low-altitude economy networks (LAENets) demonstrates the effectiveness of the proposed framework over conventional CNNs in joint beamforming and positioning tasks for the Internet of Drones, underscoring a promising direction for integrating LVMs into intelligent wireless systems.
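The abstract describes a progressive fine-tuning framework that incrementally adapts a pretrained LVM to multiple IoT tasks. The paper does not spell out the schedule here, but one common realization is to unfreeze the model's layer groups in stages, task-specific heads first, then progressively deeper pretrained blocks. The sketch below illustrates that idea only; the function name, group names, and stage length are illustrative assumptions, not the authors' actual method.

```python
# Hypothetical sketch of a progressive unfreezing schedule for fine-tuning a
# pretrained backbone on joint IoT tasks (e.g., beamforming + positioning).
# Group names and the epochs-per-stage value are illustrative assumptions.

def trainable_groups(epoch, groups, epochs_per_stage=2):
    """Return the layer groups that are unfrozen (trainable) at a given epoch.

    `groups` is ordered from input-side (most generic pretrained features)
    to output-side (most task-specific); unfreezing starts at the output
    side and extends one group deeper every `epochs_per_stage` epochs.
    """
    stage = min(epoch // epochs_per_stage, len(groups) - 1)
    return groups[len(groups) - 1 - stage:]

# Illustrative layer grouping of a pretrained vision backbone plus task heads.
groups = ["stem", "mid_blocks", "late_blocks", "task_heads"]

print(trainable_groups(0, groups))   # only the task heads adapt at first
print(trainable_groups(2, groups))   # late pretrained blocks join in
print(trainable_groups(99, groups))  # eventually the full model is trainable
```

In a real training loop, the returned group names would gate which parameters receive gradients (e.g., toggling `requires_grad` in a deep learning framework), so early epochs adapt only the lightweight heads while the bulk of the pretrained LVM stays frozen.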
Similar Papers
Vision-Language Models for Edge Networks: A Comprehensive Survey
CV and Pattern Recognition
Makes smart AI work on small, cheap devices.
Vision-Enhanced Large Language Models for High-Resolution Image Synthesis and Multimodal Data Interpretation
CV and Pattern Recognition
Makes computers create clearer pictures from words.
Efficient Onboard Vision-Language Inference in UAV-Enabled Low-Altitude Economy Networks via LLM-Enhanced Optimization
Machine Learning (CS)
Drones understand what they see and send info fast.