CORE: Compensable Reward as a Catalyst for Improving Offline RL in Wireless Networks
By: Lipeng Zu, Hansong Zhou, Yu Qian, and more
Real-world wireless data are expensive to collect and often lack sufficient expert demonstrations, causing existing offline RL methods to overfit to suboptimal behaviors and exhibit unstable performance. To address this issue, we propose CORE, an offline RL framework specifically designed for wireless environments. CORE identifies latent expert trajectories in noisy datasets via behavior-embedding clustering, and trains a conditional variational autoencoder with a contrastive objective to separate expert from non-expert behaviors in latent space. Based on the learned representations, CORE constructs compensable rewards that reflect expert likelihood, effectively guiding policy learning under limited or imperfect supervision. More broadly, this work is one of the first systematic explorations of offline RL in wireless networking, where adoption remains limited. Beyond introducing offline RL techniques to this domain, we examine intrinsic characteristics of wireless data and develop a domain-aligned algorithm that explicitly accounts for their structural properties. While offline RL has not yet been established as a standard methodology in the wireless community, our study aims to provide foundational insights and empirical evidence to support its broader acceptance.
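To make the pipeline concrete, below is a minimal PyTorch sketch of the ideas the abstract describes. Everything here is an assumption for illustration, not the authors' released code: the module names (ConditionalVAE, cvae_contrastive_loss, compensable_reward), network sizes, loss weights, and the margin-based form of the contrastive term are hypothetical. The sketch embeds (state, action) pairs with a conditional VAE, pulls expert embeddings together while pushing non-expert ones away, and shapes the logged reward with an expert-likelihood bonus.

```python
# Hypothetical sketch of CORE-style reward shaping (assumed names and shapes,
# not the paper's implementation). A conditional VAE embeds (state, action)
# pairs; a contrastive term separates expert and non-expert latents; the
# "compensable reward" adds an expert-likelihood bonus to the logged reward.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalVAE(nn.Module):
    def __init__(self, state_dim, action_dim, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
        )
        self.mu = nn.Linear(64, latent_dim)
        self.log_var = nn.Linear(64, latent_dim)
        # Decoder reconstructs the action conditioned on the state.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + state_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim),
        )

    def forward(self, state, action):
        h = self.encoder(torch.cat([state, action], dim=-1))
        mu, log_var = self.mu(h), self.log_var(h)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterize
        recon = self.decoder(torch.cat([z, state], dim=-1))
        return recon, mu, log_var

def cvae_contrastive_loss(model, state, action, is_expert, margin=1.0):
    """ELBO plus a margin-based contrastive term: pairs with the same
    expert label are pulled together, mixed pairs are pushed apart."""
    recon, mu, log_var = model(state, action)
    recon_loss = F.mse_loss(recon, action)
    kl = -0.5 * (1 + log_var - mu.pow(2) - log_var.exp()).mean()
    d = torch.cdist(mu, mu)  # pairwise latent distances, shape (B, B)
    same = (is_expert[:, None] == is_expert[None, :]).float()
    contrast = (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()
    return recon_loss + 0.1 * kl + 0.5 * contrast

def compensable_reward(env_reward, mu, expert_centroid, beta=1.0):
    """Shaped reward: logged reward plus a bonus that grows with
    similarity to the expert cluster centroid in latent space."""
    expert_likeness = torch.exp(
        -torch.cdist(mu, expert_centroid[None]).squeeze(-1)
    )
    return env_reward + beta * expert_likeness
```

In this sketch, expert_centroid would come from the behavior-embedding clustering step the abstract mentions, e.g. the mean latent of the cluster identified as expert-like; the shaped reward can then feed any standard offline RL learner in place of the raw logged reward.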
Similar Papers
Offline and Distributional Reinforcement Learning for Wireless Communications
Machine Learning (CS)
Makes wireless networks smarter and safer for drones.
Tutorial on Large Language Model-Enhanced Reinforcement Learning for Wireless Networks
Networking and Internet Architecture
AI helps wireless networks learn and adapt better.
Multi-Agent Reinforcement Learning for Task Offloading in Wireless Edge Networks
Machine Learning (CS)
Helps robots share resources without talking much.