Adaptive Reinforcement Learning for Dynamic Configuration Allocation in Pre-Production Testing
By: Yu Zhu
Potential Business Impact:
Tests software more effectively by learning from past failures.
Ensuring reliability in modern software systems requires rigorous pre-production testing across highly heterogeneous and evolving environments. Because exhaustive evaluation is infeasible, practitioners must decide how to allocate limited testing resources across configurations whose failure probabilities may drift over time. Existing combinatorial optimization approaches are static and ad hoc, and thus poorly suited to such non-stationary settings. We introduce a reinforcement learning (RL) framework that recasts configuration allocation as a sequential decision-making problem. Our method is the first to integrate Q-learning with a hybrid reward design that fuses simulated outcomes and real-time feedback, enabling both sample efficiency and robustness. In addition, we develop an adaptive online-offline training scheme that allows the agent to track abrupt probability shifts quickly while maintaining long-run stability. Extensive simulation studies demonstrate that our approach consistently outperforms static and optimization-based baselines and approaches oracle performance. This work establishes RL as a powerful paradigm for adaptive configuration allocation, with broad applicability to dynamic testing and resource-scheduling domains.
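The abstract does not include code, so the following is a minimal sketch of the core idea as described: a Q-learning agent that allocates test runs across configurations and learns from a hybrid reward blending simulated and real outcomes. Everything here is an illustrative assumption, not the authors' implementation: the number of configurations, the convex-combination weight LAMBDA_SIM, the tabular bandit-style simplification, and the abrupt probability shift at step 500 that stands in for the non-stationarity the paper targets.

```python
import random

# Minimal sketch (not the authors' code): tabular Q-learning for allocating
# test runs across configurations whose failure probabilities drift over time.
# All names, parameters, and the hybrid-reward weighting are assumptions
# made for illustration.

NUM_CONFIGS = 8          # hypothetical number of test configurations
ALPHA = 0.1              # learning rate
EPSILON = 0.1            # exploration rate for epsilon-greedy selection
LAMBDA_SIM = 0.5         # weight blending simulated vs. real feedback

# One Q-value per configuration (a stateless, bandit-style simplification
# of the sequential formulation described in the abstract).
q_values = [0.0] * NUM_CONFIGS

BASE_PROBS = [0.1, 0.3, 0.05, 0.5, 0.2, 0.15, 0.4, 0.25]

def simulated_reward(config: int) -> float:
    """Stand-in for a cheap simulator's failure-detection signal."""
    return 1.0 if random.random() < BASE_PROBS[config] else 0.0

def real_reward(config: int, t: int) -> float:
    """Stand-in for real test outcomes; probabilities shift abruptly at t=500."""
    p = BASE_PROBS[config] if t < 500 else BASE_PROBS[(config + 3) % NUM_CONFIGS]
    return 1.0 if random.random() < p else 0.0

def select_config() -> int:
    """Epsilon-greedy: mostly exploit the configuration with the highest Q-value."""
    if random.random() < EPSILON:
        return random.randrange(NUM_CONFIGS)
    return max(range(NUM_CONFIGS), key=lambda c: q_values[c])

for t in range(1000):
    c = select_config()
    # Hybrid reward: convex combination of simulated and real outcomes.
    r = LAMBDA_SIM * simulated_reward(c) + (1 - LAMBDA_SIM) * real_reward(c, t)
    # Q-learning update (no next-state term in this bandit simplification).
    q_values[c] += ALPHA * (r - q_values[c])

print("Learned allocation preferences:", [round(q, 2) for q in q_values])
```

In the full sequential formulation the update would include a discounted next-state term, gamma * max_a Q(s', a); the bandit simplification above keeps the sketch short while still showing how the hybrid reward and ongoing online updates let the agent re-rank configurations after the probability shift.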
Similar Papers
Efficient Adaptation of Reinforcement Learning Agents to Sudden Environmental Change
Machine Learning (CS)
Helps robots learn new tricks without forgetting old ones.
Dynamic Optimization of Storage Systems Using Reinforcement Learning Techniques
Operating Systems
Makes computer storage faster by learning automatically.
Dynamic Preference Multi-Objective Reinforcement Learning for Internet Network Management
Networking and Internet Architecture
Helps internet networks adapt to changing needs.