Q-Regularized Generative Auto-Bidding: From Suboptimal Trajectories to Optimal Policies
By: Mingming Zhang, Na Li, Zhuang Feiqing and more
Potential Business Impact:
Helps online ads make more money for sellers.
With the rapid development of e-commerce, auto-bidding has become a key asset in optimizing advertising performance under diverse advertiser environments. Current approaches focus on reinforcement learning (RL) and generative models. These efforts imitate offline historical behaviors using complex structures that require expensive hyperparameter tuning, and the suboptimal trajectories in the data make policy learning even harder. To address these challenges, we propose QGA, a novel Q-value regularized Generative Auto-bidding method. QGA plugs a Q-value regularization term, trained with a double Q-learning strategy, into the Decision Transformer (DT) backbone. This design enables joint optimization of policy imitation and action-value maximization, allowing the learned bidding policy to both leverage experience from the dataset and alleviate the adverse impact of the suboptimal trajectories. Furthermore, to safely explore the policy space beyond the data distribution, we propose a Q-value guided dual-exploration mechanism, in which the DT model is conditioned on multiple return-to-go targets and its actions are locally perturbed. The entire exploration process is dynamically guided by the Q-value module, which provides a principled evaluation of each candidate action. Experiments on public benchmarks and simulation environments demonstrate that QGA consistently achieves superior or highly competitive results compared to existing alternatives. Notably, in large-scale real-world A/B testing, QGA achieves a 3.27% increase in Ad GMV and a 2.49% improvement in Ad ROI.
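The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the loss form (imitation error minus a weighted Q term), the use of the minimum of two critics, the weight `eta`, the Gaussian perturbation with scale `sigma`, and the function names `q_regularized_loss` and `dual_exploration` are all assumptions made for illustration. Actions are scalars and the policy and critic are plain callables standing in for the DT model and the Q-value module.

```python
import random
from typing import Callable, List, Optional, Sequence


def q_regularized_loss(
    pred_actions: Sequence[float],
    data_actions: Sequence[float],
    q1_values: Sequence[float],
    q2_values: Sequence[float],
    eta: float = 0.5,
) -> float:
    """Imitation loss minus a Q-value bonus (assumed form, not from the paper).

    q1_values and q2_values are two critics' estimates for the predicted
    actions; taking their minimum is one common way to curb overestimation.
    """
    n = len(pred_actions)
    # Policy imitation: mean squared error against the logged actions.
    imitation = sum((p - a) ** 2 for p, a in zip(pred_actions, data_actions)) / n
    # Action-value maximization: mean of the pessimistic critic estimate.
    q_term = sum(min(q1, q2) for q1, q2 in zip(q1_values, q2_values)) / n
    return imitation - eta * q_term


def dual_exploration(
    policy: Callable[[float, float], float],
    q_fn: Callable[[float, float], float],
    state: float,
    rtg_targets: Sequence[float],
    n_perturb: int = 4,
    sigma: float = 0.05,
    rng: Optional[random.Random] = None,
) -> float:
    """Pick the candidate action with the highest Q-value.

    Candidates come from two sources: one action per return-to-go target,
    plus n_perturb Gaussian perturbations around each of those actions.
    """
    rng = rng or random.Random(0)
    candidates: List[float] = []
    for rtg in rtg_targets:
        base = policy(state, rtg)  # action conditioned on this target
        candidates.append(base)
        candidates.extend(base + rng.gauss(0.0, sigma) for _ in range(n_perturb))
    # The critic scores every candidate; the best-scoring one is executed.
    return max(candidates, key=lambda a: q_fn(state, a))


if __name__ == "__main__":
    # Toy stand-ins: the policy scales the target, the critic prefers 0.3.
    toy_policy = lambda s, rtg: 0.1 * rtg
    toy_q = lambda s, a: -((a - 0.3) ** 2)
    print(dual_exploration(toy_policy, toy_q, 0.0, [1.0, 3.0, 5.0]))
```

In this sketch, a larger `eta` pushes the policy toward actions the critics rate highly, while a smaller one keeps it closer to the logged behavior. Because every unperturbed action stays in the candidate set, the selected action never scores below the best unperturbed one under the critic.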
Similar Papers
Generative Auto-Bidding with Value-Guided Explorations
Machine Learning (CS)
Makes online ads smarter and win more.
Expert-Guided Diffusion Planner for Auto-Bidding
Machine Learning (CS)
Makes online ads more effective, boosting sales.
Enhancing Generative Auto-bidding with Offline Reward Evaluation and Policy Search
Machine Learning (CS)
Helps ads spend money smarter to get more customers.