Score: 2

Post-Training and Test-Time Scaling of Generative Agent Behavior Models for Interactive Autonomous Driving

Published: December 15, 2025 | arXiv ID: 2512.13262v1

By: Hyunki Seong, Jeong-Kyun Lee, Heesoo Myeong and more

BigTech Affiliations: Qualcomm

Potential Business Impact:

Makes self-driving cars safer and more reactive.

Business Areas:

A/B Testing, Data and Analytics

Learning interactive motion behaviors among multiple agents is a core challenge in autonomous driving. While imitation learning models generate realistic trajectories, they often inherit biases from datasets dominated by safe demonstrations, limiting robustness in safety-critical cases. Moreover, most studies rely on open-loop evaluation, overlooking compounding errors in closed-loop execution. We address these limitations with two complementary strategies. First, we propose Group Relative Behavior Optimization (GRBO), a reinforcement learning post-training method that fine-tunes pretrained behavior models via group relative advantage maximization with human regularization. Using only 10% of the training dataset, GRBO improves safety performance by over 40% while preserving behavioral realism. Second, we introduce Warm-K, a warm-started Top-K sampling strategy that balances consistency and diversity in motion selection. Test-time scaling with Warm-K enhances behavioral consistency and reactivity without retraining, mitigating covariate shift and reducing performance discrepancies. Demo videos are available in the supplementary material.
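
To make "group relative advantage" concrete, here is a minimal sketch of one common way such an advantage is computed: rewards from several rollouts of the same scenario are standardized against the group's own mean and standard deviation. The abstract does not give GRBO's exact formulation, so the function name, the `eps` term, the reward values, and the standardization itself are assumptions for illustration, not the authors' implementation. The human regularization term is not modeled here.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of rollouts.

    Assumption: each reward scores one sampled rollout of the same
    driving scenario. A rollout's advantage is how far its reward sits
    above or below the group mean, in units of the group's spread.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    mean = fmean(rewards)
    std = pstdev(rewards)  # population standard deviation of the group
    # eps avoids division by zero when every rollout gets the same reward
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four rollouts of one scenario
# (for example, a penalty for a collision and a bonus for safe progress).
rewards = [1.0, 0.0, 1.0, -1.0]
advantages = group_relative_advantages(rewards)
```

Rollouts that score above their group's average receive a positive advantage and those below receive a negative one, so no separate value network is needed to provide a baseline in this style of update.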