Score: 2

Fully Unified Motion Planning for End-to-End Autonomous Driving

Published: April 17, 2025 | arXiv ID: 2504.12667v2

By: Lin Liu, Caiyan Jia, Ziying Song and more

Potential Business Impact:

Teaches self-driving cars to learn from all cars.

Business Areas:
Autonomous Vehicles, Transportation

Current end-to-end autonomous driving methods typically learn only from expert planning data collected from a single ego vehicle, severely limiting the diversity of learnable driving policies and scenarios. However, a critical yet overlooked fact is that in any driving scenario, multiple high-quality trajectories from other vehicles coexist with a specific ego vehicle's trajectory. Existing methods fail to fully exploit this valuable resource, missing important opportunities to improve model performance (including in long-tail scenarios) by learning from other experts. Intuitively, jointly learning from both ego and other vehicles' expert data is beneficial for planning tasks. However, this joint learning faces two critical challenges: (1) different scene observation perspectives across vehicles hinder inter-vehicle alignment of scene feature representations; (2) the absence of some modalities (e.g., vehicle states) in other vehicles' data, compared to ego-vehicle data, introduces learning bias. To address these challenges, we propose FUMP (Fully Unified Motion Planning), a novel two-stage trajectory generation framework. Building upon probabilistic decomposition, we model the planning task as a specialized subtask of motion prediction. Specifically, our approach decouples trajectory planning into two stages. In Stage 1, a shared decoder jointly generates initial trajectories for both tasks. In Stage 2, the model performs planning-specific refinement conditioned on the ego vehicle's state. The transition between the two stages is bridged by a state predictor trained exclusively on ego-vehicle data. To address the cross-vehicle discrepancy in observational perspectives, we propose an Equivariant Context-Sharing Adapter (ECSA) before Stage 1 to improve cross-vehicle generalization of scene representations.
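
The two-stage data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the random linear maps stand in for learned networks, and all names, tensor sizes, the inputs given to the state predictor, and the residual form of the refinement are assumptions made for illustration. The ECSA module is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: agents per scene, feature width, future steps, ego-state width.
N_AGENTS, FEAT, HORIZON, STATE = 4, 16, 6, 3

# Random linear weights standing in for learned networks (assumption).
W_shared = rng.normal(size=(FEAT, HORIZON * 2)) * 0.1
W_state = rng.normal(size=(FEAT + HORIZON * 2, STATE)) * 0.1
W_refine = rng.normal(size=(HORIZON * 2 + STATE, HORIZON * 2)) * 0.1


def stage1_shared_decoder(agent_feats):
    """Stage 1: one shared decoder yields initial (x, y) trajectories for every agent."""
    return (agent_feats @ W_shared).reshape(-1, HORIZON, 2)


def predict_ego_state(ego_feat, ego_traj):
    """Bridge: estimate the ego state (in the paper, trained on ego-vehicle data only)."""
    return np.concatenate([ego_feat, ego_traj.ravel()]) @ W_state


def stage2_refine(ego_traj, ego_state):
    """Stage 2: planning-specific refinement conditioned on the ego state.

    Modelled here as a residual update, which is an assumption of this sketch.
    """
    delta = np.concatenate([ego_traj.ravel(), ego_state]) @ W_refine
    return ego_traj + delta.reshape(HORIZON, 2)


def plan(agent_feats, ego_index=0):
    """Run both stages; return all initial trajectories and the refined ego plan."""
    initial = stage1_shared_decoder(agent_feats)
    state = predict_ego_state(agent_feats[ego_index], initial[ego_index])
    return initial, stage2_refine(initial[ego_index], state)


feats = rng.normal(size=(N_AGENTS, FEAT))
initial_trajs, ego_plan = plan(feats)
```

In this sketch, Stage 1 treats the ego vehicle like any other agent, so every vehicle's expert trajectory can supervise the shared decoder, while only the Stage 2 path depends on the ego state that other vehicles' data lacks.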

Country of Origin
🇸🇬 🇨🇳 🇦🇺 Singapore, China, Australia

Page Count
17 pages

Category
Computer Science:
CV and Pattern Recognition