
Are All Data Necessary? Efficient Data Pruning for Large-scale Autonomous Driving Dataset via Trajectory Entropy Maximization

Published: December 22, 2025 | arXiv ID: 2512.19270v1

By: Zhaoyang Liu, Weitao Zhou, Junze Wen, and others

Collecting large-scale naturalistic driving data is essential for training robust autonomous driving planners. However, real-world datasets often contain a substantial amount of repetitive, low-value samples, which incur excessive storage costs while offering limited benefit to policy learning. To address this issue, we propose an information-theoretic data pruning method that reduces training data volume without compromising model performance. Our approach evaluates the information entropy of the trajectory distribution in driving data and iteratively selects high-value samples that preserve the statistical characteristics of the original dataset in a model-agnostic manner. From a theoretical perspective, we show that maximizing trajectory entropy constrains the Kullback-Leibler divergence between the pruned subset and the original data distribution, thereby maintaining generalization ability. Comprehensive experiments on the nuPlan benchmark with a large-scale imitation learning framework demonstrate that the proposed method can reduce the dataset size by up to 40% while maintaining closed-loop performance. This work provides a lightweight, theoretically grounded approach to scalable data management and efficient policy learning in autonomous driving systems.
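The iterative, entropy-driven selection described above can be illustrated with a minimal greedy sketch. It is not the authors' algorithm. It assumes, hypothetically, that each trajectory is reduced to a discrete bin by the grid cell of its net displacement, and that "trajectory entropy" means the Shannon entropy of the kept subset's bin histogram. All names (`trajectory_bin`, `cell`, `greedy_entropy_prune`, and so on) are illustrative. Because the abstract states a KL guarantee without giving its derivation, the sketch measures KL(subset ‖ original) empirically rather than assuming any bound.

```python
import math
from collections import Counter, defaultdict

def trajectory_bin(traj, cell=2.0):
    """Hypothetical discretization: grid cell of the trajectory's net (dx, dy) displacement."""
    (x0, y0), (x1, y1) = traj[0], traj[-1]
    return (math.floor((x1 - x0) / cell), math.floor((y1 - y0) / cell))

def entropy(counts):
    """Shannon entropy (nats) of a histogram given as a mapping bin -> count."""
    total = sum(counts.values())
    return -sum(c / total * math.log(c / total) for c in counts.values() if c > 0)

def kl_divergence(q_counts, p_counts):
    """KL(q || p) in nats; assumes every bin with q > 0 also has p > 0 (true for subsets)."""
    q_total, p_total = sum(q_counts.values()), sum(p_counts.values())
    return sum(
        (c / q_total) * math.log((c / q_total) / (p_counts[b] / p_total))
        for b, c in q_counts.items() if c > 0
    )

def greedy_entropy_prune(trajectories, budget, cell=2.0):
    """Keep `budget` samples, each step adding the sample that most increases subset entropy.

    Adding one sample to bin b raises sum(c * log c) by (c_b+1)log(c_b+1) - c_b log c_b,
    which grows with c_b, so the greedy pick is a remaining sample from the
    least-populated bin of the subset selected so far.
    """
    pool = defaultdict(list)      # bin -> indices of samples not yet selected
    for i, traj in enumerate(trajectories):
        pool[trajectory_bin(traj, cell)].append(i)
    picked = Counter()            # bin -> number of samples selected so far
    selected = []
    for _ in range(min(budget, len(trajectories))):
        # Only bins that still hold unselected samples are candidates.
        b = min((b for b in pool if pool[b]), key=lambda b: picked[b])
        selected.append(pool[b].pop())
        picked[b] += 1
    return selected, picked

# Toy usage: hypothetical trajectories given as [(x_start, y_start), (x_end, y_end)].
trajs = [[(0, 0), (1, 1)]] * 6 + [[(0, 0), (5, 1)]] * 2 + [[(0, 0), (1, 5)]] * 2
selected, kept = greedy_entropy_prune(trajs, budget=6)
full = Counter(trajectory_bin(t) for t in trajs)
print(sorted(selected), entropy(kept), kl_divergence(kept, full))
```

On this toy data (six samples in one bin, two in each of two others, budget 6) the greedy loop keeps two samples per bin, giving entropy ln 3 and KL = ln(125/81)/3 ≈ 0.145 nats. The example also exposes a caveat of this simplified objective: pure entropy maximization flattens the subset histogram toward uniform, so on a skewed dataset the KL to the original stays nonzero. That is why measuring it is worthwhile, and why the paper's own objective, which is described as also preserving the original statistical characteristics, may differ from this sketch.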

Trajectory Entropy Reinforcement Learning for Predictable and Robust Control

Machine Learning (CS)

Makes robots move more smoothly and reliably.

7 May 2025

A Trajectory Generator for High-Density Traffic and Diverse Agent-Interaction Scenarios

Robotics

Makes self-driving cars safer in busy traffic.

3 Oct 2025

Which Layer Causes Distribution Deviation? Entropy-Guided Adaptive Pruning for Diffusion and Flow Models

CV and Pattern Recognition

Makes AI art generators faster and smaller.

26 Nov 2025

