Temporal Triplane Transformers as Occupancy World Models
By: Haoran Xu, Peixi Peng, Guang Tan, et al.
Potential Business Impact:
Helps self-driving cars predict future driving scenes faster and more accurately.
World models aim to learn or construct representations of the environment that enable the prediction of future scenes, thereby supporting intelligent motion planning. However, existing models often struggle to produce fine-grained predictions and to operate in real time. In this work, we propose T$^3$Former, a novel 4D occupancy world model for autonomous driving. T$^3$Former begins by pre-training a compact triplane representation that efficiently encodes 3D occupancy. It then extracts multi-scale temporal motion features from historical triplanes and employs an autoregressive approach to iteratively predict future triplane changes. Finally, these triplane changes are combined with previous states to decode future occupancy and ego-motion trajectories. Experimental results show that T$^3$Former achieves a 1.44$\times$ speedup (26 FPS), improves mean IoU to 36.09, and reduces mean absolute planning error to 1.0 meters. Demos are available in the supplementary material.
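To make the pipeline in the abstract concrete, the sketch below illustrates the general idea of autoregressively predicting triplane changes and adding them back to the previous state. This is not the authors' implementation; all module names, tensor shapes, channel counts, and the history length are illustrative assumptions.

```python
# Minimal sketch of an autoregressive triplane-delta rollout (assumed design,
# not the released T^3Former code). Shown for one of the three feature planes;
# the same idea would apply to the XY, XZ, and YZ planes.
import torch
import torch.nn as nn


class TriplaneDeltaPredictor(nn.Module):
    """Predicts a residual change for a triplane slice from a short history."""

    def __init__(self, channels: int = 32, history: int = 4):
        super().__init__()
        # Temporal motion features: stack the history along channels and
        # mix them with 2D convolutions (hyperparameters are placeholders).
        self.motion = nn.Sequential(
            nn.Conv2d(channels * history, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, channels, kernel_size=3, padding=1),
        )

    def forward(self, plane_history: torch.Tensor) -> torch.Tensor:
        # plane_history: (B, T, C, H, W) for a single plane.
        b, t, c, h, w = plane_history.shape
        x = plane_history.reshape(b, t * c, h, w)
        return self.motion(x)  # predicted change, shape (B, C, H, W)


def rollout(model: TriplaneDeltaPredictor, history: list, steps: int, window: int = 4):
    """Iteratively predict future triplanes as previous state + predicted delta."""
    planes = list(history)  # list of (B, C, H, W) past triplane slices
    futures = []
    for _ in range(steps):
        hist = torch.stack(planes[-window:], dim=1)  # most recent frames
        delta = model(hist)                          # predicted triplane change
        nxt = planes[-1] + delta                     # residual update of the state
        planes.append(nxt)
        futures.append(nxt)
    return futures


if __name__ == "__main__":
    model = TriplaneDeltaPredictor(channels=32, history=4)
    past = [torch.randn(1, 32, 64, 64) for _ in range(4)]  # toy plane history
    preds = rollout(model, past, steps=3)
    print([p.shape for p in preds])
```

In the paper's full pipeline, the predicted triplanes would then be decoded back into 3D occupancy and ego-motion trajectories by a pre-trained decoder; that stage is omitted here.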
Similar Papers
Occupancy World Model for Robots
CV and Pattern Recognition
Helps robots predict what's next in indoor rooms.
QuadricFormer: Scene as Superquadrics for 3D Semantic Occupancy Prediction
CV and Pattern Recognition
Helps self-driving cars recognize shapes better and faster.
OccTENS: 3D Occupancy World Model via Temporal Next-Scale Prediction
CV and Pattern Recognition
Helps robots predict and navigate 3D worlds.