LADY: Linear Attention for Autonomous Driving Efficiency without Transformers
By: Jihao Huang, Xi Xia, Zhiyuan Li, and more
End-to-end paradigms have demonstrated great potential for autonomous driving, and most existing methods are built upon Transformer architectures. However, Transformers incur a quadratic attention cost, which limits their ability to model long spatial and temporal sequences, particularly on resource-constrained edge platforms. Since autonomous driving inherently demands efficient temporal modeling, this cost severely limits their deployment and real-time performance. Recently, linear attention mechanisms have gained increasing attention due to their favorable time and memory complexity. However, existing linear attention architectures are limited to self-attention and lack support for cross-modal and cross-temporal interactions, both of which are crucial for autonomous driving. In this work, we propose LADY, the first fully linear attention-based generative model for end-to-end autonomous driving. LADY fuses long-range temporal context at inference with constant computational and memory cost, regardless of the history length of the camera and LiDAR features. In addition, we introduce a lightweight linear cross-attention mechanism that enables effective cross-modal information exchange. Experiments on the NAVSIM and Bench2Drive benchmarks demonstrate that LADY achieves state-of-the-art performance with constant time and memory complexity, offering improved planning performance at significantly reduced computational cost. The model has also been deployed and validated on edge devices, demonstrating its practicality in resource-limited scenarios.
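To make the constant-cost claim concrete, here is a minimal sketch (not the authors' code) of kernelized linear cross-attention with a recurrent state, following the standard linear-attention recipe in which the softmax kernel is replaced by a positive feature map. The feature map, class name, and all tensor shapes below are illustrative assumptions; the sketch only shows why per-step inference cost and memory stay constant as camera/LiDAR history grows.

```python
# Hypothetical sketch of linear cross-attention with a recurrent state.
# Not LADY's actual implementation; shapes and names are assumptions.
import torch
import torch.nn.functional as F

def phi(x):
    # Positive feature map commonly used in linear attention.
    return F.elu(x) + 1.0

class LinearCrossAttentionState:
    """Running state S = sum_t phi(k_t) v_t^T and z = sum_t phi(k_t).

    Memory is O(d_k * d_v) and does not grow with the number of past
    sensor tokens absorbed into the state.
    """
    def __init__(self, d_k, d_v):
        self.S = torch.zeros(d_k, d_v)  # accumulated key-value outer products
        self.z = torch.zeros(d_k)       # accumulated keys (attention normalizer)

    def update(self, k, v):
        # k: (n_tokens, d_k), v: (n_tokens, d_v), e.g. one frame of sensor tokens.
        pk = phi(k)
        self.S += pk.T @ v
        self.z += pk.sum(dim=0)

    def attend(self, q, eps=1e-6):
        # q: (n_queries, d_k), e.g. planning queries from another modality.
        pq = phi(q)
        return (pq @ self.S) / (pq @ self.z + eps).unsqueeze(-1)

# Usage: absorb each new frame once, then query at constant cost.
state = LinearCrossAttentionState(d_k=64, d_v=64)
for _ in range(10):                  # 10 history frames
    k = torch.randn(256, 64)         # per-frame sensor tokens (keys)
    v = torch.randn(256, 64)         # per-frame sensor tokens (values)
    state.update(k, v)
q = torch.randn(8, 64)               # planning queries
out = state.attend(q)                # (8, 64); cost independent of history length
```

Because the history is summarized in the fixed-size pair (S, z), each new frame only updates the state once, which is the property the abstract refers to as constant-time and constant-memory temporal fusion.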
Similar Papers
CoC-VLA: Delving into Adversarial Domain Transfer for Explainable Autonomous Driving via Chain-of-Causality Visual-Language-Action Model
Robotics
Teaches self-driving cars to handle tricky situations.
FSDAM: Few-Shot Driving Attention Modeling via Vision-Language Coupling
CV and Pattern Recognition
Teaches cars where drivers look with less data.
VISTA: Vision-Language Imitation of Situational Thinking and Attention for Human-Like Driver Focus in Dynamic Environments
CV and Pattern Recognition
Predicts where drivers look using words.