Bidirectional Action Sequence Learning for Long-term Action Anticipation with Large Language Models
By: Yuji Sato, Yasunori Ishii, Takayoshi Yamashita
Potential Business Impact:
Predicts future actions by looking forward and backward.
Video-based long-term action anticipation is crucial for early risk detection in areas such as automated driving and robotics. Conventional approaches extract features from past video using an encoder and predict future actions with a decoder; this unidirectional design limits performance and makes it hard to capture semantically distinct sub-actions within a scene. The proposed method, BiAnt, addresses this limitation by combining forward prediction with backward prediction using a large language model. Experimental results on Ego4D demonstrate that BiAnt outperforms baseline methods in terms of edit distance.
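As a point of reference for the evaluation metric mentioned above: long-term action anticipation on Ego4D is scored by edit distance between the predicted and ground-truth action-label sequences. The sketch below is illustrative only and not from the paper; it shows plain Levenshtein distance, whereas the official Ego4D benchmark additionally normalizes the score and takes a minimum over multiple candidate sequences.

```python
def edit_distance(pred, gt):
    """Levenshtein distance between two action-label sequences.

    Counts the minimum number of insertions, deletions, and
    substitutions needed to turn `pred` into `gt`.
    """
    m, n = len(pred), len(gt)
    # dp[i][j] = distance between pred[:i] and gt[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all of pred[:i]
    for j in range(n + 1):
        dp[0][j] = j  # insert all of gt[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if pred[i - 1] == gt[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[m][n]

# Hypothetical action sequences for illustration:
print(edit_distance(["chop", "stir", "pour"],
                    ["chop", "pour", "stir", "pour"]))  # → 1 (one insertion)
```

A lower edit distance means the predicted sequence of future actions more closely matches what actually happened in the video.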
Similar Papers
Vision and Intention Boost Large Language Model in Long-Term Action Anticipation
CV and Pattern Recognition
Predicts future actions by watching and understanding.
Intention-Guided Cognitive Reasoning for Egocentric Long-Term Action Anticipation
CV and Pattern Recognition
Predicts your next actions to help proactively.
Technical Report for Ego4D Long-Term Action Anticipation Challenge 2025
CV and Pattern Recognition
Predicts future actions by watching and understanding videos.