Exploring Ordinal Bias in Action Recognition for Instructional Videos
By: Joochan Kim, Minjoon Jung, Byoung-Tak Zhang
Potential Business Impact:
Teaches computers to understand videos, not just memorize.
Action recognition models have achieved promising results in understanding instructional videos. However, they often rely on dominant, dataset-specific action sequences rather than true video comprehension, a problem that we define as ordinal bias. To address this issue, we propose two effective video manipulation methods: Action Masking, which masks frames of frequently co-occurring actions, and Sequence Shuffling, which randomizes the order of action segments. Through comprehensive experiments, we demonstrate that current models exhibit significant performance drops when confronted with nonstandard action sequences, underscoring their vulnerability to ordinal bias. Our findings emphasize the importance of rethinking evaluation strategies and developing models capable of generalizing beyond fixed action patterns in diverse instructional videos.
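The two manipulations described in the abstract can be pictured with a short sketch. The segment representation (a list of (action_label, frames) pairs per video), the helper names, and the pair-counting heuristic below are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import Counter
from itertools import combinations

# Assumed data structure: each video is a list of (action_label, frames) segments,
# where `frames` is any per-segment frame container (e.g., a list of frame tensors).

def find_frequent_pairs(videos, top_k=5):
    """Count co-occurring action pairs across a dataset and return the most frequent ones."""
    pair_counts = Counter()
    for segments in videos:
        labels = sorted({label for label, _ in segments})
        pair_counts.update(combinations(labels, 2))
    return {pair for pair, _ in pair_counts.most_common(top_k)}

def action_masking(segments, frequent_pairs, mask_frame):
    """Action Masking: blank out frames of actions involved in frequently co-occurring pairs."""
    frequent_labels = {label for pair in frequent_pairs for label in pair}
    masked = []
    for label, frames in segments:
        if label in frequent_labels:
            frames = [mask_frame] * len(frames)  # remove the visual evidence, keep the slot
        masked.append((label, frames))
    return masked

def sequence_shuffling(segments, seed=None):
    """Sequence Shuffling: randomize the order of action segments, keeping each segment intact."""
    shuffled = list(segments)
    random.Random(seed).shuffle(shuffled)
    return shuffled
```

A model that genuinely recognizes each action from its frames should degrade gracefully under these perturbations, whereas a model relying on memorized, dataset-specific action orderings would not, which is the diagnostic idea behind the experiments.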
Similar Papers
How to model Human Actions distribution with Event Sequence Data
Machine Learning (CS)
Predicts what happens next in a list of actions.
Can masking background and object reduce static bias for zero-shot action recognition?
CV and Pattern Recognition
Teaches computers to see actions, not just things.
Pose-Aware Weakly-Supervised Action Segmentation
CV and Pattern Recognition
Teaches computers to understand actions in videos.