Robustness Evaluation for Video Models with Reinforcement Learning
By: Ashwin Ramesh Babu, Sajad Mousavi, Vineet Gundecha, and more
Potential Business Impact:
Makes AI video watchers more easily fooled.
Evaluating the robustness of video classification models is considerably more challenging than for image-based models: the additional temporal dimension significantly increases both complexity and computational cost. A key challenge is keeping perturbations minimal while still inducing misclassification. In this work, we propose a multi-agent reinforcement learning approach in which spatial and temporal agents cooperatively learn to identify a given video's sensitive spatial and temporal regions. The agents account for temporal coherence when generating fine-grained perturbations, yielding a more effective and visually imperceptible attack. Our method outperforms state-of-the-art solutions on the Lp-norm metric and the average number of queries, and it supports custom distortion types, making the robustness evaluation more relevant to the target use case. We extensively evaluate four popular video action recognition models on two popular datasets, HMDB-51 and UCF-101.
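To make the idea concrete, here is a minimal sketch of a query-based attack in which a "temporal agent" picks a frame and a "spatial agent" picks a patch, and a small perturbation is kept only when it lowers the true-class probability. This is not the paper's RL method: the agents here are plain random search, and `toy_model` is a hypothetical stand-in classifier whose prediction depends only on mean intensity.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_model(video):
    """Hypothetical stand-in for a black-box video classifier.
    Returns [p(class 0), p(class 1)] derived from mean pixel intensity."""
    p1 = float(np.clip(video.mean(), 0.0, 1.0))
    return np.array([1.0 - p1, p1])

def attack(video, model, label, eps=0.1, patch=16, max_queries=500):
    """Greedy random-search sketch of a cooperative spatial/temporal attack.
    Each proposal costs one model query; improving moves are kept."""
    adv = video.copy()
    T, H, W = adv.shape
    best = model(adv)[label]
    for queries in range(1, max_queries + 1):
        t = rng.integers(T)                   # temporal agent: pick a frame
        y = rng.integers(H - patch + 1)       # spatial agent: pick a patch
        x = rng.integers(W - patch + 1)
        cand = adv.copy()
        cand[t, y:y + patch, x:x + patch] -= eps  # nudge intensities down
        cand = np.clip(cand, 0.0, 1.0)
        p = model(cand)[label]                # one query per proposal
        if p < best:                          # keep only improving moves
            adv, best = cand, p
        if best < 0.5:                        # label flipped (binary case)
            break
    return adv, queries

# Toy clip initially classified as class 1 (mean intensity 0.55).
video = np.full((16, 32, 32), 0.55)
adv, q = attack(video, toy_model, label=1)
```

In the paper's method, the random frame/patch choices above are replaced by learned policies that exploit temporal coherence, which is what drives down both the perturbation norm and the query count.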
Similar Papers
Coordinated Robustness Evaluation Framework for Vision-Language Models
CV and Pattern Recognition
Makes AI models fooled by tricky pictures and words.
A Validation Strategy for Deep Learning Models: Evaluating and Enhancing Robustness
Machine Learning (CS)
Finds computer weaknesses before they cause problems.
SEVERE++: Evaluating Benchmark Sensitivity in Generalization of Video Representation Learning
CV and Pattern Recognition
Computers learn to understand videos better.