FineBadminton: A Multi-Level Dataset for Fine-Grained Badminton Video Understanding
By: Xusheng He, Wei Liu, Shanshan Ma, and more
Potential Business Impact:
Helps computers understand badminton moves and strategies.
Fine-grained analysis of complex and high-speed sports like badminton presents a significant challenge for Multimodal Large Language Models (MLLMs), despite their notable advancements in general video understanding. This difficulty arises primarily from the scarcity of datasets with sufficiently rich and domain-specific annotations. To bridge this gap, we introduce FineBadminton, a novel and large-scale dataset featuring a unique multi-level semantic annotation hierarchy (Foundational Actions, Tactical Semantics, and Decision Evaluation) for comprehensive badminton understanding. The construction of FineBadminton is powered by an innovative annotation pipeline that synergistically combines MLLM-generated proposals with human refinement. We also present FBBench, a challenging benchmark derived from FineBadminton, to rigorously evaluate MLLMs on nuanced spatio-temporal reasoning and tactical comprehension. Together, FineBadminton and FBBench provide a crucial ecosystem to catalyze research in fine-grained video understanding and advance the development of MLLMs in sports intelligence. Furthermore, we propose an optimized baseline approach incorporating Hit-Centric Keyframe Selection to focus on pivotal moments and Coordinate-Guided Condensation to distill salient visual information. The results on FBBench reveal that while current MLLMs still face significant challenges in deep sports video analysis, our proposed strategies nonetheless achieve substantial performance gains. The project homepage is available at https://finebadminton.github.io/FineBadminton/.
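The abstract names Hit-Centric Keyframe Selection but does not describe how it works. The snippet below is a minimal, hypothetical sketch of the general idea of spending a limited frame budget on the moments around shuttle hits rather than sampling uniformly. It assumes that hit frame indices are already known (for example, from annotations) and that the model accepts a fixed number of frames. The function name `hit_centric_keyframes`, the `radius` parameter, the ring-by-ring expansion around hits, and the uniform fill of any leftover budget are all illustrative assumptions, not the authors' method.

```python
from typing import List


def hit_centric_keyframes(num_frames: int, hit_frames: List[int],
                          budget: int, radius: int = 2) -> List[int]:
    """Hypothetical sketch (not the FineBadminton implementation).

    Selects up to `budget` distinct frame indices in [0, num_frames),
    prioritizing frames at and near the given hit frames.
    """
    if budget <= 0 or num_frames <= 0:
        return []

    selected: List[int] = []
    seen = set()

    # Pass 1 (assumed strategy): take every hit frame first, then expand
    # outward one frame at a time (offset 1, 2, ... up to `radius`).
    for offset in range(radius + 1):
        for hit in sorted(hit_frames):
            candidates = (hit,) if offset == 0 else (hit - offset, hit + offset)
            for idx in candidates:
                if 0 <= idx < num_frames and idx not in seen:
                    seen.add(idx)
                    selected.append(idx)
                    if len(selected) == budget:
                        return sorted(selected)

    # Pass 2 (assumed strategy): spend any leftover budget on frames spread
    # evenly across the frames not already chosen, to keep rally context.
    rest = [i for i in range(num_frames) if i not in seen]
    need = min(budget - len(selected), len(rest))
    for k in range(need):
        # Step size len(rest) / need >= 1, so these positions never repeat.
        selected.append(rest[(k * len(rest)) // need])

    return sorted(selected)


if __name__ == "__main__":
    # A 20-frame clip with hits at frames 5 and 12 and a 6-frame budget.
    print(hit_centric_keyframes(20, [5, 12], budget=6, radius=1))
```

With a 6-frame budget and `radius=1`, the example selects the two hit frames and their immediate neighbours; only when the hit neighbourhoods are exhausted does the sketch fall back to evenly spaced context frames.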