Web-Scale Collection of Video Data for 4D Animal Reconstruction
By: Brian Nlong Zhao, Jiajun Wu, Shangzhe Wu
Potential Business Impact:
Makes 3D animal movies from YouTube videos.
Computer vision for animals holds great promise for wildlife research but often depends on large-scale data, while existing collection methods rely on controlled capture setups. Recent data-driven approaches show the potential of single-view, non-invasive analysis, yet current animal video datasets are limited--offering as few as 2.4K 15-frame clips and lacking key processing for animal-centric 3D/4D tasks. We introduce an automated pipeline that mines YouTube videos and processes them into object-centric clips, along with auxiliary annotations valuable for downstream tasks like pose estimation, tracking, and 3D/4D reconstruction. Using this pipeline, we amass 30K videos (2M frames)--an order of magnitude more than prior works. To demonstrate its utility, we focus on the 4D quadruped animal reconstruction task. To support this task, we present Animal-in-Motion (AiM), a benchmark of 230 manually filtered sequences with 11K frames showcasing clean, diverse animal motions. We evaluate state-of-the-art model-based and model-free methods on Animal-in-Motion, finding that 2D metrics favor the former despite unrealistic 3D shapes, while the latter yields more natural reconstructions but scores lower--revealing a gap in current evaluation. To address this, we enhance a recent model-free approach with sequence-level optimization, establishing the first 4D animal reconstruction baseline. Together, our pipeline, benchmark, and baseline aim to advance large-scale, markerless 4D animal reconstruction and related tasks from in-the-wild videos. Code and datasets are available at https://github.com/briannlongzhao/Animal-in-Motion.
Similar Papers
4D-Animal: Freely Reconstructing Animatable 3D Animals from Videos
CV and Pattern Recognition
Makes 3D animals move realistically from videos.
Advances and Trends in the 3D Reconstruction of the Shape and Motion of Animals
CV and Pattern Recognition
Makes 3D animal models from videos.
Efficient and Scalable Monocular Human-Object Interaction Motion Reconstruction
CV and Pattern Recognition
Robots learn to copy human actions from videos.