SVD: Spatial Video Dataset
By: M. H. Izadimehr, Milad Ghanbari, Guodong Chen, and others
Potential Business Impact:
Makes 3D videos easier to create and use.
Stereoscopic video has long been a subject of research because it can deliver immersive three-dimensional content across a wide range of applications, from virtual and augmented reality to advanced human-computer interaction. The dual-view format inherently provides binocular disparity cues that enhance depth perception and realism, making it indispensable for fields such as telepresence, 3D mapping, and robotic vision. Until recently, however, end-to-end pipelines for capturing, encoding, and viewing high-quality 3D video were neither widely accessible nor optimized for consumer-grade devices. Today, smartphones such as the iPhone Pro and Head-Mounted Displays (HMDs) such as the Apple Vision Pro (AVP) offer built-in stereoscopic video capture and hardware-accelerated encoding, and the resulting videos play back seamlessly on HMDs such as the AVP and Meta Quest 3 with minimal user intervention. Apple refers to this streamlined workflow as spatial video. Putting the full stereoscopic video pipeline in the hands of ordinary users has enabled new applications. Despite these advances, publicly available datasets that cover the complete spatial video pipeline are still notably lacking. In this paper, we introduce SVD, a spatial video dataset comprising 300 five-second video sequences: 150 captured with an iPhone Pro and 150 with an AVP. We also recorded 10 longer videos, each at least 2 minutes in duration. The SVD dataset is publicly released under an open-access license to facilitate research in codec performance evaluation, subjective and objective quality of experience (QoE) assessment, depth-based computer vision, stereoscopic video streaming, and other emerging 3D applications such as neural rendering and volumetric capture. Link to the dataset: https://cd-athena.github.io/SVD/
Similar Papers
SpatialVID: A Large-Scale Video Dataset with Spatial Annotations
CV and Pattern Recognition
Teaches computers to understand 3D worlds from videos.
SV4D 2.0: Enhancing Spatio-Temporal Consistency in Multi-View Video Diffusion for High-Quality 4D Generation
CV and Pattern Recognition
Makes 3D objects move realistically from videos.
ImViD: Immersive Volumetric Videos for Enhanced VR Engagement
CV and Pattern Recognition
Creates realistic virtual worlds you can move in.