The CASTLE 2024 Dataset: Advancing the Art of Multimodal Understanding
By: Luca Rossetto, Werner Bailer, Duc-Tien Dang-Nguyen, and others
Potential Business Impact:
Provides synchronized first- and third-person recordings of the same scenes, useful for building and evaluating multi-view video understanding systems.
Egocentric video has seen increased interest in recent years, as it is used in a range of areas. However, most existing datasets are limited to a single perspective. In this paper, we present the CASTLE 2024 dataset, a multimodal collection containing ego- and exo-centric (i.e., first- and third-person perspective) video and audio from 15 time-aligned sources, as well as other sensor streams and auxiliary data. The dataset was recorded by volunteer participants over four days in a fixed location and includes the point of view of 10 participants, with an additional 5 fixed cameras providing an exocentric perspective. The entire dataset contains over 600 hours of UHD video recorded at 50 frames per second. In contrast to other datasets, CASTLE 2024 does not contain any partial censoring, such as blurred faces or distorted audio. The dataset is available via https://castle-dataset.github.io/.
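To give a sense of the dataset's scale, the figures stated in the abstract (600 hours of video at 50 frames per second across 15 time-aligned sources) can be turned into a quick back-of-envelope estimate. The even-split assumption below is illustrative only; the abstract does not state how recording time is distributed across sources.

```python
# Back-of-envelope scale of CASTLE 2024, using only figures from the abstract.
HOURS_TOTAL = 600   # total UHD video across all sources
FPS = 50            # frames per second
SOURCES = 15        # 10 egocentric participants + 5 fixed exocentric cameras

total_frames = HOURS_TOTAL * 3600 * FPS          # 600 h -> seconds -> frames
hours_per_source = HOURS_TOTAL / SOURCES         # assumes an even split

print(f"total frames across all sources: {total_frames:,}")
print(f"hours per source (if evenly split): {hours_per_source:.0f}")
```

At this scale (over a hundred million UHD frames), even coarse per-frame processing is a substantial compute job, which is worth keeping in mind before downloading the full collection.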
Similar Papers
MultiEgo: A Multi-View Egocentric Video Dataset for 4D Scene Reconstruction
CV and Pattern Recognition
Records real-life events from multiple viewpoints.
Look and Tell: A Dataset for Multimodal Grounding Across Egocentric and Exocentric Views
CV and Pattern Recognition
Helps robots understand where to look and what to say.