IndEgo: A Dataset of Industrial Scenarios and Collaborative Work for Egocentric Assistants
By: Vivek Chavan, Yasmina Imgrund, Tung Dao, and more
Potential Business Impact:
Helps AI assistants and robots learn common factory tasks.
We introduce IndEgo, a multimodal egocentric and exocentric dataset covering common industrial tasks, including assembly/disassembly, logistics and organisation, inspection and repair, woodworking, and others. The dataset contains 3,460 egocentric recordings (approximately 197 hours) and 1,092 exocentric recordings (approximately 97 hours). A key focus of the dataset is collaborative work, where two workers jointly perform cognitively and physically intensive tasks. The egocentric recordings provide rich multimodal data and added context via eye gaze, narration, sound, motion, and more. We provide detailed annotations (actions, summaries, mistake annotations, narrations), metadata, processed outputs (eye gaze, hand pose, semi-dense point cloud), and benchmarks on procedural and non-procedural task understanding, mistake detection, and reasoning-based question answering. Baseline evaluations for mistake detection, question answering, and collaborative task understanding show that the dataset presents a challenge for state-of-the-art multimodal models. Our dataset is available at: https://huggingface.co/datasets/FraunhoferIPK/IndEgo
Similar Papers
OpenEgo: A Large-Scale Multimodal Egocentric Dataset for Dexterous Manipulation
CV and Pattern Recognition
Teaches robots to copy human hand movements.
Perceiving and Acting in First-Person: A Dataset and Benchmark for Egocentric Human-Object-Human Interactions
CV and Pattern Recognition
AI learns to help people by watching and listening.
MultiEgo: A Multi-View Egocentric Video Dataset for 4D Scene Reconstruction
CV and Pattern Recognition
Records real-life events from multiple viewpoints.