IndEgo: A Dataset of Industrial Scenarios and Collaborative Work for Egocentric Assistants

Published: November 24, 2025 | arXiv ID: 2511.19684v1

By: Vivek Chavan, Yasmina Imgrund, Tung Dao and more

Potential Business Impact:

Helps AI assistants understand factory work and spot mistakes as workers perform tasks.

Business Areas:
Image Recognition Data and Analytics, Software

We introduce IndEgo, a multimodal egocentric and exocentric dataset addressing common industrial tasks, including assembly/disassembly, logistics and organisation, inspection and repair, woodworking, and others. The dataset contains 3,460 egocentric recordings (approximately 197 hours), along with 1,092 exocentric recordings (approximately 97 hours). A key focus of the dataset is collaborative work, where two workers jointly perform cognitively and physically intensive tasks. The egocentric recordings include rich multimodal data, with added context from eye gaze, narration, sound, motion, and other signals. We provide detailed annotations (actions, summaries, mistake annotations, narrations), metadata, processed outputs (eye gaze, hand pose, semi-dense point cloud), and benchmarks on procedural and non-procedural task understanding, Mistake Detection, and reasoning-based Question Answering. Baseline evaluations for Mistake Detection, Question Answering, and collaborative task understanding show that the dataset is challenging for state-of-the-art multimodal models. Our dataset is available at: https://huggingface.co/datasets/FraunhoferIPK/IndEgo
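
To give a rough sense of scale, the recording counts and durations quoted in the abstract can be turned into average clip lengths. The sketch below is only back-of-the-envelope arithmetic: the constant and function names are illustrative, and because the abstract gives the hour totals as approximate, the results are approximate too.

```python
# Figures quoted in the abstract; both hour totals are stated as approximate.
EGO_RECORDINGS, EGO_HOURS = 3460, 197
EXO_RECORDINGS, EXO_HOURS = 1092, 97


def mean_minutes(recordings: int, hours: float) -> float:
    """Return the average recording length in minutes."""
    return hours * 60 / recordings


ego_mean = mean_minutes(EGO_RECORDINGS, EGO_HOURS)
exo_mean = mean_minutes(EXO_RECORDINGS, EXO_HOURS)

# Prints: egocentric: 3.4 min, exocentric: 5.3 min
print(f"egocentric: {ego_mean:.1f} min, exocentric: {exo_mean:.1f} min")
```

On these numbers, a typical egocentric recording runs a few minutes, and exocentric recordings are somewhat longer on average.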

Page Count
53 pages

Category
Computer Science:
Computer Vision and Pattern Recognition