Unique Lives, Shared World: Learning from Single-Life Videos
By: Tengda Han, Sayna Ebrahimi, Dilara Gokay and more
Potential Business Impact:
Teaches computers to see like a person.
We introduce the "single-life" learning paradigm, in which we train a distinct vision model exclusively on egocentric videos captured by one individual. We leverage the multiple viewpoints naturally captured within a single life to learn a visual encoder in a self-supervised manner. Our experiments demonstrate three key findings. First, models trained independently on different lives develop a highly aligned geometric understanding. We demonstrate this by training visual encoders on distinct datasets, each capturing a different life both indoors and outdoors, and by introducing a novel cross-attention-based metric to quantify the functional alignment of the internal representations developed by different models. Second, we show that single-life models learn generalizable geometric representations that transfer effectively to downstream tasks, such as depth estimation, in unseen environments. Third, we demonstrate that training on up to 30 hours from one week of the same person's life yields performance comparable to training on 30 hours of diverse web data, highlighting the strength of single-life representation learning. Overall, our results establish that the shared structure of the world both leads to consistency in models trained on individual lives and provides a powerful signal for visual representation learning.
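To make the cross-attention-based alignment idea concrete, below is a minimal sketch of how a functional alignment score between two independently trained encoders could be computed from their patch tokens on the same frame. The function name, the temperature, the identity-position readout, and the cosine scoring are all illustrative assumptions, not the paper's reference formulation.

```python
# Hypothetical sketch of a cross-attention-based functional alignment score
# between two independently trained encoders. All names, the temperature,
# and the readout/scoring choices below are assumptions for illustration,
# not the paper's reference implementation.
import torch
import torch.nn.functional as F


def cross_attention_alignment(tokens_a: torch.Tensor,
                              tokens_b: torch.Tensor) -> torch.Tensor:
    """Score how well encoder B's tokens locate the matching content in A's.

    tokens_a: (N, D) patch tokens from encoder A for one frame.
    tokens_b: (N, D) patch tokens from encoder B for the same frame.
    Assumes both encoders share token dimensionality D; otherwise a
    learned linear projection would be needed first.
    Returns a scalar in [-1, 1]; higher means more functionally aligned.
    """
    # L2-normalise so attention logits behave like cosine similarities.
    q = F.normalize(tokens_b, dim=-1)             # queries from encoder B
    k = F.normalize(tokens_a, dim=-1)             # keys/values from encoder A
    # Cross-attention weights: each B token attends over all A tokens.
    attn = torch.softmax(q @ k.T / 0.07, dim=-1)  # (N, N); 0.07 = temperature
    # Read out A's tokens through the attention map, then compare each
    # read-out vector with A's token at the same spatial position.
    readout = attn @ k                            # (N, D)
    per_token = F.cosine_similarity(readout, k, dim=-1)
    return per_token.mean()


if __name__ == "__main__":
    # Toy usage with random tokens standing in for two encoders' outputs.
    a = torch.randn(196, 768)   # e.g. 14x14 ViT patch tokens from model A
    b = torch.randn(196, 768)   # tokens from model B on the same frame
    print(float(cross_attention_alignment(a, b)))
```

Under this sketch, two well-aligned encoders would produce attention maps that concentrate on the correct spatial locations, driving the score toward 1, while unrelated representations would diffuse attention and lower it.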
Similar Papers
IC-World: In-Context Generation for Shared World Modeling
CV and Pattern Recognition
Creates consistent 3D worlds from many pictures.
A solution to generalized learning from small training sets found in everyday infant experiences
CV and Pattern Recognition
Teaches computers to learn like babies.
Perceiving and Acting in First-Person: A Dataset and Benchmark for Egocentric Human-Object-Human Interactions
CV and Pattern Recognition
AI learns to help people by watching and listening.