JEPA for RL: Investigating Joint-Embedding Predictive Architectures for Reinforcement Learning
By: Tristan Kenneweg, Philip Kenneweg, Barbara Hammer
Potential Business Impact:
Teaches robots to learn from watching.
Joint-Embedding Predictive Architectures (JEPA) have recently emerged as a promising approach to self-supervised learning. Vision transformers trained with JEPA produce embeddings from images and videos that have been shown to be highly suitable for downstream tasks such as classification and segmentation. In this paper, we show how to adapt the JEPA architecture to reinforcement learning from images. We discuss model collapse, show how to prevent it, and provide exemplary results on the classical Cart Pole task.
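The core idea can be illustrated with a minimal sketch: a context encoder embeds the current observation, a predictor maps that embedding to a predicted next-state embedding, and a target encoder, updated only by an exponential moving average (EMA) with no gradient flow, embeds the true next observation. The EMA/stop-gradient target is one standard way to prevent representation collapse in joint-embedding methods. Everything below is a hypothetical toy (linear encoders, a made-up deterministic environment `A`), not the paper's actual vision-transformer architecture or training setup.

```python
import numpy as np

# Toy linear JEPA sketch for an RL-style transition (illustrative only;
# the paper's method uses vision transformers on image observations).
rng = np.random.default_rng(0)

obs_dim, emb_dim = 16, 4
W_ctx = rng.normal(scale=0.1, size=(emb_dim, obs_dim))  # context encoder
W_tgt = W_ctx.copy()                                    # EMA target encoder (no gradients)
W_pred = np.eye(emb_dim)                                # predictor
A = rng.normal(scale=0.3, size=(obs_dim, obs_dim))      # toy environment dynamics (assumption)
lr, ema = 0.05, 0.99

losses = []
for step in range(300):
    x = rng.normal(size=obs_dim)   # current observation
    x_next = A @ x                 # next observation in the toy environment

    z_ctx = W_ctx @ x              # embed current observation
    z_hat = W_pred @ z_ctx         # predict next-state embedding
    z_tgt = W_tgt @ x_next         # target embedding (treated as constant)
    err = z_hat - z_tgt
    losses.append(0.5 * float(err @ err))

    # Gradients of the squared error flow only through the context branch.
    g_pred = np.outer(err, z_ctx)            # dL/dW_pred
    g_ctx = W_pred.T @ np.outer(err, x)      # dL/dW_ctx
    W_pred -= lr * g_pred
    W_ctx -= lr * g_ctx

    # Target encoder tracks the context encoder via EMA, never via gradients.
    W_tgt = ema * W_tgt + (1.0 - ema) * W_ctx

print(f"prediction loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

In an RL setting, the predictor would additionally condition on the action taken, and the learned embeddings would feed a policy or value head; this sketch only shows the embedding-prediction loop and the EMA trick against collapse.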
Similar Papers
SparseJEPA: Sparse Representation Learning of Joint Embedding Predictive Architectures
Machine Learning (CS)
Makes AI understand pictures better and more clearly.
Self-Supervised Representation Learning with Joint Embedding Predictive Architecture for Automotive LiDAR Object Detection
Robotics
Helps self-driving cars see better with less power.
VL-JEPA: Joint Embedding Predictive Architecture for Vision-language
CV and Pattern Recognition
Helps computers understand pictures and words better.