Learning to play: A Multimodal Agent for 3D Game-Play
By: Yuguang Yue, Irakli Salia, Samuel Hunt, and more
Potential Business Impact:
Lets computers play video games by reading instructions.
We argue that 3D first-person video games are a challenging environment for real-time multimodal reasoning. We first describe our dataset of human game-play, collected across a large variety of 3D first-person games, which is substantially larger and more diverse than prior publicly disclosed datasets and contains text instructions. We demonstrate that we can learn an inverse dynamics model from this dataset, which allows us to impute actions on a much larger dataset of publicly available videos of human game play that lack recorded actions. We then train a text-conditioned agent for game playing using behavior cloning, with a custom architecture capable of real-time inference on a consumer GPU. We show the resulting model is capable of playing a variety of 3D games and responding to text input. Finally, we outline some of the remaining challenges, such as long-horizon tasks and quantitative evaluation across a large set of games.
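To make the two-stage pipeline in the abstract concrete, below is a minimal, hypothetical sketch: an inverse dynamics model (IDM) predicts the action taken between consecutive frames and is used to impute pseudo-action labels on unlabeled video, and a text-conditioned policy is then trained by behavior cloning on (frame, instruction, imputed action) tuples. All module names, dimensions, and the discrete action space here are assumptions for illustration only; the paper's actual architecture and action representation are not reproduced.

```python
# Illustrative sketch (not the paper's implementation):
# (1) IDM labels unlabeled video with pseudo-actions,
# (2) a text-conditioned policy is trained by behavior cloning on those labels.
import torch
import torch.nn as nn

NUM_ACTIONS = 32          # assumed discrete action vocabulary
FRAME_DIM = 3 * 64 * 64   # assumed flattened RGB frame size
TEXT_DIM = 128            # assumed instruction-embedding size


class InverseDynamicsModel(nn.Module):
    """Predicts the action taken between two consecutive frames."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * FRAME_DIM, 512), nn.ReLU(),
            nn.Linear(512, NUM_ACTIONS),
        )

    def forward(self, frame_t, frame_tp1):
        return self.net(torch.cat([frame_t, frame_tp1], dim=-1))


class TextConditionedPolicy(nn.Module):
    """Maps (current frame, instruction embedding) to action logits."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(FRAME_DIM + TEXT_DIM, 512), nn.ReLU(),
            nn.Linear(512, NUM_ACTIONS),
        )

    def forward(self, frame, text_emb):
        return self.net(torch.cat([frame, text_emb], dim=-1))


def impute_actions(idm, frames):
    """Label an unlabeled clip of T frames with T-1 pseudo-actions."""
    with torch.no_grad():
        logits = idm(frames[:-1], frames[1:])
        return logits.argmax(dim=-1)


def behavior_cloning_step(policy, optimizer, frames, text_emb, actions):
    """One supervised step: cross-entropy between policy logits and actions."""
    logits = policy(frames, text_emb)
    loss = nn.functional.cross_entropy(logits, actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    idm = InverseDynamicsModel()        # assumed pre-trained on labeled play
    policy = TextConditionedPolicy()
    opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

    video = torch.rand(16, FRAME_DIM)   # toy unlabeled clip of 16 frames
    text = torch.rand(15, TEXT_DIM)     # one instruction embedding per step
    pseudo_actions = impute_actions(idm, video)
    loss = behavior_cloning_step(policy, opt, video[:-1], text, pseudo_actions)
    print(f"behavior cloning loss: {loss:.3f}")
```

In this toy setup the IDM stands in for a model already trained on the smaller action-labeled dataset; the key idea it illustrates is that pseudo-labels from the IDM let the much larger corpus of action-free videos be used as behavior-cloning data.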
Similar Papers
Learning Controllable and Diverse Player Behaviors in Multi-Agent Environments
Machine Learning (CS)
Creates game players with many different, controllable personalities.
Play to Generalize: Learning to Reason Through Game Play
CV and Pattern Recognition
Teaches AI to think better by playing games.
Towards Robust Multimodal Learning in the Open World
Machine Learning (CS)
Helps AI understand the real world better.