Game-RL: Synthesizing Verifiable Game Tasks at Scale to Boost VLMs General Reasoning
By: Jingqi Tong, Jixin Tang, Hangcheng Li, and more
Potential Business Impact:
Teaches computers to reason about games and other visual tasks.
Real-world vision-language reasoning scenarios often involve diverse and complex tasks. However, vision-language reinforcement learning has primarily focused on a narrow set of tasks (e.g., geometry or chart reasoning), limiting improvements to Vision Language Models' (VLMs) general reasoning. We therefore propose Code2Logic, a novel approach that uses Large Language Models (LLMs) to synthesize verifiable game reasoning tasks at scale by adapting game code. Using Code2Logic, we built the GameQA dataset to train and evaluate VLMs. GameQA is verifiable and scalable, offers controllable difficulty gradation, and is diverse, covering 30 games and 158 tasks. We then apply Game-RL, simple reinforcement learning on GameQA. Surprisingly, despite training solely on game tasks, VLMs demonstrate out-of-domain generalization: Qwen2.5-VL-7B improves by 2.33% on average across 7 diverse vision-language benchmarks. Our code, dataset, and models are available in the GitHub repository.
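The core idea, synthesizing tasks whose answers are computed by the game code itself, can be sketched minimally. The function names, board game, and binary reward below are illustrative assumptions, not the paper's actual pipeline: the point is that the ground-truth answer is derived programmatically from the game state, so correctness is automatically verifiable at scale.

```python
import random

def synthesize_task(seed: int, size: int = 4):
    """Hypothetical sketch of a Code2Logic-style task: generate a game
    state in code, then emit a question whose gold answer the same code
    computes, so no human annotation is needed."""
    rng = random.Random(seed)
    board = [[rng.choice(["X", "O", "."]) for _ in range(size)]
             for _ in range(size)]
    # Ground truth comes from the game logic, not from a human label.
    answer = sum(row.count("X") for row in board)
    rendered = "\n".join(" ".join(row) for row in board)
    question = f"How many 'X' pieces are on the board below?\n{rendered}"
    return question, answer

def verify(predicted: int, gold: int) -> float:
    """Binary reward usable for RL: 1.0 iff the model's answer matches."""
    return 1.0 if predicted == gold else 0.0

question, gold = synthesize_task(seed=0)
reward = verify(gold, gold)  # a correct answer earns full reward
```

Because generation is seeded and fully programmatic, difficulty (board size, piece density) is controllable and the dataset scales without manual labeling, which is what makes such tasks usable as verifiable RL rewards.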
Similar Papers
Play to Generalize: Learning to Reason Through Game Play
CV and Pattern Recognition
Teaches AI to think better by playing games.
SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis
Machine Learning (CS)
Teaches computers to solve harder math problems.
Are Large Vision Language Models Good Game Players?
CV and Pattern Recognition
Tests AI's smarts with fun games.