TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning

Published: June 16, 2025 | arXiv ID: 2506.13705v1

By: Junru Zhang, Lang Feng, Xu Guo and more

Potential Business Impact:

Helps computers understand time-series data better.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Time-series reasoning remains a significant challenge for multimodal large language models (MLLMs) because of dynamic temporal patterns, ambiguous semantics, and a lack of temporal priors. In this work, we introduce TimeMaster, a reinforcement learning (RL)-based method that enables time-series MLLMs to perform structured, interpretable reasoning directly over visualized time-series inputs and task prompts. TimeMaster adopts a three-part structured output format (reasoning, classification, and domain-specific extension) and is optimized via a composite reward function that aligns format adherence, prediction accuracy, and open-ended insight quality. The model is trained using a two-stage pipeline: we first apply supervised fine-tuning (SFT) to establish a good initialization, then apply Group Relative Policy Optimization (GRPO) at the token level to enable stable and targeted reward-driven improvement in time-series reasoning. We evaluate TimeMaster, built on Qwen2.5-VL-3B-Instruct, on the TimerBed benchmark across six real-world classification tasks. TimeMaster achieves state-of-the-art performance, outperforming classical time-series models and few-shot GPT-4o by over 14.6% and 7.3%, respectively. Notably, TimeMaster goes beyond time-series classification: it also exhibits expert-like reasoning behavior, generates context-aware explanations, and delivers domain-aligned insights. Our results highlight that reward-driven RL can be a scalable and promising path toward integrating temporal understanding into time-series MLLMs.
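The abstract names a composite reward (format adherence, prediction accuracy, insight quality) and GRPO, but gives no formulas. The sketch below is one plausible reading, not the authors' implementation. The tag names (`<reasoning>`, `<class>`, `<extension>`), the weights, and the externally supplied `insight_score` are all hypothetical. The group-relative step is the standard GRPO normalization: each sampled response's reward is standardized against the other responses drawn for the same prompt.

```python
import re
from statistics import mean, pstdev

# Hypothetical output template; the abstract only says the format has three parts.
PATTERN = re.compile(
    r"<reasoning>(.+?)</reasoning>\s*"
    r"<class>(.+?)</class>\s*"
    r"<extension>(.+?)</extension>",
    re.DOTALL,
)


def composite_reward(output, true_label, insight_score,
                     w_format=0.2, w_acc=0.6, w_insight=0.2):
    """Weighted sum of format, accuracy, and insight terms.

    The weights are illustrative assumptions. `insight_score` is assumed to be
    a value in [0, 1] produced elsewhere (for example by a judge model).
    """
    match = PATTERN.fullmatch(output.strip())
    if match is None:
        # Malformed output earns no reward at all in this sketch.
        return 0.0
    predicted = match.group(2).strip()
    accuracy = 1.0 if predicted == true_label else 0.0
    return w_format + w_acc * accuracy + w_insight * insight_score


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of responses to the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    group = [
        "<reasoning>Sharp spike near the end.</reasoning>"
        "<class>abnormal</class><extension>Inspect the sensor.</extension>",
        "<reasoning>Flat signal.</reasoning>"
        "<class>normal</class><extension>No action.</extension>",
        "abnormal",  # ignores the template
    ]
    rewards = [composite_reward(o, "abnormal", insight_score=0.5) for o in group]
    print(rewards)
    print(group_relative_advantages(rewards))
```

In a typical GRPO setup each response's advantage is applied to every token of that response in the policy-gradient loss; whether TimeMaster's "token level" optimization works exactly this way is not stated in the abstract.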

Time Series Forecasting as Reasoning: A Slow-Thinking Approach with Reinforced LLMs

Machine Learning (CS)

Teaches computers to predict future events better.

12 Jun 2025 1

89%

LangTime: A Language-Guided Unified Model for Time Series Forecasting with Proximal Policy Optimization

Machine Learning (CS)

Predicts future data trends more accurately.

11 Mar 2025 2

89%

TempR1: Improving Temporal Understanding of MLLMs via Temporal-Aware Multi-Task Reinforcement Learning

CV and Pattern Recognition

Teaches AI to understand time in videos better.

3 Dec 2025 1

Repos / Data Links

github.com

Page Count

30 pages
