Accelerating Multi-modal LLM Gaming Performance via Input Prediction and Mishit Correction

Published: December 19, 2025 | arXiv ID: 2512.17250v1

By: Ziyang Lin, Zixuan Sun, Sanhorn Chen and more

Real-time sequential control agents are often bottlenecked by inference latency. Even modest per-step planning delays can destabilize control and degrade overall performance. We propose a speculation-and-correction framework that adapts the predict-then-verify philosophy of speculative execution to model-based control with TD-MPC2. At each step, a pretrained world model and latent-space MPC planner generate a short-horizon action queue together with predicted latent rollouts, allowing the agent to execute multiple planned actions without immediate replanning. When a new observation arrives, the system measures the mismatch between the encoded real latent state and the queued predicted latent. For small to moderate mismatch, a lightweight learned corrector applies a residual update to the speculative action, distilled offline from a replanning teacher. For large mismatch, the agent safely falls back to full replanning and clears stale action queues. We study both a gated two-tower MLP corrector and a temporal Transformer corrector to address local errors and systematic drift. Experiments on the DMC Humanoid-Walk task show that our method reduces the number of planning inferences from 500 to 282, improves end-to-end step latency by 25 percent, and maintains strong control performance with only a 7.1 percent return reduction. Ablation results demonstrate that speculative execution without correction is unreliable over longer horizons, highlighting the necessity of mismatch-aware correction for robust latency reduction.
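The control loop described in the abstract can be sketched roughly as follows. This is a minimal toy illustration, not the authors' implementation: the names `run_episode`, `toy_plan`, `toy_correct`, `toy_step`, `toy_encode`, the single threshold `tau`, and the scalar linear dynamics are all assumptions introduced here. TD-MPC2's world model and planner are replaced by hand-written stand-ins so the gating logic (execute queued action, correct on small mismatch, replan and clear the queue on large mismatch) can be read in isolation.

```python
from collections import deque

import numpy as np

HORIZON = 4  # hypothetical length of the speculative action queue


def toy_encode(obs):
    # Stand-in encoder: the latent state is just the observation.
    return obs


def toy_plan(z):
    # Stand-in planner with an imperfect model (z' = z + a).
    # Returns (action, latent predicted at the time the action executes).
    out, zk = [], z
    for _ in range(HORIZON):
        a = -0.5 * zk
        out.append((a, zk))
        zk = zk + a
    return out


def toy_correct(z, z_pred, action):
    # Stand-in residual corrector. In this toy it reproduces exactly
    # what replanning from z would have produced (-0.5 * z).
    return -0.5 * (z - z_pred)


def toy_step(obs, action):
    # True dynamics differ from the planner's model (0.9 vs 1.0).
    return 0.9 * obs + action


def run_episode(encode, plan, correct, step, obs, n_steps, tau):
    """Return (number of planner calls, list of executed actions)."""
    queue = deque()
    planner_calls = 0
    actions = []
    for _ in range(n_steps):
        z = encode(obs)
        action = None
        if queue:
            spec_action, z_pred = queue.popleft()
            mismatch = float(np.linalg.norm(z - z_pred))
            if mismatch <= tau:
                # Small mismatch: keep speculating, apply residual update.
                action = spec_action + correct(z, z_pred, spec_action)
            else:
                # Large mismatch: drop stale actions and replan below.
                queue.clear()
        if action is None:
            queue.extend(plan(z))
            planner_calls += 1
            action, _ = queue.popleft()
        actions.append(action)
        obs = step(obs, action)
    return planner_calls, actions


if __name__ == "__main__":
    x0 = np.array([1.0])
    for tau in (float("inf"), 0.0):
        calls, _ = run_episode(toy_encode, toy_plan, toy_correct,
                               toy_step, x0, 12, tau)
        print(f"tau={tau}: planner calls = {calls}")
```

With a very large `tau` the planner is invoked only when the queue runs out, while `tau = 0.0` forces a replan at every step in this toy. A practical threshold would sit between the two and trade planner calls against control quality, which is the trade-off the reported 500-to-282 reduction reflects.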

InstructMPC: A Human-LLM-in-the-Loop Framework for Context-Aware Control

Systems and Control

Lets robots follow human orders better.

8 Apr 2025 1

88%

Vision-Language Model Predictive Control for Manipulation Planning and Trajectory Generation

Robotics

Robots understand goals and act better.

7 Apr 2025 2

88%

MM-LMPC: Multi-Modal Learning Model Predictive Control via Bandit-Based Mode Selection

Systems and Control

Finds better ways to do tasks by trying all options.

1 Oct 2025 0
