Diffusion Forcing for Multi-Agent Interaction Sequence Modeling
By: Vongani H. Maluleke, Kie Horiuchi, Lea Wilken, and more
Potential Business Impact:
Makes robots dance and box together realistically.
Understanding and generating multi-person interactions is a fundamental challenge with broad implications for robotics and social computing. While humans naturally coordinate in groups, modeling such interactions remains difficult due to long temporal horizons, strong inter-agent dependencies, and variable group sizes. Existing motion generation methods are largely task-specific and do not generalize to flexible multi-agent generation. We introduce MAGNet (Multi-Agent Diffusion Forcing Transformer), a unified autoregressive diffusion framework for multi-agent motion generation that supports a wide range of interaction tasks through flexible conditioning and sampling. MAGNet performs dyadic prediction, partner inpainting, and full multi-agent motion generation within a single model, and can autoregressively generate ultra-long sequences spanning hundreds of seconds. Building on Diffusion Forcing, we introduce key modifications that explicitly model inter-agent coupling during autoregressive denoising, enabling coherent coordination across agents. As a result, MAGNet captures both tightly synchronized activities (e.g., dancing, boxing) and loosely structured social interactions. Our approach performs on par with specialized methods on dyadic benchmarks while naturally extending to polyadic scenarios involving three or more interacting people, enabled by a scalable architecture that is agnostic to the number of agents. We refer readers to the supplemental video, where the temporal dynamics and spatial coordination of generated interactions are best appreciated. Project page: https://von31.github.io/MAGNet/
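The conditioning scheme described in the abstract lends itself to a compact illustration. Below is a minimal sketch, not the authors' released code, of how Diffusion-Forcing-style autoregressive sampling with joint denoising across agents could look: past frames enter the transformer at zero noise while new frames are denoised from full noise, and because the denoiser attends across both the time and agent axes, each agent's update sees every other agent's current state. The `denoiser` interface, the rectified-flow-style update rule, and all names here are assumptions for illustration only.

import torch

@torch.no_grad()
def sample_window(denoiser, context, n_agents, horizon, feat_dim, n_steps=50):
    """Jointly denoise a window of future frames for all agents.

    denoiser: hypothetical callable mapping (tokens, noise_levels) -> predicted
              clean tokens, where tokens is (T, n_agents, feat_dim) and
              noise_levels is (T,). Attention spans time AND agents, so the
              denoising of each agent is coupled to all others.
    context:  (T_ctx, n_agents, feat_dim) clean past motion, or None.
    Returns:  (horizon, n_agents, feat_dim) generated motion.
    """
    x = torch.randn(horizon, n_agents, feat_dim)      # future frames start as pure noise
    for step in range(n_steps, 0, -1):
        t, t_next = step / n_steps, (step - 1) / n_steps
        levels = torch.full((horizon,), t)            # per-frame noise level (Diffusion Forcing)
        if context is not None:
            # Clean past frames are conditioned on at zero noise. Holding one
            # agent's tokens fixed at zero noise in the same way would give
            # partner inpainting.
            tokens = torch.cat([context, x], dim=0)
            levels = torch.cat([torch.zeros(context.shape[0]), levels])
        else:
            tokens = x
        x0_hat = denoiser(tokens, levels)[-horizon:]  # predicted clean future frames
        # Deterministic step under the linear interpolation x_t = (1-t)*x0 + t*eps:
        eps_hat = (x - (1.0 - t) * x0_hat) / t        # implied noise at level t
        x = (1.0 - t_next) * x0_hat + t_next * eps_hat
    return x

# Stand-in usage with a stub denoiser (for shape checking only):
denoiser = lambda tokens, levels: torch.zeros_like(tokens)
motion = sample_window(denoiser, context=None, n_agents=3, horizon=16, feat_dim=64)

Ultra-long generation would then slide this window forward, feeding generated frames back in as zero-noise context; note the loop is agnostic to n_agents, mirroring the paper's claim of an agent-count-agnostic architecture.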
Similar Papers
InterAgent: Physics-based Multi-agent Command Execution via Diffusion on Interaction Graphs
CV and Pattern Recognition
Robots learn to work together from text.
Unified Multimodal Diffusion Forcing for Forceful Manipulation
Robotics
Teaches robots to learn from seeing, doing, and feeling.
MultiMotion: Multi Subject Video Motion Transfer via Video Diffusion Transformer
CV and Pattern Recognition
Lets videos copy other videos' movements perfectly.