Free-Form Scene Editor: Enabling Multi-Round Object Manipulation like in a 3D Engine

Published: November 17, 2025 | arXiv ID: 2511.13713v1

By: Xincheng Shuai, Zhenyuan Qin, Henghui Ding and more

Potential Business Impact:

Changes objects in pictures like a 3D model.

Business Areas:

Image Recognition Data and Analytics, Software

Recent advances in text-to-image (T2I) diffusion models have significantly improved semantic image editing, yet most methods fall short in performing 3D-aware object manipulation. In this work, we present FFSE, a 3D-aware autoregressive framework designed to enable intuitive, physically consistent object editing directly on real-world images. Unlike previous approaches that either operate in image space or require slow and error-prone 3D reconstruction, FFSE models editing as a sequence of learned 3D transformations, allowing users to perform arbitrary manipulations, such as translation, scaling, and rotation, while preserving realistic background effects (e.g., shadows, reflections) and maintaining global scene consistency across multiple editing rounds. To support learning of multi-round 3D-aware object manipulation, we introduce 3DObjectEditor, a hybrid dataset constructed from simulated editing sequences across diverse objects and scenes, enabling effective training under multi-round and dynamic conditions. Extensive experiments show that the proposed FFSE significantly outperforms existing methods in both single-round and multi-round 3D-aware editing scenarios.
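To make the idea of "a sequence of 3D transformations" concrete, the sketch below shows only the underlying geometry: how translation, rotation, and scaling from successive editing rounds compose into a single object pose. It is not FFSE itself, which learns its transformations and renders the edited image; every function name here is hypothetical, and the convention that each edit acts in a fixed world frame is an assumption made for illustration.

```python
import math

Matrix = list[list[float]]  # 4x4 homogeneous transform (illustrative type alias)


def identity() -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def translation(tx: float, ty: float, tz: float) -> Matrix:
    m = identity()
    m[0][3], m[1][3], m[2][3] = tx, ty, tz
    return m


def scaling(s: float) -> Matrix:
    m = identity()
    m[0][0] = m[1][1] = m[2][2] = s
    return m


def rotation_y(degrees: float) -> Matrix:
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    m = identity()
    m[0][0], m[0][2] = c, s
    m[2][0], m[2][2] = -s, c
    return m


def compose_rounds(edits: list[Matrix]) -> Matrix:
    """Fold a list of per-round edits, applied in order, into one pose."""
    pose = identity()
    for edit in edits:
        # Assumption: each edit acts in the world frame on the already-edited object.
        pose = matmul(edit, pose)
    return pose


def apply(pose: Matrix, point: tuple[float, float, float]) -> tuple[float, ...]:
    v = (point[0], point[1], point[2], 1.0)
    return tuple(sum(pose[i][k] * v[k] for k in range(4)) for i in range(3))


# Three editing rounds: move along x, turn 90 degrees about y, then double the size.
rounds = [translation(1.0, 0.0, 0.0), rotation_y(90.0), scaling(2.0)]
pose = compose_rounds(rounds)
corner = apply(pose, (0.0, 0.0, 0.0))  # approximately (0, 0, -2)
```

Because matrix multiplication is not commutative, reordering the rounds generally changes the final pose, which is one reason a multi-round editor has to track the accumulated edit history rather than treat each round in isolation.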

Realistic and Controllable 3D Gaussian-Guided Object Editing for Driving Video Generation

CV and Pattern Recognition

Makes self-driving cars practice tricky situations safely.

28 Aug 2025

88%

Dynamic-eDiTor: Training-Free Text-Driven 4D Scene Editing with Multimodal Diffusion Transformer

CV and Pattern Recognition

Changes 3D videos with just words.

30 Nov 2025

88%

TripleFDS: Triple Feature Disentanglement and Synthesis for Scene Text Editing

CV and Pattern Recognition

Changes text in pictures, keeping them real.

17 Nov 2025

Repos / Data Links

github.com

Page Count

19 pages
