SpatialLock: Precise Spatial Control in Text-to-Image Synthesis
By: Biao Liu, Yuanzhi Liang
Potential Business Impact:
Puts pictures exactly where you want them.
Text-to-Image (T2I) synthesis has made significant advancements in recent years, driving applications such as automatic dataset generation. However, precise control over object localization in generated images remains a challenge. Existing methods fail to fully utilize positional information, leading to an inadequate understanding of object spatial layouts. To address this issue, we propose SpatialLock, a novel framework that leverages perception signals and grounding information to jointly control spatial locations during generation. SpatialLock incorporates two components: Position-Engaged Injection (PoI) and Position-Guided Learning (PoG). PoI directly integrates spatial information through an attention layer, encouraging the model to learn the grounding information effectively. PoG employs perception-based supervision to further refine object localization. Together, these components enable the model to generate objects with precise spatial arrangements and improve the visual quality of the generated images. Experiments show that SpatialLock sets a new state-of-the-art for precise object positioning, achieving IoU scores above 0.9 across multiple datasets.
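As a rough illustration (not the authors' implementation, whose details are not given in the abstract), the two ideas can be sketched in NumPy: PoI-style injection appends embedded grounding boxes to the keys/values of an attention layer so image tokens can attend to the layout, and the IoU metric quoted in the results measures how well a generated object's box matches its target. All function names and the sinusoidal box embedding here are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def encode_box(box):
    """Toy sinusoidal embedding of a normalized (x0, y0, x1, y1) box."""
    box = np.asarray(box, dtype=float)
    return np.concatenate([np.sin(np.pi * box), np.cos(np.pi * box)])  # shape (8,)

def inject_grounding(image_tokens, box_tokens):
    """Single-head attention in which grounding (box) tokens are appended
    to the keys/values, so every image token can attend to the spatial
    layout. Projection matrices are identity here for brevity."""
    d = image_tokens.shape[-1]
    kv = np.concatenate([image_tokens, box_tokens], axis=0)   # (N + M, d)
    scores = image_tokens @ kv.T / np.sqrt(d)                 # (N, N + M)
    return softmax(scores) @ kv                               # (N, d)

def iou(a, b):
    """Intersection over Union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

# Usage: attend 4 image tokens to one grounded box, then score a prediction.
tokens = np.random.default_rng(0).normal(size=(4, 8))
boxes = np.stack([encode_box([0.1, 0.1, 0.5, 0.5])])
out = inject_grounding(tokens, boxes)
print(out.shape)  # (4, 8) — image tokens updated with layout information
```

In this sketch, a prediction overlapping its target by half of their union would score `iou(...) == 1/3`, so sustaining IoU above 0.9 implies near-exact placement.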
Similar Papers
InfSplign: Inference-Time Spatial Alignment of Text-to-Image Diffusion Models
CV and Pattern Recognition
Makes AI pictures match words better.
Test-time Controllable Image Generation by Explicit Spatial Constraint Enforcement
CV and Pattern Recognition
Makes AI draw pictures from words and shapes.
A Two-Stage System for Layout-Controlled Image Generation using Large Language Models and Diffusion Models
CV and Pattern Recognition
Makes AI draw pictures with exact objects.