ReLayout: Integrating Relation Reasoning for Content-aware Layout Generation with Multi-modal Large Language Models
By: Jiaxu Tian, Xuehui Yu, Yaoxing Wang and more
Potential Business Impact:
Makes computer designs look better and more organized.
Content-aware layout generation aims to arrange design elements appropriately on a given canvas to convey information effectively. Recently, the trend for this task has been to leverage large language models (LLMs) to generate layouts automatically, with remarkable performance. However, existing LLM-based methods fail to adequately interpret spatial relationships among visual themes and design elements, which leads to layouts that lack structure and diversity. To address this issue, we introduce ReLayout, a novel method that leverages relation-CoT to generate more reasonable and aesthetically coherent layouts grounded in fundamental design concepts. Specifically, we enhance layout annotations by introducing explicit relation definitions, such as region, salient, and margin relations between elements, with the goal of decomposing the layout into smaller, structured, and recursive sub-layouts, thereby enabling the generation of more structured layouts. Furthermore, based on these defined relationships, we introduce a layout prototype rebalance sampler, which defines layout prototype features across three dimensions and quantifies distinct layout styles. By rebalancing the prototype distribution, this sampler addresses the uniformity in generated layouts that arises from data bias. Extensive experimental results verify that ReLayout outperforms baselines and can generate structured and diverse layouts that are more aligned with human aesthetics and more explainable.
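The rebalancing idea can be illustrated with a minimal sketch. The abstract does not specify the three prototype dimensions or the sampling rule, so everything here is an assumption: each layout is tagged with a hypothetical 3-tuple prototype (region count, salient position, margin bucket), and layouts are drawn with weights inversely proportional to how often their prototype occurs. This is a generic inverse-frequency sampler, not necessarily the one used in ReLayout.

```python
import random
from collections import Counter


def rebalance_weights(prototypes):
    """Return one sampling weight per layout.

    Each weight is 1 / (number of layouts sharing that prototype), so every
    distinct prototype receives the same total probability mass.
    """
    counts = Counter(prototypes)
    return [1.0 / counts[p] for p in prototypes]


def sample_layouts(layouts, prototypes, k, seed=0):
    """Draw k layouts (with replacement) using the rebalanced weights."""
    rng = random.Random(seed)
    weights = rebalance_weights(prototypes)
    return rng.choices(layouts, weights=weights, k=k)


# Toy data with a skewed prototype distribution (8 vs. 2 layouts).
# The tuple fields are hypothetical: (region_count, salient_position, margin_bucket).
prototypes = [(2, "top", "narrow")] * 8 + [(3, "center", "wide")] * 2
layouts = [f"layout_{i}" for i in range(len(prototypes))]

weights = rebalance_weights(prototypes)
batch = sample_layouts(layouts, prototypes, k=5)
```

In this toy setup the common prototype and the rare prototype each carry the same total weight, so the rare style is drawn about as often as the common one rather than a fifth of the time.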