ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents

Published: July 30, 2025 | arXiv ID: 2507.22827v1

By: Yilei Jiang, Yaozhi Zheng, Yuxuan Wan, and more

Potential Business Impact:

Turns screen designs into working computer code.

Business Areas:

Computer Vision Hardware, Software

Automating the transformation of user interface (UI) designs into front-end code holds significant promise for accelerating software development and democratizing design workflows. While recent large language models (LLMs) have demonstrated progress in text-to-code generation, many existing approaches rely solely on natural language prompts, limiting their effectiveness in capturing spatial layout and visual design intent. In contrast, UI development in practice is inherently multimodal, often starting from visual sketches or mockups. To address this gap, we introduce a modular multi-agent framework that performs UI-to-code generation in three interpretable stages: grounding, planning, and generation. The grounding agent uses a vision-language model to detect and label UI components, the planning agent constructs a hierarchical layout using front-end engineering priors, and the generation agent produces HTML/CSS code via adaptive prompt-based synthesis. This design improves robustness, interpretability, and fidelity over end-to-end black-box methods. Furthermore, we extend the framework into a scalable data engine that automatically produces large-scale image-code pairs. Using these synthetic examples, we fine-tune and reinforce an open-source VLM, yielding notable gains in UI understanding and code quality. Extensive experiments demonstrate that our approach achieves state-of-the-art performance in layout accuracy, structural coherence, and code correctness. Our code is made publicly available at https://github.com/leigest519/ScreenCoder.
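The three stages named in the abstract (grounding, planning, generation) can be illustrated with a small sketch. This is not the authors' implementation: the `Component` class and the `ground`, `plan`, and `generate` functions are hypothetical names introduced here. Grounding is stubbed with precomputed boxes rather than a vision-language model, planning uses simple box containment in place of the paper's front-end engineering priors, and generation fills a fixed HTML template in place of prompt-based synthesis.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Component:
    """A labeled UI region (hypothetical type); coordinates are screenshot pixels."""
    label: str
    x: int
    y: int
    w: int
    h: int
    children: list = field(default_factory=list)

    def contains(self, other):
        # True when `other` lies fully inside this box.
        return (self.x <= other.x and self.y <= other.y
                and self.x + self.w >= other.x + other.w
                and self.y + self.h >= other.y + other.h)


def ground(detections):
    """Grounding stand-in: wrap precomputed (label, x, y, w, h) tuples.
    Assumption: a real system would obtain these from a vision-language model."""
    return [Component(*d) for d in detections]


def plan(components):
    """Planning stand-in: nest each box under the smallest box that contains it."""
    ordered = sorted(components, key=lambda c: c.w * c.h, reverse=True)
    roots = []
    for i, comp in enumerate(ordered):
        # Only boxes earlier in the ordering (equal or larger area) can be parents.
        parents = [p for p in ordered[:i] if p.contains(comp)]
        if parents:
            min(parents, key=lambda p: p.w * p.h).children.append(comp)
        else:
            roots.append(comp)
    return roots


def generate(nodes, ox=0, oy=0, depth=0):
    """Generation stand-in: emit nested, absolutely positioned <div> elements.
    Child offsets are relative to the parent's top-left corner (ox, oy)."""
    pad = "  " * depth
    lines = []
    for n in sorted(nodes, key=lambda c: (c.y, c.x)):
        style = (f"position:absolute;left:{n.x - ox}px;top:{n.y - oy}px;"
                 f"width:{n.w}px;height:{n.h}px")
        lines.append(f'{pad}<div class="{escape(n.label)}" style="{style}">')
        if n.children:
            lines.append(generate(n.children, n.x, n.y, depth + 1))
        lines.append(f"{pad}</div>")
    return "\n".join(lines)


# Made-up detections for illustration only.
detections = [
    ("header", 0, 0, 800, 80),
    ("logo", 16, 16, 48, 48),
    ("main", 0, 80, 800, 520),
]
print(generate(plan(ground(detections))))
```

Keeping the stages as separate functions mirrors the modularity the abstract argues for: the intermediate outputs (the detected boxes and the layout tree) can be inspected or corrected before any code is emitted.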

DesignCoder: Hierarchy-Aware and Self-Correcting UI Code Generation with Large Language Models

Software Engineering

Makes websites look right and work perfectly.

16 Jun 2025 0

89%

Widget2Code: From Visual Widgets to UI Code via Multimodal LLMs

CV and Pattern Recognition

Turns app pictures into working code.

22 Dec 2025 0

89%

JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence

Artificial Intelligence

Makes computers create pictures from code.

27 Oct 2025 2

Repos / Data Links

github.com

Page Count

13 pages
