WebVIA: A Web-based Vision-Language Agentic Framework for Interactive and Verifiable UI-to-Code Generation

Published: November 9, 2025 | arXiv ID: 2511.06251v1

By: Mingde Xu, Zhen Yang, Wenyi Hong and more

Potential Business Impact:

Makes websites interactive from design pictures.

Business Areas:

Semantic Web, Internet Services

User interface (UI) development requires translating design mockups into functional code, a process that remains repetitive and labor-intensive. While recent Vision-Language Models (VLMs) automate UI-to-Code generation, they generate only static HTML/CSS/JavaScript layouts lacking interactivity. To address this, we propose WebVIA, the first agentic framework for interactive UI-to-Code generation and validation. The framework comprises three components: 1) an exploration agent to capture multi-state UI screenshots; 2) a UI2Code model that generates executable interactive code; 3) a validation module that verifies the interactivity. Experiments demonstrate that WebVIA-Agent achieves more stable and accurate UI exploration than general-purpose agents (e.g., Gemini-2.5-Pro). In addition, our fine-tuned WebVIA-UI2Code models exhibit substantial improvements in generating executable and interactive HTML/CSS/JavaScript code, outperforming their base counterparts across both interactive and static UI2Code benchmarks. Our code and models are available at https://webvia.github.io (link target: https://zheny2751-dotcom.github.io/webvia.github.io/).
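The three components named in the abstract suggest a simple data flow: explore, generate, validate. The sketch below is only an illustration of that flow under stated assumptions. Every name in it (`UIState`, `ValidationReport`, `run_pipeline`, and the `fake_*` stubs) is hypothetical and not taken from the WebVIA code, and the stubs stand in for the real agent, model, and validator, which the abstract does not specify.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# NOTE: all names here are hypothetical; the abstract defines no API.


@dataclass
class UIState:
    """One captured UI state: a screenshot reference plus the action that led to it."""
    screenshot: str      # placeholder for image data or a file path
    action: str = ""     # e.g. "click #menu"; empty for the initial state


@dataclass
class ValidationReport:
    """Which recorded interactions the generated code reproduced."""
    passed: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.failed


def run_pipeline(
    explore: Callable[[str], List[UIState]],
    generate: Callable[[List[UIState]], str],
    check: Callable[[str, UIState], bool],
    url: str,
) -> Tuple[str, ValidationReport]:
    """Chain the three stages: explore -> generate -> validate."""
    states = explore(url)        # 1) capture multi-state screenshots
    code = generate(states)      # 2) produce interactive code from those states
    report = ValidationReport()
    for state in states:         # 3) check each recorded interaction
        if not state.action:
            continue             # the initial state has no action to replay
        bucket = report.passed if check(code, state) else report.failed
        bucket.append(state.action)
    return code, report


# Stub components, for illustration only.
def fake_explore(url: str) -> List[UIState]:
    return [UIState("home.png"), UIState("menu_open.png", "click #menu")]


def fake_generate(states: List[UIState]) -> str:
    return "<button id='menu'>Menu</button>"


def fake_check(code: str, state: UIState) -> bool:
    # Toy check: the element id targeted by the action appears in the code.
    return state.action.split("#")[-1] in code


code, report = run_pipeline(fake_explore, fake_generate, fake_check, "https://example.com")
```

Passing the stages in as callables keeps the sketch independent of any particular agent or model; a real validator would presumably execute the generated page rather than inspect its text.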

Building the Web for Agents: A Declarative Framework for Agent-Web Interaction

Human-Computer Interaction

Lets websites talk directly to AI helpers.

14 Nov 2025 0

88%

ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents

CV and Pattern Recognition

Turns screen designs into working computer code.

30 Jul 2025 2

87%

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction

CV and Pattern Recognition

Helps computers learn to use programs like people.

19 Mar 2025 2

Country of Origin

🇨🇳 🇨🇦 China, Canada

Page Count

36 pages
