WebGen-V Bench: Structured Representation for Enhancing Visual Design in LLM-based Web Generation and Evaluation

Published: October 17, 2025 | arXiv ID: 2510.15306v1

By: Kuang-Da Wang, Zhao Wang, Yotaro Shimose, and more

BigTech Affiliations: Sony PlayStation

Potential Business Impact:

Builds websites from simple text instructions.

Business Areas:

Semantic Web, Internet Services

Motivated by recent advances in leveraging LLMs for coding and multimodal understanding, we present WebGen-V, a new benchmark and framework for instruction-to-HTML generation that improves both data quality and evaluation granularity. WebGen-V contributes three key innovations: (1) an unbounded and extensible agentic crawling framework that continuously collects real-world webpages and can be leveraged to augment existing benchmarks; (2) a structured, section-wise data representation that integrates metadata, localized UI screenshots, and JSON-formatted text and image assets, providing explicit alignment between content, layout, and visual components for detailed multimodal supervision; and (3) a section-level multimodal evaluation protocol that aligns text, layout, and visuals for high-granularity assessment. Experiments with state-of-the-art LLMs and ablation studies validate the effectiveness of the structured data and section-wise evaluation, as well as the contribution of each component. To the best of our knowledge, WebGen-V is the first work to enable high-granularity agentic crawling and evaluation for instruction-to-HTML generation, providing a unified pipeline from real-world data acquisition and webpage generation to structured multimodal assessment.
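To make the section-wise representation and section-level evaluation described in the abstract more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the WebGen-V schema or scoring method: the class names (`Section`, `SectionScore`), every field name, the 0-1 score scale, and the unweighted-mean aggregation are all hypothetical choices the abstract does not specify.

```python
import json
from dataclasses import dataclass, field, asdict
from statistics import mean


@dataclass
class Section:
    """One page section. All field names here are hypothetical."""
    section_id: str
    metadata: dict            # e.g. section type or order on the page
    screenshot_path: str      # localized UI screenshot for this section
    text_assets: list = field(default_factory=list)
    image_assets: list = field(default_factory=list)


@dataclass
class SectionScore:
    """Per-section scores for text, layout, and visuals (assumed 0-1 scale)."""
    text: float
    layout: float
    visual: float

    def overall(self) -> float:
        # Assumption: unweighted mean; the paper may aggregate differently.
        return mean([self.text, self.layout, self.visual])


def page_score(scores: dict) -> float:
    """Average the per-section overall scores into one page-level number."""
    return mean(s.overall() for s in scores.values())


# A single section with its metadata, screenshot, and JSON-style assets.
hero = Section(
    section_id="hero",
    metadata={"order": 0, "type": "hero"},
    screenshot_path="screenshots/hero.png",
    text_assets=[{"role": "heading", "text": "Welcome"}],
    image_assets=[{"src": "images/banner.png", "alt": "Banner"}],
)

# Serialize the section to a JSON-formatted record.
record = json.dumps(asdict(hero), indent=2)

# Illustrative scores only; these are not results from the paper.
scores = {
    "hero": SectionScore(text=1.0, layout=0.5, visual=0.0),
    "footer": SectionScore(text=0.5, layout=0.5, visual=0.5),
}
page = page_score(scores)
```

Keeping each section's text, image assets, and screenshot in one record is what lets an evaluator score sections independently and then aggregate, rather than judging the whole page at once.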

WebGen-Bench: Evaluating LLMs on Generating Interactive and Functional Websites from Scratch

Computation and Language

Helps computers build websites from simple instructions.

6 May 2025 1

88%

WebUIBench: A Comprehensive Benchmark for Evaluating Multimodal Large Language Models in WebUI-to-Code

Computation and Language

Tests AI's ability to build websites.

9 Jun 2025 1

88%

MedGEN-Bench: Contextually entangled benchmark for open-ended multimodal medical generation

CV and Pattern Recognition

Helps AI create realistic medical images for doctors.

17 Nov 2025 0

Country of Origin

🇯🇵 Japan

Page Count

22 pages
