Generative Digital Twins: Vision-Language Simulation Models for Executable Industrial Systems
By: YuChe Hsu, AnJui Wang, TsaiChing Ni, and more
Potential Business Impact:
Turns drawings and words into working virtual factories.
We propose a Vision-Language Simulation Model (VLSM) that unifies visual and textual understanding to synthesize executable FlexScript from layout sketches and natural-language prompts, enabling cross-modal reasoning for industrial simulation systems. To support this new paradigm, the study constructs the first large-scale dataset for generative digital twins, comprising over 120,000 prompt-sketch-code triplets that enable multimodal learning across textual descriptions, spatial structures, and simulation logic. In parallel, three new evaluation metrics, Structural Validity Rate (SVR), Parameter Match Rate (PMR), and Execution Success Rate (ESR), are proposed for this task to assess structural integrity, parameter fidelity, and simulator executability. Through systematic ablations across vision encoders, connectors, and code-pretrained language backbones, the proposed models achieve near-perfect structural accuracy and high execution robustness. This work establishes a foundation for generative digital twins that integrate visual reasoning and language understanding into executable industrial simulation systems.
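The abstract does not spell out how SVR, PMR, and ESR are computed. As a minimal sketch of how such rates could be scored over a batch of generated FlexScript samples, the Python snippet below treats each metric as a simple success ratio; the helpers parse_structure, extract_parameters, and run_in_simulator are hypothetical stand-ins for illustration only, not part of the paper or of FlexSim's API.

    # Minimal sketch (assumptions, not the paper's implementation): score a batch
    # of generated FlexScript samples with SVR, PMR, and ESR as success ratios.
    import re
    from dataclasses import dataclass

    @dataclass
    class Sample:
        generated_code: str      # FlexScript emitted by the model
        reference_params: dict   # ground-truth parameters from the prompt-sketch-code triplet

    def parse_structure(code: str) -> bool:
        # Stand-in structural check: non-empty code with balanced braces.
        return bool(code.strip()) and code.count("{") == code.count("}")

    def extract_parameters(code: str) -> dict:
        # Stand-in parameter extraction: collect simple name=value assignments.
        return dict(re.findall(r"(\w+)\s*=\s*([\w.]+)", code))

    def run_in_simulator(code: str) -> bool:
        # Stand-in for execution; a real pipeline would launch the simulator
        # and report whether the generated model runs without error.
        return parse_structure(code)

    def evaluate(samples: list[Sample]) -> dict[str, float]:
        valid = executed = matched = total_params = 0
        for s in samples:
            valid += parse_structure(s.generated_code)
            executed += run_in_simulator(s.generated_code)
            predicted = extract_parameters(s.generated_code)
            for name, ref in s.reference_params.items():
                total_params += 1
                matched += predicted.get(name) == ref
        n = max(len(samples), 1)
        return {
            "SVR": valid / n,                       # structural integrity
            "PMR": matched / max(total_params, 1),  # parameter fidelity
            "ESR": executed / n,                    # simulator executability
        }

    # Example: a single toy sample scores 1.0 on all three rates.
    print(evaluate([Sample("Queue1 = { capacity=10 }", {"capacity": "10"})]))

In practice the structural check, parameter comparison, and execution test would be backed by the simulator itself rather than the toy heuristics used here.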
Similar Papers
Synthesizing Visual Concepts as Vision-Language Programs
Artificial Intelligence
Makes AI understand pictures and think logically.
Let Language Constrain Geometry: Vision-Language Models as Semantic and Spatial Critics for 3D Generation
CV and Pattern Recognition
Makes 3D pictures match words better.
ExploreVLM: Closed-Loop Robot Exploration Task Planning with Vision-Language Models
Robotics
Robots learn to explore and do tasks better.