Generative Digital Twins: Vision-Language Simulation Models for Executable Industrial Systems

Published: December 23, 2025 | arXiv ID: 2512.20387v1

By: YuChe Hsu, AnJui Wang, TsaiChing Ni, and more

Potential Business Impact:

Builds virtual worlds from drawings and words.

Business Areas:
Simulation Software

We propose a Vision-Language Simulation Model (VLSM) that unifies visual and textual understanding to synthesize executable FlexScript from layout sketches and natural-language prompts, enabling cross-modal reasoning for industrial simulation systems. To support this new paradigm, the study constructs the first large-scale dataset for generative digital twins, comprising over 120,000 prompt-sketch-code triplets that enable multimodal learning across textual descriptions, spatial structures, and simulation logic. In parallel, three evaluation metrics are proposed specifically for this task: Structural Validity Rate (SVR), Parameter Match Rate (PMR), and Execution Success Rate (ESR), which together assess structural integrity, parameter fidelity, and simulator executability. Through systematic ablation across vision encoders, connectors, and code-pretrained language backbones, the proposed models achieve near-perfect structural accuracy and high execution robustness. This work establishes a foundation for generative digital twins that integrate visual reasoning and language understanding into executable industrial simulation systems.
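The abstract names the three metrics but, as a summary, does not define them formally. The sketch below gives one plausible reading in Python: SVR as the fraction of generated scripts that parse into a well-formed model, PMR as the fraction of reference parameters reproduced exactly, and ESR as the fraction of scripts that execute in the simulator without error. The Sample class and the two check functions are hypothetical stand-ins for the simulator's parser and runtime, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    """One evaluation sample: generated FlexScript plus reference data."""
    generated_code: str
    reference_params: dict   # ground-truth parameter name -> value
    generated_params: dict   # parameters extracted from the generated code

def is_structurally_valid(code: str) -> bool:
    """Hypothetical check: does the script parse into a well-formed
    object/connection structure? A real implementation would invoke the
    simulator's parser; here the interface is only stubbed."""
    return bool(code.strip())  # placeholder

def executes_successfully(code: str) -> bool:
    """Hypothetical check: does the script run end-to-end in the
    simulator without runtime errors? Stubbed for illustration."""
    return bool(code.strip())  # placeholder

def evaluate(samples: list[Sample]) -> dict:
    """Compute SVR, PMR, and ESR over a batch of generated scripts."""
    n = len(samples)
    # SVR: fraction of outputs whose structure is well-formed.
    svr = sum(is_structurally_valid(s.generated_code) for s in samples) / n
    # PMR: fraction of reference parameters reproduced exactly.
    matched = total = 0
    for s in samples:
        for name, value in s.reference_params.items():
            total += 1
            matched += s.generated_params.get(name) == value
    pmr = matched / total if total else 0.0
    # ESR: fraction of outputs that run in the simulator without error.
    esr = sum(executes_successfully(s.generated_code) for s in samples) / n
    return {"SVR": svr, "PMR": pmr, "ESR": esr}

if __name__ == "__main__":
    samples = [Sample(
        generated_code="/* generated FlexScript */",
        reference_params={"capacity": 10, "cycle_time": 5.0},
        generated_params={"capacity": 10, "cycle_time": 5.0},
    )]
    print(evaluate(samples))  # {'SVR': 1.0, 'PMR': 1.0, 'ESR': 1.0}
```

Under this reading, SVR and ESR are per-script rates while PMR pools parameters across the batch; the paper may instead average per-script match rates, so treat the pooling choice as an assumption.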

Country of Origin
🇹🇼 Taiwan, Province of China

Page Count
10 pages

Category
Computer Science:
Artificial Intelligence