Multilingual Multimodal Software Developer for Code Generation
By: Linzheng Chai, Jian Yang, Shukai Liu, and more
Potential Business Impact:
Helps computers write code from pictures.
The rapid advancement of Large Language Models (LLMs) has significantly improved code generation, yet most models remain text-only, neglecting crucial visual aids such as diagrams and flowcharts used in real-world software development. To bridge this gap, we introduce MM-Coder, a Multilingual Multimodal software developer. MM-Coder integrates visual design inputs, namely Unified Modeling Language (UML) diagrams and flowcharts (termed Visual Workflow), with textual instructions to enhance code generation accuracy and architectural alignment. To enable this, we developed MMc-Instruct, a diverse multimodal instruction-tuning dataset that includes visual-workflow-based code generation, allowing MM-Coder to synthesize textual and graphical information like human developers, in contrast to prior work on narrow tasks. Furthermore, we introduce MMEval, a new benchmark for evaluating multimodal code generation that addresses the limitations of existing text-only benchmarks. Our evaluations on MMEval highlight significant remaining challenges for models in precisely capturing visual information, following instructions, and applying advanced programming knowledge. Our work aims to revolutionize industrial programming by enabling LLMs to interpret and implement complex specifications conveyed through both text and visual designs.
Similar Papers
Unified Modeling Language Code Generation from Diagram Images Using Multimodal Large Language Models
Software Engineering
Turns software pictures into working computer code.
VisCodex: Unified Multimodal Code Generation via Merging Vision and Coding Models
Computation and Language
Helps computers write code from pictures.
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning
CV and Pattern Recognition
Teaches computers to solve math problems with pictures.