Exploring Multimodal Prompt for Visualization Authoring with Large Language Models

Published: April 18, 2025 | arXiv ID: 2504.13700v1

By: Zhen Wen, Luoxuan Weng, Yinghao Tang and more

Potential Business Impact:

Draw pictures to help computers make charts.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Recent advances in large language models (LLMs) have shown great potential in automating the process of visualization authoring through simple natural language utterances. However, instructing LLMs using natural language is limited in precision and expressiveness for conveying visualization intent, leading to misinterpretation and time-consuming iterations. To address these limitations, we conduct an empirical study to understand how LLMs interpret ambiguous or incomplete text prompts in the context of visualization authoring, and the conditions under which LLMs misinterpret user intent. Informed by the findings, we introduce visual prompts as a complementary input modality to text prompts, which help clarify user intent and improve LLMs' interpretation abilities. To explore the potential of multimodal prompting in visualization authoring, we design VisPilot, which enables users to easily create visualizations using multimodal prompts, including text, sketches, and direct manipulations on existing visualizations. Through two case studies and a controlled user study, we demonstrate that VisPilot provides a more intuitive way to create visualizations without affecting the overall task efficiency compared to text-only prompting approaches. Furthermore, we analyze the impact of text and visual prompts in different visualization tasks. Our findings highlight the importance of multimodal prompting in improving the usability of LLMs for visualization authoring. We discuss design implications for future visualization systems and provide insights into how multimodal prompts can enhance human-AI collaboration in creative visualization tasks. All materials are available at https://OSF.IO/2QRAK.

The Future of MLLM Prompting is Adaptive: A Comprehensive Experimental Evaluation of Prompt Engineering Methods for Robust Multimodal Performance

Artificial Intelligence

Teaches AI to understand pictures and words better.

14 Apr 2025

Words That Make Language Models Perceive

Computation and Language

Makes text-only AI "see" and "hear" with words.

2 Oct 2025

Show or Tell? Effectively prompting Vision-Language Models for semantic segmentation

CV and Pattern Recognition

Helps computers understand pictures better by combining words and images.

25 Mar 2025

Country of Origin

🇨🇳 China

Page Count

11 pages
