Dango: A Mixed-Initiative Data Wrangling System using Large Language Model
By: Wei-Hao Chen, Weixi Tong, Amanda Case and more
Potential Business Impact:
Helps computers clean messy data faster.
Data wrangling is a time-consuming and challenging task in a data science pipeline. While many tools have been proposed to automate or facilitate data wrangling, they often misinterpret user intent, especially in complex tasks. We propose Dango, a mixed-initiative multi-agent system for data wrangling. Compared to existing tools, Dango improves how users communicate intent in three ways. First, users can demonstrate on multiple tables and write natural language prompts in a conversational interface. Second, users can clarify their intent by answering multiple-choice clarification questions posed by the LLM. Third, Dango provides multiple forms of feedback, such as step-by-step natural language explanations and data provenance, to help users evaluate the data wrangling scripts. We conducted a within-subjects user study with 38 participants and demonstrated that Dango's features can significantly improve intent clarification, accuracy, and efficiency in data wrangling. Furthermore, we demonstrated the generalizability of Dango by applying it to a broader set of data wrangling tasks.
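To make the two feedback forms named in the abstract concrete, here is a minimal Python sketch of a wrangling script whose steps each carry a natural language explanation and whose output rows keep a provenance tag. This is an illustration under stated assumptions, not Dango's implementation: the names `Step`, `drop_rows_missing`, `rename_column`, `run_script`, and the `_src` field are all hypothetical, and the sample table is invented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Table = List[Dict[str, object]]


@dataclass
class Step:
    """One wrangling step (hypothetical structure, not Dango's)."""
    explanation: str                 # natural language description of the step
    apply: Callable[[Table], Table]  # the transformation itself


def drop_rows_missing(column: str) -> Step:
    def apply(table: Table) -> Table:
        # Keep only rows that have a non-empty value in the column.
        return [row for row in table if row.get(column) not in (None, "")]
    return Step(f"Drop rows where '{column}' is empty.", apply)


def rename_column(old: str, new: str) -> Step:
    def apply(table: Table) -> Table:
        # Rebuild each row with the old key replaced by the new one.
        return [{(new if k == old else k): v for k, v in row.items()}
                for row in table]
    return Step(f"Rename column '{old}' to '{new}'.", apply)


def run_script(name: str, table: Table, steps: List[Step]):
    # Provenance: tag every row with its source table and row index
    # so each output row can be traced back to its input.
    rows = [dict(row, _src=(name, i)) for i, row in enumerate(table)]
    log = []
    for n, step in enumerate(steps, start=1):
        before = len(rows)
        rows = step.apply(rows)
        log.append(f"Step {n}: {step.explanation} ({before} -> {len(rows)} rows)")
    return rows, log


# Invented sample data for illustration only.
sales = [
    {"id": 1, "amt": 10},
    {"id": 2, "amt": None},
    {"id": 3, "amt": 7},
]
cleaned, explanations = run_script(
    "sales", sales, [drop_rows_missing("amt"), rename_column("amt", "amount")]
)
for line in explanations:
    print(line)
```

Each surviving row in `cleaned` still carries its `_src` tag, so a reader can check which input row produced it, while `explanations` gives the step-by-step account a user would review before accepting the script.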
Similar Papers
Steering Semantic Data Processing With DocWrangler
Human-Computer Interaction
Helps computers understand messy text better.
Difficulty-Aware Agent Orchestration in LLM-Powered Workflows
Artificial Intelligence
Smart AI chooses the best way to answer questions.
A Multimodal Conversational Agent for Tabular Data Analysis
Artificial Intelligence
Talks to data, answers with charts or words.