SQUiD: Synthesizing Relational Databases from Unstructured Text
By: Mushtari Sadia, Zhenning Yang, Yunming Xiao and more
Potential Business Impact:
Turns messy text into organized database tables.
Relational databases are central to modern data management, yet most data exists in unstructured forms like text documents. To bridge this gap, we leverage large language models (LLMs) to automatically synthesize a relational database by generating its schema and populating its tables from raw text. We introduce SQUiD, a novel neurosymbolic framework that decomposes this task into four stages, each with specialized techniques. Our experiments show that SQUiD consistently outperforms baselines across diverse datasets.
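To make the end product concrete, here is a minimal sketch of what "generating a schema and populating its tables from raw text" could look like. It is not SQUiD's pipeline: the abstract does not name the four stages, so none are modeled here. The sample text, the hand-written `EXTRACTED` records (standing in for LLM output), the two-table schema, and the helper names `build_database` and `employees_by_department` are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical input text. The records below are written by hand and stand in
# for what an LLM extraction step might return; no model is called here.
TEXT = "Alice Kim works in Research. Bob Lee works in Sales."
EXTRACTED = [
    {"employee": "Alice Kim", "department": "Research"},
    {"employee": "Bob Lee", "department": "Sales"},
]

# Assumed target schema: two tables linked by a foreign key.
SCHEMA = """
CREATE TABLE department (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE employee (
    id            INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    department_id INTEGER NOT NULL REFERENCES department(id)
);
"""


def build_database(records):
    """Create the schema in memory and insert one employee row per record."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for rec in records:
        # Insert the department once; repeated names are ignored (UNIQUE).
        conn.execute(
            "INSERT OR IGNORE INTO department (name) VALUES (?)",
            (rec["department"],),
        )
        (dept_id,) = conn.execute(
            "SELECT id FROM department WHERE name = ?",
            (rec["department"],),
        ).fetchone()
        conn.execute(
            "INSERT INTO employee (name, department_id) VALUES (?, ?)",
            (rec["employee"], dept_id),
        )
    conn.commit()
    return conn


def employees_by_department(conn):
    """Join the two tables back into (employee, department) pairs."""
    return conn.execute(
        "SELECT e.name, d.name FROM employee e "
        "JOIN department d ON d.id = e.department_id "
        "ORDER BY e.name"
    ).fetchall()
```

Calling `employees_by_department(build_database(EXTRACTED))` joins the two tables back into employee and department pairs, which is the kind of query the unstructured text alone cannot answer directly.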
Similar Papers
Text-to-SQL Domain Adaptation via Human-LLM Collaborative Data Annotation
Human-Computer Interaction
Helps computers understand questions about any data.
Text-to-SQL Domain Adaptation via Human-LLM Collaborative Data Annotation
Human-Computer Interaction
Helps computers understand database questions faster.
SING-SQL: A Synthetic Data Generation Framework for In-Domain Text-to-SQL Translation
Artificial Intelligence
Lets computers understand any database questions.