RASL: Retrieval Augmented Schema Linking for Massive Database Text-to-SQL
By: Jeffrey Eben, Aitzaz Ahmad, Stephen Lau
Potential Business Impact:
Lets computers understand huge company data easily.
Despite advances in large language model (LLM)-based natural language interfaces for databases, scaling to enterprise-level data catalogs remains an under-explored challenge. Prior work addressing this challenge relies on domain-specific fine-tuning, which complicates deployment, and fails to leverage important semantic context contained within database metadata. To address these limitations, we introduce a component-based retrieval architecture that decomposes database schemas and metadata into discrete semantic units, each separately indexed for targeted retrieval. Our approach prioritizes effective table identification while leveraging column-level information, ensuring the total number of retrieved tables remains within a manageable context budget. Experiments demonstrate that our method maintains high recall and accuracy, with our system outperforming baselines over massive databases with varying structure and available metadata. Our solution enables practical text-to-SQL systems deployable across diverse enterprise settings without specialized fine-tuning, addressing a critical scalability gap in natural language database interfaces.
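The component-based retrieval idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a tiny hypothetical catalog, and it swaps in plain token overlap for whatever retrieval model the paper actually uses. The names `CATALOG`, `build_indexes`, and `retrieve_tables` are invented for illustration. Each table is split into separate units (table name, table description, columns), each unit kind gets its own index, and column-level hits are rolled up to their parent table so that the final result is a capped list of tables.

```python
import re
from collections import defaultdict

# Hypothetical toy catalog; a real enterprise catalog would be far larger.
CATALOG = {
    "orders": {
        "description": "customer purchase orders",
        "columns": {
            "order_id": "unique order identifier",
            "customer_id": "buyer reference",
            "total_amount": "order total in dollars",
        },
    },
    "customers": {
        "description": "registered customer accounts",
        "columns": {
            "customer_id": "unique customer identifier",
            "name": "customer full name",
        },
    },
    "warehouses": {
        "description": "storage facility locations",
        "columns": {
            "warehouse_id": "unique facility identifier",
            "city": "facility city",
        },
    },
}


def tokens(text):
    """Lowercase and split on non-alphanumerics (so 'order_id' -> order, id)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_indexes(catalog):
    """Decompose the schema into units and keep one index per unit kind."""
    indexes = {"table_name": [], "table_description": [], "column": []}
    for table, meta in catalog.items():
        indexes["table_name"].append((table, tokens(table)))
        indexes["table_description"].append((table, tokens(meta["description"])))
        for column, description in meta["columns"].items():
            indexes["column"].append((table, tokens(f"{column} {description}")))
    return indexes


def retrieve_tables(question, indexes, max_tables=2):
    """Score every unit, roll scores up to tables, and cap the result size."""
    query = tokens(question)
    scores = defaultdict(int)
    for units in indexes.values():
        for table, unit_tokens in units:
            # Token overlap is a stand-in for a learned similarity score.
            scores[table] += len(query & unit_tokens)
    ranked = sorted(
        (t for t in scores if scores[t] > 0),
        key=lambda t: (-scores[t], t),
    )
    return ranked[:max_tables]  # the context budget, expressed as a table cap


indexes = build_indexes(CATALOG)
question = "What is the total amount of each customer order?"
print(retrieve_tables(question, indexes, max_tables=2))
```

In this sketch the `max_tables` cap plays the role of the context budget: column matches influence which tables rank highest, but only table identifiers are returned, so the prompt size stays bounded regardless of how many columns matched.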