SCHEMORA: Schema Matching via Multi-Stage Recommendation and Metadata Enrichment Using Off-the-Shelf LLMs
By: Osman Erman Gungor, Derak Paulsen, William Kang
Potential Business Impact:
Makes it easier to connect data from different computer systems.
Schema matching is essential for integrating heterogeneous data sources and enhancing dataset discovery, yet it remains a complex and resource-intensive problem. We introduce SCHEMORA, a schema matching framework that combines large language models with hybrid retrieval techniques in a prompt-based approach, enabling efficient identification of candidate matches without relying on labeled training data or exhaustive pairwise comparisons. By enriching schema metadata and leveraging both vector-based and lexical retrieval, SCHEMORA improves matching accuracy and scalability. Evaluated on the MIMIC-OMOP benchmark, it establishes new state-of-the-art performance, with gains of 7.49% in HitRate@5 and 3.75% in HitRate@3 over previous best results. To our knowledge, this is the first LLM-based schema matching method with an open-source implementation, accompanied by analysis that underscores the critical role of retrieval and provides practical guidance on model selection.
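To make the retrieval and evaluation ideas in the abstract concrete, here is a minimal sketch, not SCHEMORA's actual implementation. It assumes reciprocal rank fusion as the way to combine a lexical ranking with a vector-based ranking; the abstract does not say how the two are combined. It also uses toy stand-ins for both retrievers (token overlap and character-trigram cosine) in place of a real lexical index and LLM embeddings. All function names, the example column descriptions, and the constant `rrf_k=60` are illustrative assumptions. HitRate@k is computed in the usual way: the fraction of source attributes whose correct target appears among the top k candidates.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace, treating underscores as separators.
    return text.lower().replace("_", " ").split()


def lexical_score(query, doc):
    # Jaccard token overlap: a toy stand-in for a lexical retriever such as BM25.
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / len(q | d) if (q | d) else 0.0


def vector_score(query, doc):
    # Cosine similarity over character-trigram counts: a toy stand-in
    # for similarity between learned embeddings.
    def trigrams(text):
        padded = f"  {text.lower()} "
        return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

    a, b = trigrams(query), trigrams(doc)
    dot = sum(a[g] * b[g] for g in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank(query, candidates, scorer):
    # Candidate names ordered from best to worst under one scorer.
    return sorted(candidates, key=lambda name: scorer(query, candidates[name]), reverse=True)


def hybrid_top_k(query, candidates, k=5, rrf_k=60):
    # Reciprocal rank fusion (an assumption here): each retriever contributes
    # 1 / (rrf_k + position) for every candidate it ranks.
    fused = Counter()
    for scorer in (lexical_score, vector_score):
        for position, name in enumerate(rank(query, candidates, scorer), start=1):
            fused[name] += 1.0 / (rrf_k + position)
    return [name for name, _ in fused.most_common(k)]


def hit_rate_at_k(predictions, gold, k):
    # Fraction of queries whose gold target is among the top-k predictions.
    hits = sum(1 for query, ranked in predictions.items() if gold[query] in ranked[:k])
    return hits / len(predictions)


# Illustrative target attributes with made-up descriptions.
targets = {
    "person.birth_datetime": "date and time of birth of the person",
    "visit_occurrence.visit_start_date": "date the visit started",
    "drug_exposure.quantity": "quantity of drug dispensed",
}
top = hybrid_top_k("patients dob date of birth", targets, k=2)
```

In a pipeline of the kind the abstract describes, the short candidate list returned by retrieval would then be passed to an LLM prompt for final selection, which avoids comparing every source attribute against every target attribute.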
Similar Papers
LLMATCH: A Unified Schema Matching Framework with Large Language Models
Databases
Connects different computer data sets more easily.
SMoG: Schema Matching on Graph
Artificial Intelligence
Connects different health records accurately and fast.
Schema Generation for Large Knowledge Graphs Using Large Language Models
Artificial Intelligence
Helps computers build knowledge maps automatically.