LLMatch: A Unified Schema Matching Framework with Large Language Models
By: Sha Wang, Yuchen Li, Hanhua Xiao and more
Potential Business Impact:
Connects different computer data sets more easily.
Schema matching is a foundational task in enterprise data integration, aiming to align disparate data sources. While traditional methods handle simple one-to-one table mappings, they often struggle with complex multi-table schema matching in real-world applications. We present LLMatch, a unified and modular schema matching framework. LLMatch decomposes schema matching into three distinct stages: schema preparation, table-candidate selection, and column-level alignment, enabling component-level evaluation and future-proof compatibility. It includes a novel two-stage optimization strategy: a Rollup module that consolidates semantically related columns into higher-order concepts, followed by a Drilldown module that re-expands these concepts for fine-grained column mapping. To address the scarcity of complex semantic matching benchmarks, we introduce SchemaNet, a benchmark derived from real-world schema pairs across three enterprise domains, designed to capture the challenges of multi-table schema alignment in practical settings. Experiments demonstrate that LLMatch significantly improves matching accuracy in complex schema matching settings and substantially boosts engineer productivity in real-world data integration.
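The three stages and the Rollup/Drilldown idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: a token-overlap (Jaccard) score stands in for the LLM calls, Rollup is approximated by grouping columns on their first name token, and every function name, the threshold, and the sample schemas are assumptions made up for illustration.

```python
from collections import defaultdict


def tokens(name):
    # Split a snake_case identifier into lowercase tokens.
    return {t for t in name.lower().split("_") if t}


def similarity(a, b):
    # Jaccard overlap of name tokens; a stand-in for an LLM judgement.
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def prepare(schema):
    # Stage 1 (schema preparation): normalise table and column names.
    return {t.strip().lower(): [c.strip().lower() for c in cols]
            for t, cols in schema.items()}


def table_score(src_cols, tgt_cols):
    # Average, over source columns, of the best similarity to any target column.
    return sum(max(similarity(s, t) for t in tgt_cols) for s in src_cols) / len(src_cols)


def select_tables(src_cols, target, k=2):
    # Stage 2 (table-candidate selection): keep the k best-scoring target tables.
    ranked = sorted(target, key=lambda t: table_score(src_cols, target[t]), reverse=True)
    return ranked[:k]


def rollup(table, cols):
    # Rollup (assumed heuristic): group columns into a concept by first token.
    groups = defaultdict(list)
    for c in cols:
        groups[c.split("_")[0]].append((table, c))
    return groups


def drilldown(src_group, tgt_group, threshold=0.3):
    # Drilldown: re-expand a matched concept and align its individual columns.
    pairs = []
    for s in src_group:
        best = max(tgt_group, key=lambda t: similarity(s[1], t[1]))
        if similarity(s[1], best[1]) >= threshold:
            pairs.append((s, best))
    return pairs


def match(source, target, k=2):
    # Stage 3 (column-level alignment) on top of stages 1 and 2.
    source, target = prepare(source), prepare(target)
    result = []
    for s_table, s_cols in source.items():
        tgt_concepts = defaultdict(list)
        for t_table in select_tables(s_cols, target, k):
            for concept, members in rollup(t_table, target[t_table]).items():
                tgt_concepts[concept].extend(members)
        for concept, members in rollup(s_table, s_cols).items():
            if concept in tgt_concepts:
                result.extend(drilldown(members, tgt_concepts[concept]))
    return result


# Hypothetical schemas: one source table whose columns spread over two target tables.
source = {"customer": ["addr_street", "addr_city", "contact_email"]}
target = {
    "address": ["addr_street_name", "addr_city"],
    "contact": ["contact_email", "contact_phone"],
    "orders": ["order_id", "total"],
}
pairs = match(source, target)
for src, tgt in pairs:
    print(f"{src[0]}.{src[1]} -> {tgt[0]}.{tgt[1]}")
```

In this toy run the single source table is aligned against two target tables, which is the multi-table case the abstract highlights. Keeping each stage in its own function mirrors the component-level evaluation the framework aims for: a stage could be swapped for an LLM-backed version without touching the others.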
Similar Papers
Schemora: Schema Matching via Multi-Stage Recommendation and Metadata Enrichment Using Off-the-Shelf LLMs
Databases
Connects different computer data easily.
Structured Multi-Step Reasoning for Entity Matching Using Large Language Model
Databases
Helps computers find matching information faster.
SMoG: Schema Matching on Graph
Artificial Intelligence
Connects different health records accurately and fast.