PG-HIVE: Hybrid Incremental Schema Discovery for Property Graphs
By: Sofia Sideri, Georgia Troullinou, Elisjana Ymeralli, and more
Potential Business Impact:
Finds hidden patterns in connected data.
Property graphs have rapidly become the de facto standard for representing and managing complex, interconnected data, powering applications across domains from knowledge graphs to social networks. Despite these advantages, their schema-free nature poses major challenges for integration, exploration, visualization, and efficient querying. To bridge this gap, we present PG-HIVE, a novel framework for automatic schema discovery in property graphs. PG-HIVE goes beyond existing approaches by uncovering latent node and edge types and by inferring property datatypes, constraints, and cardinalities, even in the absence of explicit labeling information. Leveraging a unique combination of Locality-Sensitive Hashing with property- and label-based clustering, PG-HIVE identifies structural similarities at scale. Moreover, it introduces incremental schema discovery, eliminating costly recomputation as new data arrives. Through extensive experimentation, we demonstrate that PG-HIVE consistently outperforms state-of-the-art solutions in both accuracy (by up to 65% for nodes and 40% for edges) and efficiency (up to 1.95x faster execution), unlocking the full potential of schema-aware property graph management.
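To make the idea of grouping unlabeled nodes by structural similarity more concrete, here is a minimal Python sketch of Locality-Sensitive Hashing over node property sets. It is not PG-HIVE's implementation: the choice of MinHash with banding, the function names (`minhash_signature`, `lsh_candidate_types`), the parameters (`num_hashes`, `bands`), and the toy node data are all assumptions made for illustration only.

```python
import hashlib
from collections import defaultdict


def _hash(token: str, seed: int) -> int:
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes=32):
    """MinHash signature: for each seed, keep the smallest token hash."""
    return tuple(min(_hash(t, seed) for t in tokens) for seed in range(num_hashes))


def lsh_candidate_types(nodes, num_hashes=32, bands=8):
    """Group node ids whose property-key sets collide in at least one LSH band.

    `nodes` maps a node id to the set of its property keys (hypothetical input
    format). Returns a sorted list of sorted id lists, one per candidate type.
    """
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for node_id, props in nodes.items():
        # Nodes without properties share a single placeholder token.
        tokens = set(props) or {"<no-properties>"}
        sig = minhash_signature(tokens, num_hashes)
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(node_id)

    # Merge nodes that share any bucket, using a small union-find.
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for members in buckets.values():
        ordered = sorted(members)
        for other in ordered[1:]:
            parent[find(other)] = find(ordered[0])

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return sorted(sorted(g) for g in groups.values())


# Toy, made-up data: two nodes look like people, two look like publications.
example_nodes = {
    "n1": {"name", "age"},
    "n2": {"name", "age"},
    "n3": {"title", "year"},
    "n4": {"title", "year"},
}
candidate_types = lsh_candidate_types(example_nodes)
```

In this sketch, nodes with identical property sets always land in the same group, while nodes with partially overlapping sets are grouped with a probability that grows with their Jaccard similarity. Because the buckets are keyed only by band signatures, a newly arriving node could in principle be hashed and matched against existing buckets without rehashing earlier nodes, which is one plausible way to read the incremental aspect mentioned in the abstract.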
Similar Papers
ZOGRASCOPE: A New Benchmark for Semantic Parsing over Property Graphs
Computation and Language
Helps computers understand complex data graphs.
Contextual Graph Embeddings: Accounting for Data Characteristics in Heterogeneous Data Integration
Databases
Helps computers combine different data faster.
Scalable and Explainable Enterprise Knowledge Discovery Using Graph-Centric Hybrid Retrieval
Artificial Intelligence
Finds answers in company files faster.