Optimal Embedding Guided Negative Sample Generation for Knowledge Graph Link Prediction
By: Makoto Takamoto, Daniel Oñoro-Rubio, Wiem Ben Rim, and more
Potential Business Impact:
Makes knowledge graphs more accurate and useful for predicting missing facts.
Knowledge graph embedding (KGE) models encode the structural information of knowledge graphs to predict new links. Effective training of these models requires distinguishing between positive and negative samples with high precision. Although prior research has shown that improving the quality of negative samples can significantly enhance model accuracy, identifying high-quality negative samples remains a challenging problem. This paper theoretically investigates the condition under which negative samples lead to optimal KG embeddings and identifies a sufficient condition for an effective negative sample distribution. Based on this theoretical foundation, we propose Embedding MUtation (EMU), a novel framework that generates negative samples satisfying this condition, in contrast to conventional methods that focus on identifying challenging negative samples within the training data. Importantly, the simplicity of EMU ensures seamless integration with existing KGE models and negative sampling methods. To evaluate its efficacy, we conducted comprehensive experiments across multiple datasets. The results consistently demonstrate significant improvements in link prediction performance across various KGE models and negative sampling methods. Notably, EMU enables performance improvements comparable to those achieved by models with an embedding dimension five times larger. An implementation of the method and the experiments is available at https://github.com/nec-research/EMU-KG.
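The abstract does not spell out the mutation operator, but the name suggests negatives are synthesized directly in embedding space rather than selected from the entity set. The NumPy sketch below illustrates one plausible reading of that idea: each negative starts as a copy of the positive tail embedding and has a random subset of its dimensions overwritten by another entity's values. The function name, the mutation rule, and the mutation rate are all hypothetical illustrations, not the authors' actual method; see the linked repository for the real implementation.

```python
import numpy as np

def emu_style_negatives(entity_emb, tail_idx, num_negs=16, mut_rate=0.5, seed=0):
    """Hypothetical sketch: generate negative tail embeddings by mutation.

    entity_emb: (num_entities, dim) array of entity embeddings.
    tail_idx:   (batch,) indices of the positive tail entities.
    Returns:    (batch, num_negs, dim) synthetic negative embeddings.
    """
    rng = np.random.default_rng(seed)
    batch, dim = len(tail_idx), entity_emb.shape[1]
    # Start every negative as a copy of its positive tail embedding.
    negs = np.repeat(entity_emb[tail_idx][:, None, :], num_negs, axis=1)
    # Pick random "donor" entities whose dimensions will overwrite the copies.
    donor_idx = rng.integers(0, entity_emb.shape[0], size=(batch, num_negs))
    donors = entity_emb[donor_idx]
    # Mutate: each dimension is swapped for the donor's with probability mut_rate.
    mask = rng.random((batch, num_negs, dim)) < mut_rate
    negs[mask] = donors[mask]
    return negs
```

Generated negatives of this kind would be scored by the KGE model's scoring function (e.g., a TransE-style score(h, r, t) = -||h + r - t||) exactly as sampled negatives are, which is consistent with the abstract's claim that the method composes with existing KGE models and negative sampling schemes.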
Similar Papers
Enhancing PyKEEN with Multiple Negative Sampling Solutions for Knowledge Graph Embedding Models
Machine Learning (CS)
Makes knowledge graph embeddings smarter by training with better fake (negative) facts.
Negative Matters: Multi-Granularity Hard-Negative Synthesis and Anchor-Token-Aware Pooling for Enhanced Text Embeddings
Computation and Language
Makes computers understand words better.
Diffusion-based Hierarchical Negative Sampling for Multimodal Knowledge Graph Completion
Artificial Intelligence
Teaches computers to fill in missing facts better.