HERGC: Heterogeneous Experts Representation and Generative Completion for Multimodal Knowledge Graphs
By: Yongkang Xiao, Rui Zhang
Potential Business Impact:
Helps computers understand pictures and words to find missing facts.
Multimodal knowledge graphs (MMKGs) enrich traditional knowledge graphs (KGs) by incorporating diverse modalities such as images and text. Multimodal knowledge graph completion (MMKGC) seeks to exploit these heterogeneous signals to infer missing facts, thereby mitigating the intrinsic incompleteness of MMKGs. Existing MMKGC methods typically leverage only the information contained in the MMKGs under the closed-world assumption and adopt discriminative training objectives, which limits their reasoning capacity during completion. Recent large language models (LLMs), empowered by massive parameter scales and pretraining on vast corpora, have demonstrated strong reasoning abilities across various tasks. However, their potential in MMKGC remains largely unexplored. To bridge this gap, we propose HERGC, a flexible Heterogeneous Experts Representation and Generative Completion framework for MMKGs. HERGC first deploys a Heterogeneous Experts Representation Retriever that enriches and fuses multimodal information and retrieves a compact candidate set for each incomplete triple. It then uses a Generative LLM Predictor, implemented via either in-context learning or lightweight fine-tuning, to accurately identify the correct answer from these candidates. Extensive experiments on three standard MMKG benchmarks demonstrate HERGC's effectiveness and robustness, achieving superior performance over existing methods.
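The two-stage flow described in the abstract (retrieve a compact candidate set, then let a generative LLM choose among the candidates) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the averaging in `fuse` is a stand-in for the heterogeneous experts fusion, the TransE-style `score` is a swapped-in scoring function, the toy embeddings are invented, and the LLM call itself is omitted so that only the prompt construction is shown.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def fuse(modalities: Sequence[Vector]) -> Vector:
    """Average per-modality embeddings (assumed stand-in for expert fusion)."""
    n = len(modalities)
    return [sum(values) / n for values in zip(*modalities)]


def score(head: Vector, rel: Vector, tail: Vector) -> float:
    """TransE-style score (assumption): negative L1 distance of head + rel from tail."""
    return -sum(abs(h + r - t) for h, r, t in zip(head, rel, tail))


def retrieve_candidates(head: Vector, rel: Vector,
                        entities: Dict[str, Vector], k: int) -> List[str]:
    """Stage 1: rank all entities as tail candidates and keep the top k."""
    ranked = sorted(entities,
                    key=lambda name: score(head, rel, entities[name]),
                    reverse=True)
    return ranked[:k]


def build_prompt(head_name: str, rel_name: str, candidates: List[str]) -> str:
    """Stage 2 input: a prompt asking a generative LLM to pick one candidate."""
    options = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(candidates))
    return (f"Incomplete triple: ({head_name}, {rel_name}, ?)\n"
            f"Candidates:\n{options}\n"
            "Answer with exactly one candidate.")


# Toy data (hypothetical): each entity has structure, image, and text embeddings.
raw_entities = {
    "Paris": [[1.0, 1.0], [1.0, 0.5], [1.0, 1.5]],
    "Berlin": [[3.0, 0.0], [3.0, 0.0], [3.0, 0.0]],
    "Lyon": [[1.0, 0.5], [1.0, 0.5], [1.0, 0.5]],
}
entities = {name: fuse(mods) for name, mods in raw_entities.items()}

france = [0.0, 0.0]   # hypothetical fused embedding of the head entity
capital = [1.0, 1.0]  # hypothetical relation embedding

candidates = retrieve_candidates(france, capital, entities, k=2)
prompt = build_prompt("France", "capital", candidates)
print(prompt)
```

In this sketch the retriever narrows three entities down to two before the prompt is built, which mirrors the motivation stated in the abstract: the LLM only has to choose among a short list instead of reasoning over the whole entity set. The resulting prompt could be passed to an LLM used via in-context learning or to a fine-tuned one.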
Similar Papers
ELMM: Efficient Lightweight Multimodal Large Language Models for Multimodal Knowledge Graph Completion
Artificial Intelligence
Helps computers understand pictures and words better.
DrKGC: Dynamic Subgraph Retrieval-Augmented LLMs for Knowledge Graph Completion across General and Biomedical Domains
Artificial Intelligence
Helps computers understand facts by seeing connections.
Knowledge Graph Enhanced Generative Multi-modal Models for Class-Incremental Learning
CV and Pattern Recognition
Keeps computer vision smart on old and new tasks.