Retrieval over Classification: Integrating Relation Semantics for Multimodal Relation Extraction
By: Lei Hei, Tingjing Liao, Yingxin Pei, and more
Potential Business Impact:
Finds hidden connections in text and pictures.
Relation extraction (RE) aims to identify semantic relations between entities in unstructured text. Although recent work extends traditional RE to multimodal scenarios, most approaches still adopt classification-based paradigms with fused multimodal features, representing relations as discrete labels. This paradigm has two significant limitations: (1) it overlooks structural constraints like entity types and positional cues, and (2) it lacks semantic expressiveness for fine-grained relation understanding. We propose Retrieval Over Classification (ROC), a novel framework that reformulates multimodal RE as a retrieval task driven by relation semantics. ROC integrates entity type and positional information through a multimodal encoder, expands relation labels into natural language descriptions using a large language model, and aligns entity-relation pairs via semantic similarity-based contrastive learning. Experiments show that our method achieves state-of-the-art performance on the benchmark datasets MNRE and MORE, and exhibits stronger robustness and interpretability.
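To make the retrieval-over-classification idea concrete, here is a minimal sketch of how scoring and training might look. It assumes precomputed embeddings: relation labels are expanded into natural-language descriptions and encoded, entity pairs come from some multimodal encoder, and the alignment objective is an InfoNCE-style contrastive loss. The function names, dimensions, and temperature value are illustrative assumptions, not the authors' actual implementation.

```python
# Sketch: relation extraction as retrieval over relation-description
# embeddings, with a contrastive (InfoNCE-style) training objective.
# All shapes and names here are illustrative assumptions.
import torch
import torch.nn.functional as F

def retrieve_relation(pair_emb: torch.Tensor, rel_embs: torch.Tensor):
    """Score one entity-pair embedding against every relation-description
    embedding by cosine similarity; return the best index and all scores."""
    sims = F.cosine_similarity(pair_emb.unsqueeze(0), rel_embs, dim=-1)
    return sims.argmax().item(), sims

def info_nce_loss(pair_embs, rel_embs, targets, temperature=0.07):
    """Pull each entity-pair embedding toward its gold relation description
    and push it away from the other descriptions (standard InfoNCE form)."""
    pair_embs = F.normalize(pair_embs, dim=-1)
    rel_embs = F.normalize(rel_embs, dim=-1)
    logits = pair_embs @ rel_embs.T / temperature  # (batch, num_relations)
    return F.cross_entropy(logits, targets)

# Toy usage: 4 relations, a batch of 2 entity pairs, 128-dim embeddings.
torch.manual_seed(0)
rel_embs = torch.randn(4, 128)   # stand-in for encoded LLM-expanded descriptions
pair_embs = torch.randn(2, 128)  # stand-in for multimodal entity-pair encodings
targets = torch.tensor([1, 3])   # gold relation indices
loss = info_nce_loss(pair_embs, rel_embs, targets)
pred, scores = retrieve_relation(pair_embs[0], rel_embs)
```

At inference time, prediction is nearest-neighbor retrieval over the relation-description embeddings rather than a softmax over discrete label logits, which is what lets the label space carry semantic content.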
Similar Papers
Relation Extraction with Instance-Adapted Predicate Descriptions
Computation and Language
Finds important facts in text faster.
Multimodal Representation Learning Conditioned on Semantic Relations
Machine Learning (CS)
Teaches computers to understand images and words better.
Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval
CV and Pattern Recognition
Finds information using pictures and words together.