FLeW: Facet-Level and Adaptive Weighted Representation Learning of Scientific Documents
By: Zheng Dou, Deqing Wang, Fuzhen Zhuang, and others
Potential Business Impact:
Helps computers understand science papers better.
Scientific document representation learning provides powerful embeddings for various tasks, but current methods, which follow three main approaches, each face challenges. 1) Contrastive training with citation-structural signals underutilizes citation information and still produces single-vector representations. 2) Fine-grained representation learning, which generates multiple vectors at the sentence or aspect level, requires costly integration and lacks domain generalization. 3) Task-aware learning depends on manually predefined task categories, overlooks nuanced task distinctions, and requires extra training data for task-specific modules. To address these problems, we propose FLeW, a new method that unifies the three approaches to produce better representations. Specifically, we introduce a novel triplet sampling method that leverages citation intent and frequency to enhance citation-structural signals for training. Citation intents (background, method, result), which align with the general structure of scientific writing, enable a domain-generalized facet partition for fine-grained representation learning. We then adopt a simple weight search to adaptively integrate the three facet-level embeddings into a task-specific document embedding without task-aware fine-tuning. Experiments show that FLeW is applicable and robust across multiple scientific tasks and fields compared with prior models.
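The abstract does not say how the weight search is carried out. The following is a minimal sketch, assuming a grid search over non-negative facet weights that sum to 1, where each candidate weighting is scored by a task-specific validation metric on a handful of labeled documents. The helpers `combine`, `simplex_grid`, `weight_search`, `cosine`, and the toy `separation` score are hypothetical illustrations, not FLeW's published implementation; the facet embeddings are assumed to share one dimension.

```python
# Hypothetical sketch: adaptive weighting of three facet-level embeddings by grid search.
# Not FLeW's published code; grid step and scoring function are assumptions.
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
# Facet names follow the three citation intents named in the abstract.
FACETS = ("background", "method", "result")


def combine(facets: Dict[str, Vector], weights: Sequence[float]) -> Vector:
    """Weighted sum of one document's three facet embeddings (same dimension assumed)."""
    out = [0.0] * len(facets[FACETS[0]])
    for w, name in zip(weights, FACETS):
        for i, x in enumerate(facets[name]):
            out[i] += w * x
    return out


def simplex_grid(step: float = 0.1) -> List[Tuple[float, float, float]]:
    """All (w_background, w_method, w_result) grid points with weights >= 0 summing to 1."""
    n = round(1 / step)
    return [(a / n, b / n, (n - a - b) / n)
            for a in range(n + 1) for b in range(n + 1 - a)]


def weight_search(docs: List[Dict[str, Vector]],
                  score_fn: Callable[[List[Vector]], float],
                  step: float = 0.1) -> Tuple[Tuple[float, float, float], float]:
    """Return the grid weights whose combined document embeddings maximize score_fn."""
    best_w, best_s = (1 / 3, 1 / 3, 1 / 3), float("-inf")
    for w in simplex_grid(step):
        s = score_fn([combine(d, w) for d in docs])
        if s > best_s:  # strict '>' keeps the first maximizer on ties
            best_w, best_s = w, s
    return best_w, best_s


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def separation(embs: List[Vector], labels: Sequence[str]) -> float:
    """Toy validation score: mean same-label cosine minus mean cross-label cosine."""
    same, diff = [], []
    for i in range(len(embs)):
        for j in range(i + 1, len(embs)):
            (same if labels[i] == labels[j] else diff).append(cosine(embs[i], embs[j]))
    return sum(same) / len(same) - sum(diff) / len(diff)


# Toy data (invented for illustration): only the "method" facet carries the label signal.
docs = [
    {"background": [1, 0], "method": [1, 0], "result": [0, 1]},
    {"background": [0, 1], "method": [1, 0], "result": [1, 0]},
    {"background": [1, 0], "method": [0, 1], "result": [0, 1]},
    {"background": [0, 1], "method": [0, 1], "result": [1, 0]},
]
labels = ["A", "A", "B", "B"]
w, s = weight_search(docs, lambda embs: separation(embs, labels))
print(w, s)  # → (0.0, 1.0, 0.0) 1.0
```

On this invented data the search puts all weight on the method facet, because only that facet separates the two labels. With a step of 0.1 the grid has 66 candidate weightings. This exhaustive search is cheap for three facets and needs no gradient updates, which is consistent with the abstract's point that no task-aware fine-tuning is required; a finer step or a different validation metric are equally plausible choices.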
Similar Papers
Citation importance-aware document representation learning for large-scale science mapping
Digital Libraries
Helps map science by understanding important citations.
Rethinking Graph-Based Document Classification: Learning Data-Driven Structures Beyond Heuristic Approaches
Computation and Language
Links sentences automatically to sort documents better.
FITRep: Attention-Guided Item Representation via MLLMs
Information Retrieval
Finds and removes nearly identical online items.