Context-Aware Hierarchical Taxonomy Generation for Scientific Papers via LLM-Guided Multi-Aspect Clustering
By: Kun Zhu , Lizi Liao , Yuxuan Gu and more
Potential Business Impact:
Organizes science papers into helpful, detailed lists.
The rapid growth of scientific literature demands efficient methods to organize and synthesize research findings. Existing taxonomy construction methods, leveraging unsupervised clustering or direct prompting of large language models (LLMs), often lack coherence and granularity. We propose a novel context-aware hierarchical taxonomy generation framework that integrates LLM-guided multi-aspect encoding with dynamic clustering. Our method leverages LLMs to identify key aspects of each paper (e.g., methodology, dataset, evaluation) and generates aspect-specific paper summaries, which are then encoded and clustered along each aspect to form a coherent hierarchy. In addition, we introduce a new evaluation benchmark of 156 expert-crafted taxonomies encompassing 11.6k papers, providing the first naturally annotated dataset for this task. Experimental results demonstrate that our method significantly outperforms prior approaches, achieving state-of-the-art performance in taxonomy coherence, granularity, and interpretability.
Similar Papers
TaxoAdapt: Aligning LLM-Based Multidimensional Taxonomy Construction to Evolving Research Corpora
Computation and Language
Organizes science papers by how they change.
Transforming Expert Knowledge into Scalable Ontology via Large Language Models
Artificial Intelligence
Helps computers understand and connect different ideas.
A Scalable Unsupervised Framework for multi-aspect labeling of Multilingual and Multi-Domain Review Data
Computation and Language
Helps computers understand reviews in any language.