A Hybrid Framework for Subject Analysis: Integrating Embedding-Based Regression Models with Large Language Models

Published: July 19, 2025 | arXiv ID: 2507.22913v1

By: Jinyu Liu, Xiaoying Song, Diana Zhang and more

Potential Business Impact:

Helps libraries find books by topic better.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Providing subject access to information resources is an essential function of any library management system. Large language models (LLMs) have been widely used in classification and summarization tasks, but their capability to perform subject analysis is underexplored. Multi-label classification with traditional machine learning (ML) models has been used for subject analysis but struggles with unseen cases. LLMs offer an alternative but often over-generate and hallucinate. Therefore, we propose a hybrid framework that integrates embedding-based ML models with LLMs. This approach uses ML models to (1) predict the optimal number of Library of Congress Subject Headings (LCSH) labels to guide LLM predictions and (2) post-edit the predicted terms with actual LCSH terms to mitigate hallucinations. We experimented with LLMs and the hybrid framework to predict the subject terms of books using LCSH. Experimental results show that providing initial predictions to guide LLM generation and imposing post-edits result in more controlled and vocabulary-aligned outputs.
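The post-editing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes character-trigram count vectors as a toy stand-in for the embedding models, takes the predicted label count as a plain integer rather than the output of a regression model, and uses a small illustrative vocabulary instead of the full LCSH list. The names `trigram_vector`, `cosine`, and `post_edit` are hypothetical.

```python
from collections import Counter
from math import sqrt


def trigram_vector(text: str) -> Counter:
    """Toy stand-in for an embedding: counts of character trigrams."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def post_edit(llm_terms, vocabulary, predicted_count):
    """Snap each generated term to its nearest vocabulary entry,
    drop duplicates, and keep at most predicted_count labels."""
    vocab_vectors = {term: trigram_vector(term) for term in vocabulary}
    edited = []
    for raw in llm_terms:
        raw_vec = trigram_vector(raw)
        best = max(vocabulary, key=lambda t: cosine(raw_vec, vocab_vectors[t]))
        if best not in edited:
            edited.append(best)
    return edited[:predicted_count]


# Illustrative data only: a tiny controlled vocabulary and free-form LLM output.
vocabulary = ["Machine learning", "Libraries", "Subject headings",
              "Natural language processing"]
llm_terms = ["Machine learning methods", "Subject heading", "Library", "Libraries"]

# predicted_count is assumed to come from a separate model; here it is fixed.
print(post_edit(llm_terms, vocabulary, predicted_count=3))
```

Mapping every generated term onto an existing vocabulary entry means the output cannot contain a heading outside the controlled list, and the count cap limits over-generation. A real system would swap the trigram vectors for learned sentence embeddings and would likely reject matches below a similarity threshold instead of always accepting the nearest entry.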

Bridging the Evaluation Gap: Leveraging Large Language Models for Topic Model Evaluation

Computation and Language

Helps find science papers by understanding changing topics.

11 Feb 2025 1

90%

ClusterFusion: Hybrid Clustering with Embedding Guidance and LLM Adaptation

Computation and Language

Helps computers group words by meaning better.

4 Dec 2025 1

90%

Concept Navigation and Classification via Open-Source Large Language Model Processing

Computation and Language

Finds hidden ideas and stories in text.

7 Feb 2025 0

Country of Origin

🇺🇸 United States

Page Count

13 pages
