Efficient Topic Extraction via Graph-Based Labeling: A Lightweight Alternative to Deep Models

Published: November 6, 2025 | arXiv ID: 2511.04248v1

By: Salma Mekaooui, Hiba Sofyan, Imane Amaaz and more

Potential Business Impact:

Gives better names to computer-found text ideas.

Business Areas:

Text Analytics, Data and Analytics, Software

Extracting topics from text has become an essential task, especially with the rapid growth of unstructured textual data. Most existing works rely on computationally intensive methods to address this challenge. In this paper, we argue that probabilistic and statistical approaches, such as topic modeling (TM), can offer effective alternatives that require fewer computational resources. TM is a statistical method that automatically discovers topics in large collections of unlabeled text; however, it represents each topic as a distribution over representative words, which is often hard to interpret. Our objective is to perform topic labeling, that is, to assign a meaningful label to each of these word sets. To achieve this without relying on computationally expensive models, we propose a graph-based approach that not only enriches topic words with semantically related terms but also explores the relationships among them. By analyzing these connections within the graph, we derive labels that capture each topic's meaning. We present a comparative study between our proposed method and several benchmarks, including ChatGPT-3.5, across two different datasets. Our method achieved consistently better results than traditional benchmarks in terms of BERTScore and cosine similarity and produced results comparable to ChatGPT-3.5, while remaining computationally efficient. Finally, we discuss future directions for topic labeling and highlight potential research avenues for enhancing interpretability and automation.

TopiCLEAR: Topic extraction by CLustering Embeddings with Adaptive dimensional Reduction

Computation and Language

Finds hidden topics in social media posts.

7 Dec 2025

GHTM: A Graph based Hybrid Topic Modeling Approach in Low-Resource Bengali Language

Computation and Language

Finds hidden topics in Bengali text.

1 Aug 2025

Topic Modeling as Long-Form Generation: Can Long-Context LLMs revolutionize NTM via Zero-Shot Prompting?

Computation and Language

Lets computers understand what stories are about.

3 Oct 2025

Country of Origin

🇮🇪 Ireland

Page Count

12 pages