
Automated Research Article Classification and Recommendation Using NLP and ML

Published: October 7, 2025 | arXiv ID: 2510.05495v1

By: Shadikur Rahman, Hasibul Karim Shanto, Umme Ayman Koana and more

Potential Business Impact:

Finds important science papers faster.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

In the digital era, the exponential growth of scientific publications has made it increasingly difficult for researchers to efficiently identify and access relevant work. This paper presents an automated framework for research article classification and recommendation that leverages Natural Language Processing (NLP) techniques and machine learning. Using a large-scale arXiv.org dataset spanning more than three decades, we evaluate multiple feature extraction approaches (TF-IDF, Count Vectorizer, Sentence-BERT, USE, Mirror-BERT) in combination with diverse machine learning classifiers (Logistic Regression, SVM, Naïve Bayes, Random Forest, Gradient Boosted Trees, and k-Nearest Neighbour). Our experiments show that Logistic Regression with TF-IDF consistently yields the best classification performance, achieving an accuracy of 69%. To complement classification, we incorporate a recommendation module based on the cosine similarity of vectorized articles, enabling efficient retrieval of related research papers. The proposed system directly addresses the challenge of information overload in digital libraries and demonstrates a scalable, data-driven solution to support literature discovery.
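The recommendation step described in the abstract (cosine similarity over vectorized articles) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`tokenize`, `tfidf_vectors`, `cosine`, `recommend`), the four-item toy corpus, and the smoothed IDF formula are assumptions made for illustration, and only the Python standard library is used. The classification step (Logistic Regression on TF-IDF features) is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one L2-normalised sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumed variant): ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: count * idf[t] for t, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors (a dot product)."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def recommend(query_index, vectors, top_k=2):
    """Indices of the top_k documents most similar to the query document."""
    q = vectors[query_index]
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors) if i != query_index]
    # Highest similarity first; ties broken by lower index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:top_k]]


# Hypothetical toy corpus standing in for arXiv abstracts.
ABSTRACTS = [
    "Transformer language models for text classification",
    "Text classification with transformer language models and embeddings",
    "Galaxy formation in dark matter simulations",
    "Dark matter halos and galaxy clustering",
]
VECTORS = tfidf_vectors(ABSTRACTS)
print(recommend(0, VECTORS, top_k=1))
```

Because the vectors are normalised to unit length, the cosine similarity reduces to a sparse dot product, so ranking candidates only touches terms shared with the query. A system at the scale the abstract describes would more plausibly use a library vectorizer and sparse matrix operations than Python dictionaries.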

Country of Origin
🇨🇦 Canada


Page Count
8 pages

Category
Computer Science:
Information Retrieval