Automated Research Article Classification and Recommendation Using NLP and ML
By: Shadikur Rahman, Hasibul Karim Shanto, Umme Ayman Koana and more
Potential Business Impact:
Finds important science papers faster.
In the digital era, the exponential growth of scientific publications has made it increasingly difficult for researchers to efficiently identify and access relevant work. This paper presents an automated framework for research article classification and recommendation that leverages Natural Language Processing (NLP) techniques and machine learning. Using a large-scale arXiv.org dataset spanning more than three decades, we evaluate multiple feature extraction approaches (TF-IDF, Count Vectorizer, Sentence-BERT, USE, Mirror-BERT) in combination with diverse machine learning classifiers (Logistic Regression, SVM, Naïve Bayes, Random Forest, Gradient Boosted Trees, and k-Nearest Neighbour). Our experiments show that Logistic Regression with TF-IDF consistently yields the best classification performance, achieving an accuracy of 69%. To complement classification, we incorporate a recommendation module based on the cosine similarity of vectorized articles, enabling efficient retrieval of related research papers. The proposed system directly addresses the challenge of information overload in digital libraries and demonstrates a scalable, data-driven solution to support literature discovery.
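To make the recommendation step concrete, the following is a minimal sketch of cosine-similarity retrieval over TF-IDF vectors, written with the Python standard library only. It is an illustration under stated assumptions, not the authors' implementation: the function names (tokenize, tfidf_vectors, cosine, recommend), the four toy abstracts, the regex tokenizer, and the smoothed IDF formula are all hypothetical choices made for this example, and the classification stage (Logistic Regression on the same features) is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphabetic tokens only; the paper's preprocessing is not specified here.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    # Assumption: a common smoothed IDF variant, ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: count * idf[t] for t, count in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query_index, vectors, top_k=2):
    """Indices of the top_k documents most similar to the query document."""
    scores = [
        (cosine(vectors[query_index], v), i)
        for i, v in enumerate(vectors)
        if i != query_index
    ]
    # Highest similarity first; ties broken by lower index for determinism.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scores[:top_k]]


# Hypothetical toy corpus standing in for arXiv abstracts.
ABSTRACTS = [
    "Deep neural networks for image classification",
    "Convolutional neural networks improve image recognition",
    "Bayesian inference for galaxy cluster mass estimates",
    "Dark matter halos and galaxy cluster formation",
]
VECTORS = tfidf_vectors(ABSTRACTS)

if __name__ == "__main__":
    for i, title in enumerate(ABSTRACTS):
        best = recommend(i, VECTORS, top_k=1)[0]
        print(f"{title!r} -> {ABSTRACTS[best]!r}")
```

In this toy corpus the two vision abstracts retrieve each other, as do the two astrophysics abstracts, because each pair shares terms that the other pair lacks. At the scale the abstract describes, a sparse-matrix library would normally replace the dict-based vectors, but the scoring logic would be the same.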
Similar Papers
Efficient Extractive Text Summarization for Online News Articles Using Machine Learning
Machine Learning (CS)
Makes news articles shorter and easier to read.
Exploring new Approaches for Information Retrieval through Natural Language Processing
Information Retrieval
Helps computers find information in text faster.
Academic Literature Recommendation in Large-scale Citation Networks Enhanced by Large Language Models
Applications
Finds the best science papers for researchers.