Exploring Topic Trends in COVID-19 Research Literature using Non-Negative Matrix Factorization
By: Divya Patel, Vansh Parikh, Om Patel and more
Potential Business Impact:
Finds patterns in COVID-19 research papers.
In this work, we apply topic modeling with Non-Negative Matrix Factorization (NMF) to the COVID-19 Open Research Dataset (CORD-19). Our goal is to uncover the underlying thematic structure of the extensive body of COVID-19 research literature and to trace how that structure evolves. NMF factorizes the document-term matrix into two non-negative matrices. One represents each topic as a weighted combination of terms, and the other gives each document's distribution over those topics. Together they show how strongly each document relates to each topic and how strongly each topic relates to each word. We describe the complete methodology. It begins with a series of rigorous pre-processing steps that standardize the text while preserving the context of phrases. Features are then extracted using term frequency-inverse document frequency (tf-idf), which weights each word by its frequency within a document and its rarity across the dataset. To ensure the robustness of our topic model, we conduct a stability analysis that computes stability scores for NMF models with different numbers of topics, which lets us select the optimal number of topics. Using the resulting model, we track the evolution of topics over time within the CORD-19 dataset. Our findings contribute to the understanding of the knowledge structure of the COVID-19 research landscape and provide a valuable resource for future research in this field.
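The pipeline summarized above can be sketched in dependency-free Python. It covers tf-idf weighting, NMF factorization, a stability check over candidate topic counts, and topic shares per time period. This is a minimal illustration under stated assumptions, not the authors' implementation:

- The whitespace tokenizer stands in for the paper's pre-processing.
- The smoothed idf follows scikit-learn's default formula.
- NMF uses Lee-Seung multiplicative updates for the Frobenius loss.
- The stability score is a simplified stand-in for whatever measure the paper uses. It is the mean best-match Jaccard overlap of top terms across random restarts.

All function names and the toy corpus are hypothetical.

```python
# Hypothetical sketch of a tf-idf + NMF topic pipeline (not the paper's code).
import math
import random
from collections import Counter

def tfidf_matrix(docs):
    """Dense tf-idf document-term matrix (rows = documents, columns = vocabulary)."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer (assumption)
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: j for j, w in enumerate(vocab)}
    n_docs = len(docs)
    df = Counter(w for toks in tokenized for w in set(toks))
    # Smoothed idf as in scikit-learn's default: ln((1 + n) / (1 + df)) + 1
    idf = {w: math.log((1 + n_docs) / (1 + df[w])) + 1.0 for w in vocab}
    X = []
    for toks in tokenized:
        row = [0.0] * len(vocab)
        for w, c in Counter(toks).items():
            row[index[w]] = c * idf[w]  # raw term count times idf
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        X.append([v / norm for v in row])  # L2-normalise each document
    return X, vocab

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def nmf(X, k, iters=200, seed=0, eps=1e-9):
    """Factor X (n x m) into non-negative W (n x k, document-topic) and
    H (k x m, topic-term) via Lee-Seung multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() for _ in range(k)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(hr, ar, br)]
             for hr, ar, br in zip(H, num, den)]
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(wr, ar, br)]
             for wr, ar, br in zip(W, num, den)]
    return W, H

def frob_error(X, W, H):
    """Frobenius norm of the reconstruction residual X - WH."""
    WH = matmul(W, H)
    return math.sqrt(sum((x - y) ** 2 for xr, yr in zip(X, WH) for x, y in zip(xr, yr)))

def top_terms(H, vocab, n=3):
    """Highest-weighted vocabulary terms for each topic (row of H)."""
    return [[vocab[j] for j in sorted(range(len(row)), key=lambda j: -row[j])[:n]]
            for row in H]

def stability(X, vocab, k, seeds=(1, 2, 3), n_terms=3):
    """Assumed stability measure: mean top-term Jaccard agreement between a
    reference run (seed 0) and re-seeded runs with the same k."""
    ref = top_terms(nmf(X, k, seed=0)[1], vocab, n_terms)
    scores = []
    for s in seeds:
        other = top_terms(nmf(X, k, seed=s)[1], vocab, n_terms)
        # Greedy best match per reference topic (simplifies one-to-one matching)
        scores.append(sum(max(len(set(t) & set(o)) / len(set(t) | set(o))
                              for o in other) for t in ref) / k)
    return sum(scores) / len(scores)

def topic_share_by_period(W, periods):
    """Average per-document topic proportions within each time period."""
    totals, counts = {}, Counter(periods)
    for row, p in zip(W, periods):
        s = sum(row) or 1.0
        acc = totals.setdefault(p, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v / s
    return {p: [v / counts[p] for v in acc] for p, acc in sorted(totals.items())}

docs = [
    "vaccine trial antibody efficacy",
    "vaccine antibody immune response",
    "lockdown policy mobility transmission",
    "lockdown policy social distancing",
]
X, vocab = tfidf_matrix(docs)
for k in (2, 3):
    print(k, round(stability(X, vocab, k), 3))
W, H = nmf(X, 2)
print(top_terms(H, vocab))
print(topic_share_by_period(W, ["2020", "2020", "2021", "2021"]))
```

In practice, one would compute the stability score for a range of topic counts and keep the count with the highest score. On a corpus the size of CORD-19, these dense list-of-lists helpers would be replaced by sparse implementations such as scikit-learn's `TfidfVectorizer` and `NMF`.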
Similar Papers
Dynamic Topic Analysis in Academic Journals using Convex Non-negative Matrix Factorization Method
Information Retrieval
Helps computers track how science ideas change.
Applying non-negative matrix factorization with covariates to multivariate time series data as a vector autoregression model
Methodology
Finds hidden patterns in changing data.
Testing Hypotheses of Covariate Effects on Topics of Discourse
Methodology
Finds patterns in lots of text faster.