
Classification of worldwide news articles by perceived quality, 2018-2024

Published: November 20, 2025 | arXiv ID: 2511.16416v1

By: Connor McElroy, Thiago E. A. de Oliveira, Chris Brogly

Potential Business Impact:

Helps computers tell perceived lower-quality news articles from higher-quality ones.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

This study explored whether supervised machine learning and deep learning models can effectively distinguish perceived lower-quality news articles from perceived higher-quality ones. Three machine learning classifiers and three deep learning models were assessed on a newly created dataset of 1,412,272 English-language news articles drawn from Common Crawl over 2018-2024. Expert consensus ratings for 579 source websites were split at the median, creating perceived low- and high-quality classes of about 706,000 articles each. Each article inherits the label of its source website and is described by 194 linguistic features.

Traditional machine learning classifiers such as Random Forest performed reasonably well (0.7355 accuracy, 0.8131 ROC-AUC). Among the deep learning models, ModernBERT-large (256-token context length) performed best (0.8744 accuracy, 0.9593 ROC-AUC, 0.8739 F1), followed by DistilBERT-base (512-token context length) at 0.8685 accuracy and 0.9554 ROC-AUC. DistilBERT-base (256-token context length) reached 0.8478 accuracy and 0.9407 ROC-AUC, while ModernBERT-base (256-token context length) attained 0.8569 accuracy and 0.9470 ROC-AUC. These results suggest that the perceived quality of worldwide news articles can be effectively differentiated both by traditional CPU-based machine learning classifiers and by deep learning classifiers.
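The labelling scheme and metrics described above can be illustrated with a short sketch. This is not the authors' code. It assumes expert ratings are available as a simple mapping from website to score and that articles arrive as (site, text) pairs. The function names, the toy ratings, and the classifier scores are all hypothetical. Tie handling at the median is also an assumption, because the abstract does not specify it. ROC-AUC is computed in its pairwise form: the probability that a randomly chosen positive article scores higher than a randomly chosen negative one.

```python
"""Hypothetical sketch: website-level median-split labels plus the
accuracy, F1, and ROC-AUC metrics reported in the abstract.
Not the authors' code; names and toy data are illustrative only."""
from statistics import median


def label_sites_by_median(site_ratings):
    """Map each website to 1 (perceived higher quality) or 0 (lower).

    Assumption: sites rated strictly above the median are class 1; sites at
    or below it are class 0 (tie handling is not stated in the source).
    """
    cut = median(site_ratings.values())
    return {site: int(score > cut) for site, score in site_ratings.items()}


def label_articles(articles, site_labels):
    """Give each (site, text) article its website's label; drop unrated sites."""
    return [(text, site_labels[site]) for site, text in articles if site in site_labels]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """F1 for the positive class (1): 2TP / (2TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """ROC-AUC as the chance a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(P*N), which is fine for
    a toy example; large datasets would use a rank-based computation instead.
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy expert ratings for four hypothetical websites (median is 0.55).
    ratings = {"a.example": 0.9, "b.example": 0.2, "c.example": 0.7, "d.example": 0.4}
    site_labels = label_sites_by_median(ratings)
    articles = [("a.example", "..."), ("b.example", "..."),
                ("c.example", "..."), ("d.example", "..."),
                ("e.example", "...")]  # e.example is unrated, so it is dropped
    y_true = [label for _, label in label_articles(articles, site_labels)]
    scores = [0.8, 0.3, 0.4, 0.6]  # made-up classifier probabilities
    y_pred = [int(s >= 0.5) for s in scores]
    print(y_true)                    # [1, 0, 1, 0]
    print(accuracy(y_true, y_pred))  # 0.5
    print(f1_score(y_true, y_pred))  # 0.5
    print(roc_auc(y_true, scores))   # 0.75
```

In this toy run, ROC-AUC exceeds accuracy because ROC-AUC depends only on how the scores rank the two classes, not on the 0.5 decision threshold. The same effect is consistent with every model in the abstract reporting a higher ROC-AUC than accuracy.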

Related Papers:

Binary classification for perceived quality of headlines and links on worldwide news websites, 2018-2024

Computation and Language

Distinguishes perceived lower-quality news headlines from higher-quality ones.

11 Jun 2025

Efficient Extractive Text Summarization for Online News Articles Using Machine Learning

Machine Learning (CS)

Makes news articles shorter and easier to read.

19 Sep 2025

Bridging Human and Model Perspectives: A Comparative Analysis of Political Bias Detection in News Media Using Large Language Models

Computation and Language

Compares how people and large language models detect political bias in news.

18 Nov 2025


Country of Origin:

🇨🇦 Canada

Page Count:

6 pages
