What Signals Really Matter for Misinformation Tasks? Evaluating Fake-News Detection and Virality Prediction under Real-World Constraints
By: Francesco Paolo Savatteri, Chahan Vidal-Gorène, Florian Cafiero
Potential Business Impact:
Spots fake news and predicts how fast it spreads.
We present an evaluation-driven study of two practical tasks concerning online misinformation: (i) fake-news detection and (ii) virality prediction, both in operational settings that require rapid reaction. Using the EVONS and FakeNewsNet datasets, we compare textual embeddings (RoBERTa, with a control using Mistral) against lightweight numeric features (timing, follower counts, verification, likes) and sequence models (GRU, gating architectures, Transformer encoders). We show that textual content alone is a strong discriminator for fake-news detection, while numeric-only pipelines remain viable when language models are unavailable or compute is constrained. Virality prediction is markedly harder than fake-news detection and is highly sensitive to label construction: in our setup, a median-based "viral" split (<50 likes) is pragmatic but underestimates real-world virality, and time-censoring of engagement features is desirable yet difficult under current API limits. Dimensionality-reduction analyses suggest that non-linear structure is more informative for virality than for fake-news detection (t-SNE > PCA on numeric features). Swapping RoBERTa for Mistral embeddings yields only modest deltas and leaves the conclusions unchanged. We discuss implications for evaluation design and report reproducibility constraints that realistically affect the field. We release splits and code where possible and provide guidance for metric selection.
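To illustrate why a median-based label can understate virality, here is a minimal sketch of such a split. It is not the authors' code: the like counts are invented for illustration, the function name `median_split_labels` is hypothetical, and the rule "strictly above the median counts as viral" is an assumption made only for this example.

```python
from statistics import median


def median_split_labels(like_counts):
    """Return (threshold, labels) for a median-based split.

    A post is labeled 1 ("viral") when its like count is strictly above
    the median of all like counts, otherwise 0. The strict inequality is
    an assumption of this sketch, not a detail taken from the paper.
    """
    threshold = median(like_counts)
    labels = [1 if likes > threshold else 0 for likes in like_counts]
    return threshold, labels


# Hypothetical, heavily skewed like counts (a few very large values).
likes = [0, 1, 2, 3, 5, 8, 40, 60, 900, 25000]

threshold, labels = median_split_labels(likes)
print("median threshold:", threshold)
print("labels:", labels)
```

On skewed engagement data like this, the median sits far below the largest values, so posts with only a handful of likes share the "viral" label with posts that have thousands. The split yields balanced classes, which is convenient for training, but the positive class is much broader than what would usually be called viral.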
Similar Papers
Early Multimodal Prediction of Cross-Lingual Meme Virality on Reddit: A Time-Window Analysis
Artificial Intelligence
Predicts early on whether memes will go viral.
Is Less Really More? Fake News Detection with Limited Information
Information Retrieval
Finds fake news using less text.
Simulating Misinformation Propagation in Social Networks using Large Language Models
Social and Information Networks
Finds how fake news spreads and how to stop it.