Beyond Leakage and Complexity: Towards Realistic and Efficient Information Cascade Prediction

Published: October 29, 2025 | arXiv ID: 2510.25348v1

By: Jie Peng, Rui Wang, Qiang Wang and more

BigTech Affiliations: Alibaba

Potential Business Impact:

Predicts viral posts accurately, saving time and money.

Business Areas:

Predictive Analytics Artificial Intelligence, Data and Analytics, Software

Information cascade popularity prediction is a key problem in analyzing content diffusion in social networks. However, existing work suffers from three critical limitations: (1) temporal leakage in current evaluation protocols, where random cascade-based splits allow models to access future information and yield unrealistic results; (2) feature-poor datasets that lack downstream conversion signals (e.g., likes, comments, or purchases), which limits practical applications; and (3) the computational inefficiency of complex graph-based methods, which require days of training for marginal gains. We systematically address these challenges from three perspectives: task setup, dataset construction, and model design. First, we propose a time-ordered splitting strategy that chronologically partitions data into consecutive windows, ensuring models are evaluated on genuine forecasting tasks without future information leakage. Second, we introduce Taoke, a large-scale e-commerce cascade dataset featuring rich promoter/product attributes and ground-truth purchase conversions, capturing the complete diffusion lifecycle from promotion to monetization. Third, we develop CasTemp, a lightweight framework that efficiently models cascade dynamics through temporal walks, Jaccard-based neighbor selection for inter-cascade dependencies, and GRU-based encoding with time-aware attention. Under leak-free evaluation, CasTemp achieves state-of-the-art performance across four datasets with orders-of-magnitude speedup. Notably, it excels at predicting second-stage popularity conversions, a practical task critical for real-world applications.
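The time-ordered split and the Jaccard-based neighbor selection mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The `Cascade` record, its fields, the explicit time boundaries, the use of participant overlap as the Jaccard input, and the restriction to earlier cascades are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cascade:
    # All fields are hypothetical; the paper's data schema is not given here.
    cascade_id: str
    start_time: float        # time the root post was published
    participants: frozenset  # user ids observed in the cascade


def time_ordered_split(cascades, train_end, val_end):
    """Partition cascades into consecutive time windows.

    Every training cascade starts before every validation cascade, which in
    turn starts before every test cascade, so no future data leaks backward.
    """
    train = [c for c in cascades if c.start_time < train_end]
    val = [c for c in cascades if train_end <= c.start_time < val_end]
    test = [c for c in cascades if c.start_time >= val_end]
    return train, val, test


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def jaccard_neighbors(target, candidates, k=2):
    """Return up to k cascades that started before `target`, ranked by
    participant-set Jaccard similarity (highest first).

    Filtering on start_time is an assumption that keeps the selection
    consistent with leak-free evaluation.
    """
    earlier = [c for c in candidates if c.start_time < target.start_time]
    ranked = sorted(
        earlier,
        key=lambda c: jaccard(target.participants, c.participants),
        reverse=True,
    )
    return ranked[:k]
```

A random split would instead shuffle cascades before cutting, which can place a cascade from a later period in the training set and an earlier one in the test set; the boundary-based split above rules that out by construction.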

AutoCas: Autoregressive Cascade Predictor in Social Networks via Large Language Models

Social and Information Networks

Predicts how popular online posts will become.

25 Feb 2025

87%

Dynamic Network-Based Two-Stage Time Series Forecasting for Affiliate Marketing

Information Retrieval

Helps online sellers know which helpers sell best.

13 Oct 2025

87%

Transforming Causality: Transformer-Based Temporal Causal Discovery with Prior Knowledge Integration

Machine Learning (CS)

Finds true causes in messy time data.

21 Aug 2025

Country of Origin

🇨🇳 China

Page Count

11 pages
