Score: 1

DDT: A Dual-Masking Dual-Expert Transformer for Energy Time-Series Forecasting

Published: January 12, 2026 | arXiv ID: 2601.07250v1

By: Mingnan Zhu , Qixuan Zhang , Yixuan Cheng and more

Potential Business Impact:

Predicts energy use more accurately for power grids.

Business Areas:
Predictive Analytics Artificial Intelligence, Data and Analytics, Software

Accurate energy time-series forecasting is crucial for ensuring grid stability and promoting the integration of renewable energy, yet it faces significant challenges from complex temporal dependencies and the heterogeneity of multi-source data. To address these issues, we propose DDT, a novel and robust deep learning framework for high-precision time-series forecasting. At its core, DDT introduces two key innovations. First, we design a dual-masking mechanism that synergistically combines a strict causal mask with a data-driven dynamic mask. This novel design ensures theoretical causal consistency while adaptively focusing on the most salient historical information, overcoming the rigidity of traditional masking techniques. Second, our architecture features a dual-expert system that decouples the modeling of temporal dynamics and cross-variable correlations into parallel, specialized pathways, which are then intelligently integrated through a dynamic gated fusion module. We conducted extensive experiments on 7 challenging energy benchmark datasets, including ETTh, Electricity, and Solar. The results demonstrate that DDT consistently outperforms strong state-of-the-art baselines across all prediction horizons, establishing a new benchmark for the task.

Country of Origin
🇨🇳 China

Page Count
12 pages

Category
Computer Science:
Machine Learning (CS)