UniCast: A Unified Multimodal Prompting Framework for Time Series Forecasting
By: Sehyuk Park, Soyeon Caren Han, Eduard Hovy
Potential Business Impact:
Helps predict future events using pictures and words.
Time series forecasting is a foundational task across domains such as finance, healthcare, and environmental monitoring. While recent advances in Time Series Foundation Models (TSFMs) have demonstrated strong generalisation through large-scale pretraining, existing models operate predominantly in a unimodal setting, ignoring the rich multimodal context, such as visual and textual signals, that often accompanies time series data in real-world scenarios. This paper introduces UniCast, a novel parameter-efficient multimodal framework that extends TSFMs to jointly leverage time series, vision, and text modalities for enhanced forecasting performance. Our method integrates modality-specific embeddings from pretrained Vision and Text Encoders with a frozen TSFM via soft prompt tuning, enabling efficient adaptation with minimal parameter updates. This design not only preserves the generalisation strength of the foundation model but also enables effective cross-modal interaction. Extensive experiments across diverse time series forecasting benchmarks demonstrate that UniCast consistently and significantly outperforms all existing TSFM baselines. The findings highlight the critical role of multimodal context in advancing the next generation of general-purpose time series forecasters.
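The abstract describes soft prompt tuning against a frozen TSFM backbone, with vision and text embeddings projected into the backbone's input space. Below is a minimal PyTorch sketch of that general idea, not the paper's actual implementation: the class name `MultimodalSoftPromptForecaster`, the prompt count, the pooled-embedding projections, and the stand-in Transformer backbone are all illustrative assumptions.

```python
import torch
import torch.nn as nn


class MultimodalSoftPromptForecaster(nn.Module):
    """Hypothetical sketch of UniCast-style soft prompt tuning.

    A frozen time-series backbone receives learnable prompt vectors,
    plus projections of vision/text embeddings, prepended to its input
    tokens. Only the prompts and projections are trained.
    """

    def __init__(self, ts_backbone: nn.Module, d_model: int,
                 d_vision: int, d_text: int, n_prompts: int = 16):
        super().__init__()
        self.backbone = ts_backbone
        for p in self.backbone.parameters():  # keep the TSFM frozen
            p.requires_grad = False

        # Learnable soft prompts: the trainable "prompt" parameters.
        self.soft_prompts = nn.Parameter(torch.randn(n_prompts, d_model) * 0.02)

        # Lightweight projections mapping frozen vision/text encoder
        # outputs into the backbone's embedding space.
        self.vision_proj = nn.Linear(d_vision, d_model)
        self.text_proj = nn.Linear(d_text, d_model)

    def forward(self, ts_tokens, vision_emb, text_emb):
        # ts_tokens:  (B, T, d_model) token embeddings of the series
        # vision_emb: (B, d_vision)   pooled frozen vision-encoder output
        # text_emb:   (B, d_text)     pooled frozen text-encoder output
        B = ts_tokens.size(0)
        prompts = self.soft_prompts.unsqueeze(0).expand(B, -1, -1)
        v = self.vision_proj(vision_emb).unsqueeze(1)  # (B, 1, d_model)
        t = self.text_proj(text_emb).unsqueeze(1)      # (B, 1, d_model)

        # Prepend multimodal prompts so the frozen backbone can attend
        # across modalities alongside the time-series tokens.
        x = torch.cat([prompts, v, t, ts_tokens], dim=1)
        return self.backbone(x)


# Toy usage with a stand-in Transformer backbone (not the real TSFM):
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2)
model = MultimodalSoftPromptForecaster(backbone, d_model=64,
                                       d_vision=512, d_text=768)
out = model(torch.randn(2, 96, 64), torch.randn(2, 512), torch.randn(2, 768))
```

Under these assumptions, only the soft prompts and the two projection layers receive gradients, which is what makes this style of adaptation parameter-efficient relative to full fine-tuning.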
Similar Papers
DP-GPT4MTS: Dual-Prompt Large Language Model for Textual-Numerical Time Series Forecasting
Computation and Language
Makes predictions better by using news and numbers.
VisionTS++: Cross-Modal Time Series Foundation Model with Continual Pre-trained Visual Backbones
Computer Vision and Pattern Recognition
Helps computers predict future events using pictures.
UniTS: Unified Time Series Generative Model for Remote Sensing
Computer Vision and Pattern Recognition
Makes satellite pictures clear and predicts changes.