From Values to Tokens: An LLM-Driven Framework for Context-aware Time Series Forecasting via Symbolic Discretization
By: Xiaoyu Tao , Shilong Zhang , Mingyue Cheng and more
Potential Business Impact:
Predicts future events by turning numbers into words.
Time series forecasting plays a vital role in supporting decision-making across a wide range of critical applications, including energy, healthcare, and finance. Despite recent advances, forecasting accuracy remains limited due to the challenge of integrating historical numerical sequences with contextual features, which often comprise unstructured textual data. To address this challenge, we propose TokenCast, an LLM-driven framework that leverages language-based symbolic representations as a unified intermediary for context-aware time series forecasting. Specifically, TokenCast employs a discrete tokenizer to transform continuous numerical sequences into temporal tokens, enabling structural alignment with language-based inputs. To bridge the semantic gap between modalities, both temporal and contextual tokens are embedded into a shared representation space via a pre-trained large language model (LLM), further optimized with autoregressive generative objectives. Building upon this unified semantic space, the aligned LLM is subsequently fine-tuned in a supervised manner to predict future temporal tokens, which are then decoded back into the original numerical space. Extensive experiments on diverse real-world datasets enriched with contextual features demonstrate the effectiveness and generalizability of TokenCast.
Similar Papers
Semantic-Enhanced Time-Series Forecasting via Large Language Models
Machine Learning (CS)
Helps computers predict future events better.
Semantic-Enhanced Time-Series Forecasting via Large Language Models
Machine Learning (CS)
Helps computers predict future events better.
From Time and Place to Preference: LLM-Driven Geo-Temporal Context in Recommendations
Information Retrieval
Helps movie suggestions understand holidays and seasons.