Enhancing next token prediction based pre-training for jet foundation models
By: Joschka Birk, Anna Hallin, Gregor Kasieczka, and more
Potential Business Impact:
Improves how AI models learn by predicting what comes next, here applied to particle physics data.
Next token prediction is an attractive pre-training task for jet foundation models: it is simulation-free and yields excellent generative capabilities that transfer across datasets. Here we study several improvements to next token prediction, building on the initial OmniJet-α work. First, instead of tokenizing particles and then using only the token-ID as the model input for both the generative and the classification task, we adopt a hybrid setup in which continuous feature vectors serve as the model input while token-IDs are used only as the next token prediction target. Second, we explore a combined pre-training strategy that joins masked particle modeling with the generative learning objective. Taken together, these changes greatly improve performance on downstream classification tasks without any loss in generative performance.
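To make the two ideas concrete, below is a minimal sketch, not the authors' implementation, of what such a hybrid setup could look like in PyTorch. All names (HybridNextTokenModel, combined_pretraining_loss, the feature count, codebook size, masking scheme, and loss weight lam) are illustrative assumptions: continuous particle features are embedded directly as the backbone input, the output head predicts discrete token-IDs from a separate tokenizer, and the total loss adds a masked-particle term to the next-token term.

```python
# Sketch only: assumed architecture and loss, not the paper's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HybridNextTokenModel(nn.Module):
    def __init__(self, n_features=3, d_model=128, n_tokens=8192, n_layers=4, n_heads=8):
        super().__init__()
        # Continuous particle features (e.g. pT, eta, phi) enter through a linear
        # embedding instead of an embedding lookup over token-IDs.
        self.feature_embed = nn.Linear(n_features, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        # The output head still predicts a discrete token-ID per particle.
        self.token_head = nn.Linear(d_model, n_tokens)

    def forward(self, features):
        # features: (batch, n_particles, n_features), continuous inputs
        x = self.feature_embed(features)
        # Causal mask so position i only attends to particles 0..i
        seq_len = features.size(1)
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len).to(features.device)
        h = self.backbone(x, mask=mask)
        return self.token_head(h)  # (batch, n_particles, n_tokens) logits


def combined_pretraining_loss(model, features, token_ids, mask_prob=0.15, lam=1.0):
    """Assumed combined objective: next-token prediction plus a masked-particle
    term in which masked positions have their continuous inputs zeroed out and
    the model must still recover their token-IDs. `lam` weights the two terms."""
    # Generative (next-token) term: predict token t+1 from particles 0..t.
    logits = model(features)
    ntp_loss = F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        token_ids[:, 1:].reshape(-1),
    )
    # Masked-particle term: hide a random subset of input particles.
    mask = torch.rand(features.shape[:2], device=features.device) < mask_prob
    if mask.any():
        masked_features = features.masked_fill(mask.unsqueeze(-1), 0.0)
        masked_logits = model(masked_features)
        mpm_loss = F.cross_entropy(masked_logits[mask], token_ids[mask])
    else:
        mpm_loss = torch.zeros((), device=features.device)
    return ntp_loss + lam * mpm_loss


if __name__ == "__main__":
    model = HybridNextTokenModel()
    feats = torch.randn(2, 20, 3)            # 2 jets, 20 particles, 3 features each
    toks = torch.randint(0, 8192, (2, 20))   # token-IDs from a separate (frozen) tokenizer
    loss = combined_pretraining_loss(model, feats, toks)
    loss.backward()
    print(loss.item())
```

The key design point the sketch tries to capture is that the discretization only constrains the prediction target, not the input: the backbone never sees quantized particles, so downstream classification can exploit the full continuous features, while generation still proceeds token by token.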
Similar Papers
Next-Token Prediction Should be Ambiguity-Sensitive: A Meta-Learning Perspective
Machine Learning (CS)
Helps AI guess better when things are unclear.
Context-level Language Modeling by Learning Predictive Context Embeddings
Computation and Language
Makes AI understand stories better, not just words.
Training LLMs Beyond Next Token Prediction -- Filling the Mutual Information Gap
Computation and Language
Teaches AI to learn faster and better.