Multi-scale Graph Autoregressive Modeling: Molecular Property Prediction via Next Token Prediction

Published: January 5, 2026 | arXiv ID: 2601.02530v1

By: Zhuoyang Jiang, Yaosen Min, Peiran Jin, and others

Potential Business Impact:

Helps find new medicines by predicting molecule properties from how their building blocks connect.

Business Areas:

Bioinformatics, Biotechnology, Data and Analytics, Science and Engineering

We present Connection-Aware Motif Sequencing (CamS), a graph-to-sequence representation that enables decoder-only Transformers to learn molecular graphs via standard next-token prediction (NTP). For molecular property prediction, SMILES-based NTP scales well but lacks explicit topology, whereas graph-native masked modeling captures connectivity but risks disrupting pivotal chemical details (e.g., the small structural differences behind activity cliffs). CamS bridges this gap by serializing molecular graphs into structure-rich causal sequences. It first mines connection-aware motifs from data, then serializes them via scaffold-rooted breadth-first search (BFS) to establish a stable core-to-periphery order. Crucially, CamS enables hierarchical modeling by concatenating sequences from fine to coarse motif scales, allowing the model to condition global scaffolds on dense, uncorrupted local structural evidence. We instantiate CamS-LLaMA by pre-training a vanilla LLaMA backbone on CamS sequences. It achieves state-of-the-art performance on MoleculeNet and the activity-cliff benchmark MoleculeACE, outperforming both SMILES-based language models and strong graph baselines. Interpretability analysis confirms that our multi-scale causal serialization effectively drives attention toward cliff-determining differences.
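The abstract does not give the token vocabulary, the motif-mining procedure, or the BFS tie-breaking rule, so the following is only a minimal Python sketch of the two serialization ideas it names: scaffold-rooted BFS over an already-mined motif graph, and fine-to-coarse concatenation of per-scale sequences. Everything concrete here is a hypothetical illustration, not the authors' implementation: the adjacency format, the `[pK]` parent back-reference tokens, the `<scaleK>`, `<bos>`, and `<eos>` tokens, the `(label, id)` sort order, and the function names `bfs_serialize` and `multiscale_sequence`.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical motif-graph format (not from the paper):
# motif id -> list of (neighbour motif id, connection token).
MotifGraph = Dict[int, List[Tuple[int, str]]]
Scale = Tuple[MotifGraph, Dict[int, str], int]  # (graph, motif labels, scaffold root id)


def bfs_serialize(graph: MotifGraph, labels: Dict[int, str], root: int) -> List[str]:
    """Emit motif tokens in breadth-first order starting from the scaffold root.

    Each newly reached motif is written as [pK] <connection> <motif>, where K is the
    BFS position of the motif it attaches to (a hypothetical back-reference token).
    Assumes a connected motif graph; edges back to already-emitted motifs
    (e.g. ring closures between motifs) are skipped in this sketch.
    """
    position = {root: 0}  # motif id -> index in BFS order
    tokens = [labels[root]]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        # Visit neighbours sorted by (label, id) so the order does not depend on
        # how the adjacency list happened to be built (assumed tie-break rule).
        for nbr, conn in sorted(graph.get(node, []), key=lambda e: (labels[e[0]], e[0])):
            if nbr in position:
                continue
            position[nbr] = len(position)
            tokens += [f"[p{position[node]}]", conn, labels[nbr]]
            queue.append(nbr)
    return tokens


def multiscale_sequence(scales: List[Scale]) -> List[str]:
    """Concatenate per-scale serializations, ordered from fine to coarse motifs."""
    seq = ["<bos>"]
    for i, (graph, labels, root) in enumerate(scales):
        seq.append(f"<scale{i}>")  # hypothetical scale-separator token
        seq += bfs_serialize(graph, labels, root)
    seq.append("<eos>")
    return seq


# Toy example (a toluic acid, attachment positions omitted), hand-picked motifs.
FINE: Scale = (
    {0: [(1, "-"), (2, "-")], 1: [(0, "-")], 2: [(0, "-")]},
    {0: "c1ccccc1", 1: "C(=O)O", 2: "C"},  # ring scaffold, carboxyl, methyl
    0,
)
COARSE: Scale = (
    {0: [(1, "-")], 1: [(0, "-")]},
    {0: "Cc1ccccc1", 1: "C(=O)O"},  # methyl merged into the ring motif
    0,
)

if __name__ == "__main__":
    print(" ".join(multiscale_sequence([FINE, COARSE])))
```

Running the script prints `<bos> <scale0> c1ccccc1 [p0] - C [p0] - C(=O)O <scale1> Cc1ccccc1 [p0] - C(=O)O <eos>`. Placing the fine scale first means that, under causal attention, every coarse-scale token can attend to the complete fine-scale serialization. This mirrors the abstract's point about conditioning global scaffolds on local structural evidence. Sorting neighbours makes the sequence for a given graph deterministic. The paper's actual ordering rule may differ.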

Similar Papers:

Dual-Modality Representation Learning for Molecular Property Prediction

Machine Learning (CS)

Helps find new medicines faster by combining two ways of representing molecules.

11 Jan 2025

M-GLC: Motif-Driven Global-Local Context Graphs for Few-shot Molecular Property Prediction

Machine Learning (CS)

Finds new medicines with less data.

24 Oct 2025

Protein Secondary Structure Prediction Using 3D Graphs and Relation-Aware Message Passing Transformers

Machine Learning (CS)

Helps predict the local shapes that proteins fold into.

17 Nov 2025

Page Count:

24 pages