GeoRecon: Graph-Level Representation Learning for 3D Molecules via Reconstruction-Based Pretraining
By: Shaoheng Yan, Zian Li, Muhan Zhang
Potential Business Impact:
Helps computers understand how molecules fit together.
The pretraining-and-finetuning paradigm has driven significant advances across domains such as natural language processing and computer vision, with representative pretraining tasks such as masked language modeling and next-token prediction. In molecular representation learning, however, pretraining task design remains largely limited to node-level denoising. This is effective at modeling local atomic environments but may be insufficient for capturing the global molecular structure required by graph-level property prediction tasks such as energy estimation and molecular regression. In this work, we present GeoRecon, a novel graph-level pretraining framework that shifts the focus from individual atoms to the molecule as an integrated whole. GeoRecon introduces a graph-level reconstruction task: during pretraining, the model is trained to generate an informative graph representation capable of accurately guiding reconstruction of the molecular geometry. This encourages the model to learn coherent, global structural features rather than isolated atomic details. Without relying on additional supervision or external data, GeoRecon outperforms node-centric baselines on multiple molecular benchmarks (e.g., QM9, MD17), demonstrating the benefit of incorporating graph-level reconstruction for learning more holistic and geometry-aware molecular embeddings.
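The abstract does not specify GeoRecon's encoder or decoder, so the following is only a toy sketch of one reading of a graph-level reconstruction objective. It assumes that a single pooled vector `z` is computed from the clean molecule, that the coordinates are then perturbed with Gaussian noise, and that a decoder must recover the per-atom noise using only the noisy coordinates and `z`. The functions `graph_embedding`, `add_noise`, and `reconstruction_loss`, as well as the distance-based pooling, are hypothetical illustrations rather than the paper's method, and training is omitted entirely.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, ...]  # (x, y, z) coordinates, e.g. in angstroms


def distance(a: Vec3, b: Vec3) -> float:
    """Euclidean distance between two atoms."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def graph_embedding(coords: Sequence[Vec3]) -> List[float]:
    """Hypothetical stand-in for a 3D encoder followed by mean pooling.

    Each atom gets (min, mean, max) of its distances to all other atoms, and
    these per-atom features are averaged into one graph-level vector. Because
    only interatomic distances are used, the result is invariant to rotations
    and translations of the whole molecule.
    """
    n = len(coords)
    if n < 2:
        raise ValueError("need at least two atoms")
    pooled = [0.0, 0.0, 0.0]
    for i in range(n):
        d = [distance(coords[i], coords[j]) for j in range(n) if j != i]
        feats = (min(d), sum(d) / len(d), max(d))
        for k in range(3):
            pooled[k] += feats[k] / n
    return pooled


def add_noise(coords: Sequence[Vec3], sigma: float, rng: random.Random):
    """Perturb every coordinate with i.i.d. Gaussian noise of std sigma."""
    noise = [tuple(rng.gauss(0.0, sigma) for _ in range(3)) for _ in coords]
    noisy = [tuple(c + e for c, e in zip(atom, eps)) for atom, eps in zip(coords, noise)]
    return noisy, noise


# A decoder maps (noisy coordinates, graph embedding) to predicted per-atom noise.
Decoder = Callable[[List[Vec3], List[float]], List[Vec3]]


def reconstruction_loss(decoder: Decoder, clean: Sequence[Vec3],
                        sigma: float, rng: random.Random) -> float:
    """Mean squared error between predicted and true noise.

    The graph-level vector z is the only summary of the clean geometry that
    the decoder receives, so lowering this loss pushes z to be informative
    about the molecule as a whole.
    """
    z = graph_embedding(clean)
    noisy, noise = add_noise(clean, sigma, rng)
    pred = decoder(noisy, z)
    sq = [(p - e) ** 2 for pa, ea in zip(pred, noise) for p, e in zip(pa, ea)]
    return sum(sq) / len(sq)


if __name__ == "__main__":
    rng = random.Random(0)
    # Approximate water geometry: O at the origin, two H atoms about 0.96 A away.
    water = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
    zero_decoder = lambda noisy, z: [(0.0, 0.0, 0.0)] * len(noisy)
    print(graph_embedding(water))
    print(reconstruction_loss(zero_decoder, water, 0.1, rng))
```

A decoder that always predicts zero noise incurs a loss on the order of `sigma ** 2`, while an oracle that knows the clean coordinates drives it to zero. In a real model the decoder is a learned network, and the quality of `z` determines how close it can get to that oracle.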