MIPS: a Multimodal Infinite Polymer Sequence Pre-training Framework for Polymer Property Prediction
By: Jiaxi Wang, Yaosen Min, Xun Zhu, and more
Potential Business Impact:
Predicts plastic qualities for better material design.
Polymers, composed of repeating structural units called monomers, are fundamental materials in daily life and industry. Accurate property prediction for polymers is essential for their design, development, and application. However, existing modeling approaches, which typically represent a polymer by its constituent monomers, struggle to capture the properties of the polymer as a whole, because these properties change during polymerization. In this study, we propose a Multimodal Infinite Polymer Sequence (MIPS) pre-training framework, which represents polymers as infinite sequences of monomers and integrates both topological and spatial information for comprehensive modeling. From the topological perspective, we generalize the message passing mechanism (MPM) and the graph attention mechanism (GAM) to infinite polymer sequences. For MPM, we demonstrate that applying MPM to an infinite polymer sequence is equivalent to applying MPM to the induced star-linking graph of monomers. For GAM, we propose replacing global graph attention with localized graph attention (LGA). Moreover, we show the robustness of the "star linking" strategy through the Repeat and Shift Invariance Test (RSIT). Despite this robustness, the star-linking strategy has limitations when monomer side chains contain ring structures, a common characteristic of polymers, because it fails the Weisfeiler-Lehman (WL) test. To overcome this issue, we propose a backbone embedding that enhances the capability of MPM and LGA on infinite polymer sequences. From the spatial perspective, we extract 3D descriptors of repeating monomers to capture spatial information. Finally, we design a cross-modal fusion mechanism to unify the topological and spatial information. Experiments on eight diverse polymer property prediction tasks show that MIPS achieves state-of-the-art performance.
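The claimed MPM equivalence can be illustrated with a small sketch that needs no external dependencies. The sketch makes several assumptions. Repeat units are written by hand as heavy-atom graphs rather than parsed from SMILES. 1-WL color refinement stands in for learned message passing. The helpers (`adjacency`, `repeat`, `star_link`, `open_chain`, `wl_colors`, `readout`) are hypothetical and are not the authors' implementation.

Star linking is modeled here as dropping the two `[*]` attachment points and bonding the head atom directly to the tail atom. The code then checks three properties. First, atoms deep inside a long chain receive the same colors as in the star-linked graph. Second, the pooled readout is unchanged when the repeat unit is doubled. Third, the readout is unchanged when the repeat unit is shifted, in the spirit of the Repeat and Shift Invariance Test.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical helpers, not from the MIPS paper. A repeat unit is given as
# heavy-atom labels, intra-unit bonds, and the indices of the two atoms that
# carry the [*] attachment points ("head" and "tail").

def adjacency(num_atoms, bonds):
    """Adjacency lists; parallel edges and self-loops are kept as repeats."""
    adj = [[] for _ in range(num_atoms)]
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    return adj

def repeat(atoms, bonds, head, tail, n):
    """Concatenate n copies of the unit, bonding each tail to the next head."""
    m = len(atoms)
    new_bonds = [(r * m + i, r * m + j) for r in range(n) for i, j in bonds]
    new_bonds += [(r * m + tail, (r + 1) * m + head) for r in range(n - 1)]
    return atoms * n, new_bonds, head, (n - 1) * m + tail

def star_link(atoms, bonds, head, tail):
    """Drop both [*] dummies and bond head to tail, closing the unit on itself."""
    return atoms, adjacency(len(atoms), bonds + [(head, tail)])

def open_chain(atoms, bonds, head, tail, n):
    """Finite oligomer of n units: a truncated view of the infinite chain."""
    big_atoms, big_bonds, _, _ = repeat(atoms, bonds, head, tail, n)
    return big_atoms, adjacency(len(big_atoms), big_bonds)

def wl_colors(labels, adj, rounds):
    """1-WL color refinement, a stand-in for `rounds` message-passing layers."""
    colors = list(labels)
    for _ in range(rounds):
        colors = [(colors[v], tuple(sorted(colors[u] for u in adj[v])))
                  for v in range(len(labels))]
    return colors

def readout(colors):
    """Mean-pooling analogue: fraction of atoms carrying each color."""
    total = len(colors)
    return {c: Fraction(k, total) for c, k in Counter(colors).items()}

# Polypropylene [*]CC([*])C: CH2 (head), CH (tail) with a methyl branch.
PP = (["C", "C", "C"], [(0, 1), (1, 2)], 0, 1)
# The same polymer written with a shifted repeat unit, [*]C(C)C[*].
PP_SHIFTED = (["C", "C", "C"], [(0, 1), (0, 2)], 0, 2)

ROUNDS = 3
star = wl_colors(*star_link(*PP), ROUNDS)

# Atoms in the middle unit of a long chain are far from both chain ends, so
# after ROUNDS layers their colors match the star-linked graph.
n = 2 * ROUNDS + 3
chain = wl_colors(*open_chain(*PP, n), ROUNDS)
mid = (n // 2) * len(PP[0])
print(chain[mid:mid + len(PP[0])] == star)  # True

# Repeat invariance (doubled unit) and shift invariance (shifted unit).
print(readout(star) == readout(wl_colors(*star_link(*repeat(*PP, 2)), ROUNDS)))  # True
print(readout(star) == readout(wl_colors(*star_link(*PP_SHIFTED), ROUNDS)))  # True
```

These equalities hold because the periodic infinite chain is a covering graph of the star-linked graph. Message passing that aggregates over neighbors cannot distinguish a node from its image under a covering map.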
Similar Papers
Learning Repetition-Invariant Representations for Polymer Informatics
Machine Learning (CS)
Helps computers understand plastic chains of any length.
Benchmarking Large Language Models for Polymer Property Predictions
Computational Engineering, Finance, and Science
Helps computers guess plastic's heat limits.
Stitching Inner Product and Euclidean Metrics for Topology-aware Maximum Inner Product Search
Databases
Finds best matches faster by blending two search methods.