Universally Converging Representations of Matter Across Scientific Foundation Models
By: Sathya Edamadaka, Soojung Yang, Ju Li, and more
Potential Business Impact:
Models learn a common "language" for matter.
Machine learning models with vastly different modalities and architectures are being trained to predict the behavior of molecules, materials, and proteins. However, it remains unclear whether they learn similar internal representations of matter. Understanding their latent structure is essential for building scientific foundation models that generalize reliably beyond their training domains. Although representational convergence has been observed in language and vision, its counterpart in the sciences has not been systematically explored. Here, we show that representations learned by nearly sixty scientific models, spanning string-, graph-, 3D atomistic-, and protein-based modalities, are highly aligned across a wide range of chemical systems. Models trained on different datasets have highly similar representations of small molecules, and machine learning interatomic potentials converge in representation space as they improve in performance, suggesting that foundation models learn a common underlying representation of physical reality. We then identify two distinct regimes of scientific models: on inputs similar to those seen during training, high-performing models align closely while weak models diverge into local sub-optima in representation space; on structures vastly different from those seen during training, nearly all models collapse onto a low-information representation, indicating that today's models remain limited by training data and inductive bias and do not yet encode truly universal structure. Our findings establish representational alignment as a quantitative benchmark for foundation-level generality in scientific models. More broadly, our approach can be used to track the emergence of universal representations of matter as models scale, and to select and distill models whose learned representations transfer best across modalities, domains of matter, and scientific tasks.
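The abstract does not specify which alignment metric the authors use, but a common way to quantify how similar two models' representations are is linear centered kernel alignment (CKA). The sketch below illustrates that idea under this assumption; `embed_a`, `embed_b`, and `structures` are hypothetical placeholders rather than names from the paper.

```python
# Minimal sketch: comparing two models' representations of the same chemical
# structures with linear CKA (one possible alignment metric; assumed here,
# not taken from the paper).
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between two representation matrices.

    X: (n_samples, d1) embeddings from model A for the same n structures.
    Y: (n_samples, d2) embeddings from model B for the same n structures.
    Returns a similarity in [0, 1]; 1 means identical up to rotation/scaling.
    """
    # Center each feature dimension.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # HSIC-style numerator and normalizers for the linear-kernel case.
    numerator = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    denominator = np.linalg.norm(X.T @ X, ord="fro") * np.linalg.norm(Y.T @ Y, ord="fro")
    return float(numerator / denominator)

# Hypothetical usage: embed the same set of molecules or materials with two
# models (e.g., a string-based model and a 3D atomistic potential) and compare.
# X = embed_a(structures)   # shape (n, d1)
# Y = embed_b(structures)   # shape (n, d2)
# score = linear_cka(X, Y)  # higher -> more aligned representations
```

In this kind of setup, a high score on in-distribution structures and a collapse toward uninformative similarity on out-of-distribution structures would correspond to the two regimes the abstract describes.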
Similar Papers
An Evaluation of Representation Learning Methods in Particle Physics Foundation Models
Machine Learning (CS)
Teaches computers to understand tiny particles better.
Physics Steering: Causal Control of Cross-Domain Concepts in a Physics Foundation Model
Machine Learning (CS)
Controls physics simulations by changing AI's thoughts.
Cross-Model Semantics in Representation Learning
Machine Learning (CS)
Makes AI models share knowledge better.