Encoding and Understanding Astrophysical Information in Large Language Model-Generated Summaries
By: Kiera McCormick, Rafael Martínez-Galarza
Potential Business Impact:
Teaches computers to understand space science from text.
Large Language Models (LLMs) have demonstrated an ability to generalize across domains and modalities, and have even shown in-context learning capabilities. This raises research questions about whether they can encode physical information that is usually available only from scientific measurements and is only loosely captured in textual descriptions. Using astrophysics as a test bed, we investigate whether LLM embeddings can codify physical summary statistics obtained from scientific measurements, focusing on two main questions: 1) Does prompting play a role in how those quantities are codified by the LLM? 2) What aspects of language are most important in encoding the physics represented by the measurement? We investigate these questions using sparse autoencoders that extract interpretable features from the text.
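To make the method concrete, below is a minimal sketch of the kind of sparse autoencoder the abstract describes: a single-hidden-layer autoencoder trained to reconstruct LLM text embeddings under an L1 sparsity penalty, so that individual hidden features become interpretable. This is an illustration under stated assumptions, not the paper's implementation; the embedding dimension, dictionary size, ReLU encoder, and L1 coefficient are all hypothetical choices.

```python
# Minimal sparse-autoencoder sketch (PyTorch). All hyperparameters are
# illustrative assumptions, not the authors' actual settings.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Autoencoder with an L1 penalty on hidden activations, trained to
    reconstruct LLM embeddings so that hidden units act as interpretable
    features of the input text."""

    def __init__(self, embed_dim: int, n_features: int):
        super().__init__()
        self.encoder = nn.Linear(embed_dim, n_features)  # embedding -> features
        self.decoder = nn.Linear(n_features, embed_dim)  # features -> reconstruction

    def forward(self, x: torch.Tensor):
        features = torch.relu(self.encoder(x))  # sparse, non-negative activations
        recon = self.decoder(features)
        return recon, features

# Assumed setup: 4096-d embeddings, an 8x overcomplete feature dictionary.
embed_dim, n_features, l1_coeff = 4096, 32768, 1e-3
sae = SparseAutoencoder(embed_dim, n_features)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)

# Stand-in batch for LLM embeddings of astronomical text descriptions.
batch = torch.randn(64, embed_dim)

# One training step: reconstruction error plus sparsity penalty.
recon, features = sae(batch)
loss = nn.functional.mse_loss(recon, batch) + l1_coeff * features.abs().mean()
opt.zero_grad()
loss.backward()
opt.step()
```

In this setup, the sparsity penalty pushes most feature activations to zero for any given input, so the features that do fire for a text can be inspected and, in principle, correlated with the physical summary statistics the text describes.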
Similar Papers
The Empowerment of Science of Science by Large Language Models: New Tools and Methods
Computation and Language
AI helps scientists discover new ideas faster.
Uncovering Emergent Physics Representations Learned In-Context by Large Language Models
Computation and Language
Computers learn physics concepts from examples.
Large Language Models for Limited Noisy Data: A Gravitational Wave Identification Study
Instrumentation and Methods for Astrophysics
Finds space signals better with less data.