Encoding and Understanding Astrophysical Information in Large Language Model-Generated Summaries

Published: November 18, 2025 | arXiv ID: 2511.14685v1

By: Kiera McCormick, Rafael Martínez-Galarza

BigTech Affiliations: Johns Hopkins University

Potential Business Impact:

Teaches computers to understand space science from text.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Large Language Models have demonstrated the ability to generalize across domains and modalities, and have even shown in-context learning capabilities. This raises research questions about how they can be used to encode physical information that is usually available only from scientific measurements and is only loosely encoded in textual descriptions. Using astrophysics as a test bed, we investigate whether LLM embeddings can codify physical summary statistics obtained from scientific measurements, through two main questions: 1) Does prompting play a role in how those quantities are codified by the LLM? and 2) What aspects of language are most important in encoding the physics represented by the measurement? We investigate this using sparse autoencoders that extract interpretable features from the text.
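The sparse autoencoder technique mentioned in the abstract can be illustrated with a minimal sketch: an overcomplete ReLU encoder trained with an L1 sparsity penalty on its activations, so that each input embedding is reconstructed from a small number of active, potentially interpretable features. The dimensions, hyperparameters, and random "embeddings" below are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

# Minimal sparse-autoencoder sketch (assumed setup, not the paper's code).
# It reconstructs mock "LLM embeddings" through an overcomplete ReLU layer
# whose activations are pushed toward sparsity by an L1 penalty.

rng = np.random.default_rng(0)

d_embed = 64     # dimensionality of the (mock) LLM embeddings
d_hidden = 256   # overcomplete feature dictionary
l1_coef = 1e-3   # weight of the L1 sparsity penalty
lr = 1e-2        # gradient-descent step size

# Random vectors standing in for LLM embeddings of text summaries.
X = rng.normal(size=(512, d_embed))
N = len(X)

W_enc = rng.normal(scale=0.1, size=(d_embed, d_hidden))
b_enc = np.zeros(d_hidden)
W_dec = rng.normal(scale=0.1, size=(d_hidden, d_embed))

losses = []
for step in range(200):
    # Encoder: sparse feature activations via ReLU.
    H = np.maximum(X @ W_enc + b_enc, 0.0)
    X_hat = H @ W_dec                      # linear decoder
    err = X_hat - X

    # Loss = mean reconstruction error + L1 penalty on activations.
    losses.append((err ** 2).sum() / N + l1_coef * np.abs(H).sum() / N)

    # Manual backprop through the two linear layers and the ReLU.
    dX_hat = 2.0 * err / N
    gW_dec = H.T @ dX_hat
    gH = (dX_hat @ W_dec.T + (l1_coef / N) * np.sign(H)) * (H > 0)
    gW_enc = X.T @ gH
    gb_enc = gH.sum(axis=0)

    W_enc -= lr * gW_enc
    b_enc -= lr * gb_enc
    W_dec -= lr * gW_dec

# After training, each input activates only a fraction of the features;
# inspecting which inputs fire a given feature is what makes it interpretable.
H = np.maximum(X @ W_enc + b_enc, 0.0)
sparsity = (H > 0).mean()
```

In the paper's setting, the rows of `X` would be actual LLM embeddings of astrophysical text summaries, and the learned dictionary directions would be probed for correspondence with physical summary statistics.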

Country of Origin
🇺🇸 United States

Page Count
11 pages

Category
Computer Science:
Computation and Language