Encoding and Understanding Astrophysical Information in Large Language Model-Generated Summaries
By: Kiera McCormick, Rafael Martínez-Galarza
Potential Business Impact:
Teaches computers to understand space science from text.
Large Language Models (LLMs) have demonstrated an ability to generalize across domains and modalities, and have even shown in-context learning capabilities. This raises research questions about whether they can encode physical information that is usually available only from scientific measurements and is only loosely captured in textual descriptions. Using astrophysics as a test bed, we investigate whether LLM embeddings can codify physical summary statistics obtained from scientific measurements, focusing on two main questions: 1) Does prompting play a role in how those quantities are codified by the LLM? 2) What aspects of language are most important in encoding the physics represented by the measurement? We investigate these questions using sparse autoencoders that extract interpretable features from the text.
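To make the method concrete, below is a minimal sketch of the kind of sparse autoencoder the abstract describes: a single-hidden-layer autoencoder trained to reconstruct LLM text embeddings under an L1 sparsity penalty, so that individual hidden features become interpretable. This is an illustration under stated assumptions, not the paper's implementation; the embedding dimension, dictionary size, ReLU encoder, and L1 coefficient are all hypothetical choices.

```python
# Minimal sparse-autoencoder sketch (PyTorch). All hyperparameters are
# illustrative assumptions, not the authors' actual settings.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Autoencoder with an L1 penalty on hidden activations, trained to
    reconstruct LLM embeddings so that hidden units act as interpretable
    features of the input text."""

    def __init__(self, embed_dim: int, n_features: int):
        super().__init__()
        self.encoder = nn.Linear(embed_dim, n_features)  # embedding -> features
        self.decoder = nn.Linear(n_features, embed_dim)  # features -> reconstruction

    def forward(self, x: torch.Tensor):
        features = torch.relu(self.encoder(x))  # sparse, non-negative activations
        recon = self.decoder(features)
        return recon, features

# Assumed setup: 4096-d embeddings, an 8x overcomplete feature dictionary.
embed_dim, n_features, l1_coeff = 4096, 32768, 1e-3
sae = SparseAutoencoder(embed_dim, n_features)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)

# Stand-in batch for LLM embeddings of astronomical text descriptions.
batch = torch.randn(64, embed_dim)

# One training step: reconstruction error plus sparsity penalty.
recon, features = sae(batch)
loss = nn.functional.mse_loss(recon, batch) + l1_coeff * features.abs().mean()
opt.zero_grad()
loss.backward()
opt.step()
```

In this setup, the sparsity penalty pushes most feature activations to zero for any given input, so the features that do fire for a text can be inspected and, in principle, correlated with the physical summary statistics the text describes.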
Similar Papers
The Empowerment of Science of Science by Large Language Models: New Tools and Methods
Computation and Language
AI helps scientists discover new ideas faster.
Uncovering Emergent Physics Representations Learned In-Context by Large Language Models
Computation and Language
Computers learn physics concepts from examples.
Large Language Models for Limited Noisy Data: A Gravitational Wave Identification Study
Instrumentation and Methods for Astrophysics
Finds space signals better with less data.