
From Simulations to Surveys: Domain Adaptation for Galaxy Observations

Published: November 23, 2025 | arXiv ID: 2511.18590v1

By: Kaley Brauer, Aditya Prasad Dash, Meet J. Vyas and more

Potential Business Impact:

Helps computers tell galaxy shapes from pictures.

Business Areas:
Simulation Software

Large photometric surveys will image billions of galaxies, but we currently lack quick, reliable automated ways to infer their physical properties, such as morphology, stellar mass, and star formation rate. Simulations provide galaxy images with ground-truth physical labels, but domain shifts in PSF, noise, backgrounds, selection, and label priors degrade transfer to real surveys. We present a preliminary domain adaptation pipeline that trains on simulated TNG50 galaxies and evaluates on real SDSS galaxies with morphology labels (elliptical/spiral/irregular). We train three backbones (CNN, $E(2)$-steerable CNN, ResNet-18) with focal loss and effective-number class weighting, and a feature-level domain loss $L_D$ built from GeomLoss (entropic Sinkhorn OT, energy distance, Gaussian MMD, and related metrics). We show that combining these losses with an OT-based "top-$k$ soft matching" loss, which focuses $L_D$ on the worst-matched source-target pairs, can further enhance domain alignment. With Euclidean distance, scheduled alignment weights, and top-$k$ matching, target accuracy (macro F1) rises from $\sim$46% ($\sim$30%) with no adaptation to $\sim$87% ($\sim$62.6%), with a domain AUC near 0.5, indicating strong latent-space mixing.
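
To make the loss terms named above more concrete, here is a minimal pure-Python sketch of effective-number class weighting, a class-weighted focal loss, and a Gaussian-kernel MMD between two feature sets. It is not the authors' implementation: the abstract says the domain loss is built from GeomLoss, whereas this sketch writes the formulas out by hand, and the function names, the values of `beta`, `gamma`, and `sigma`, the class counts, and the toy features are all assumptions chosen for illustration.

```python
import math


def effective_number_weights(class_counts, beta=0.999):
    """Class weights proportional to (1 - beta) / (1 - beta**n_c).

    n_c is the number of training samples in class c. The weights are
    rescaled to sum to the number of classes (an assumed normalization).
    """
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in class_counts]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


def focal_loss(probs, label, weights, gamma=2.0):
    """Weighted focal loss for one sample: -w_y * (1 - p_y)**gamma * log(p_y)."""
    p_true = probs[label]
    return -weights[label] * (1.0 - p_true) ** gamma * math.log(p_true)


def gaussian_mmd2(xs, ys, sigma=1.0):
    """Biased estimate of squared MMD between xs and ys with a Gaussian kernel."""

    def kernel(a, b):
        sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        return math.exp(-sq_dist / (2.0 * sigma ** 2))

    def mean_kernel(us, vs):
        total = sum(kernel(u, v) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Hypothetical class counts for elliptical / spiral / irregular galaxies.
weights = effective_number_weights([5000, 3000, 200])

# The focal term down-weights samples the classifier already gets right.
loss_confident = focal_loss([0.9, 0.05, 0.05], 0, weights)
loss_uncertain = focal_loss([0.4, 0.3, 0.3], 0, weights)

# Toy 2-D "latent features" standing in for source (simulated) and
# target (observed) embeddings; real features would come from a backbone.
source = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
target_shifted = [[3.0, 3.0], [4.0, 3.0], [3.0, 4.0]]
mmd_same = gaussian_mmd2(source, source)
mmd_shifted = gaussian_mmd2(source, target_shifted)

print("class weights:", weights)
print("focal loss (confident, uncertain):", loss_confident, loss_uncertain)
print("MMD^2 (identical, shifted):", mmd_same, mmd_shifted)
```

In this sketch the rare class receives the largest weight, the confidently classified sample contributes far less loss than the uncertain one, and the MMD term is zero for identical feature sets and grows as the target features drift away from the source. A domain loss of this kind is what pushes the two latent distributions together during training.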

Country of Origin
🇮🇳 🇲🇾 🇺🇸 United States, India, Malaysia

Page Count
8 pages

Category
Astrophysics:
Astrophysics of Galaxies