From Simulations to Surveys: Domain Adaptation for Galaxy Observations
By: Kaley Brauer, Aditya Prasad Dash, Meet J. Vyas and more
Potential Business Impact:
Helps computers tell galaxy shapes from pictures.
Large photometric surveys will image billions of galaxies, but we currently lack quick, reliable automated ways to infer their physical properties such as morphology, stellar mass, and star formation rate. Simulations provide galaxy images with ground-truth physical labels, but domain shifts in PSF, noise, backgrounds, selection, and label priors degrade transfer to real surveys. We present a preliminary domain adaptation pipeline that trains on simulated TNG50 galaxies and evaluates on real SDSS galaxies with morphology labels (elliptical/spiral/irregular). We train three backbones (CNN, $E(2)$-steerable CNN, ResNet-18) with focal loss and effective-number class weighting, and a feature-level domain loss $L_D$ built from GeomLoss (entropic Sinkhorn OT, energy distance, Gaussian MMD, and related metrics). We show that combining these losses with an OT-based "top-$k$ soft matching" loss, which focuses $L_D$ on the worst-matched source-target pairs, can further enhance domain alignment. With Euclidean distance, scheduled alignment weights, and top-$k$ matching, target accuracy rises from $\sim$46% without adaptation to $\sim$87%, and macro F1 rises from $\sim$30% to $\sim$62.6%. The domain AUC stays near 0.5, indicating strong latent-space mixing.
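As a rough illustration of the loss structure the abstract describes, the sketch below combines a class-weighted focal loss on source labels with a feature-level domain loss between source and target features. It is not the authors' code: it uses plain NumPy with a biased Gaussian MMD estimate as a stand-in for the GeomLoss metrics, it omits the top-$k$ soft matching term, and every function name and default value (`beta`, `gamma`, `sigma`, `align_weight`) is an assumption chosen for the example.

```python
import numpy as np


def effective_number_weights(class_counts, beta=0.999):
    """Class weights proportional to (1 - beta) / (1 - beta**n_c).

    beta is an assumed hyperparameter; weights are rescaled to sum
    to the number of classes.
    """
    counts = np.asarray(class_counts, dtype=float)
    effective_num = (1.0 - np.power(beta, counts)) / (1.0 - beta)
    weights = 1.0 / effective_num
    return weights * len(counts) / weights.sum()


def focal_loss(logits, labels, class_weights, gamma=2.0):
    """Mean of -w_y * (1 - p_y)**gamma * log(p_y) over the batch."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    class_weights = np.asarray(class_weights, dtype=float)
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    log_pt = log_probs[np.arange(len(labels)), labels]
    pt = np.exp(log_pt)
    return float((-class_weights[labels] * (1.0 - pt) ** gamma * log_pt).mean())


def gaussian_mmd2(source_feats, target_feats, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD with a Gaussian kernel."""
    src = np.asarray(source_feats, dtype=float)
    tgt = np.asarray(target_feats, dtype=float)

    def kernel(a, b):
        sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq_dist / (2.0 * sigma ** 2))

    return float(kernel(src, src).mean() + kernel(tgt, tgt).mean()
                 - 2.0 * kernel(src, tgt).mean())


def total_loss(logits, labels, class_weights, source_feats, target_feats,
               align_weight=0.1):
    """Classification loss on source labels plus a weighted domain loss.

    align_weight plays the role of a (possibly scheduled) alignment
    weight; the value here is arbitrary.
    """
    return (focal_loss(logits, labels, class_weights)
            + align_weight * gaussian_mmd2(source_feats, target_feats))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical class counts for elliptical / spiral / irregular.
    weights = effective_number_weights([1000, 100, 10])
    logits = rng.normal(size=(8, 3))
    labels = rng.integers(0, 3, size=8)
    source = rng.normal(size=(8, 4))
    target = rng.normal(loc=1.0, size=(8, 4))
    print(total_loss(logits, labels, weights, source, target))
```

In a training loop, the features would come from the backbone's latent layer for a simulated batch and a survey batch, and only the simulated batch would contribute labels to the focal term.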
Similar Papers
Simulation-Based Pretraining and Domain Adaptation for Astronomical Time Series with Minimal Labeled Data
Instrumentation and Methods for Astrophysics
Teaches computers to find space objects with fake data.
Investigation on deep learning-based galaxy image translation models
Instrumentation and Methods for Astrophysics
Helps computers guess galaxy distance from pictures.
Test-Time Modification: Inverse Domain Transformation for Robust Perception
CV and Pattern Recognition
Makes AI see in new places without retraining.