Autoencoder-based Semi-Supervised Dimensionality Reduction and Clustering for Scientific Ensembles
By: Lennard Manuel, Hamid Gadirov, Steffen Frey
Analyzing and visualizing scientific ensemble datasets with high dimensionality and complexity poses significant challenges. Dimensionality reduction techniques and autoencoders are powerful tools for extracting features, but they often struggle with such high-dimensional data. This paper presents an enhanced autoencoder framework that incorporates a clustering loss, based on the soft silhouette score, alongside a contrastive loss to improve the visualization and interpretability of ensemble datasets. First, EfficientNetV2 is used to generate pseudo-labels for the unlabeled portions of the scientific ensemble datasets. By jointly optimizing the reconstruction, clustering, and contrastive objectives, our method encourages similar data points to group together while separating distinct clusters in the latent space. UMAP is then applied to this latent representation to produce 2D projections, which are evaluated using the silhouette score. Multiple types of autoencoders are evaluated and compared based on their ability to extract meaningful features. Experiments on two scientific ensemble datasets (channel structures in soil derived from Markov chain Monte Carlo, and droplet-on-film impact dynamics) show that models incorporating clustering or contrastive loss marginally outperform the baseline approaches.
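The abstract does not give the exact loss formulas, so the following is only a minimal sketch of what a joint objective of this kind could look like. It assumes that the soft silhouette is computed from soft cluster assignments `P` (for example softmax outputs, one row per sample) and, as a stand-in for the unspecified contrastive term, it swaps in a classic pairwise margin contrastive loss driven by pseudo-labels. All function names, the weights `lam_clu` and `lam_con`, and `margin` are hypothetical. NumPy is used for the forward computation only; actual training would need the same expressions written in an autodiff framework.

```python
import numpy as np

def pairwise_distances(Z):
    """Euclidean distance matrix for latent codes Z of shape (n, d)."""
    sq = np.sum(Z ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    D = np.sqrt(np.maximum(D2, 0.0))  # clip tiny negatives caused by rounding
    np.fill_diagonal(D, 0.0)
    return D

def soft_silhouette(Z, P, eps=1e-12):
    """Mean soft silhouette of Z (n, d) under soft assignments P (n, K).

    Assumed formulation: for one-hot P this reduces to the classic
    silhouette score (clusters with a single member are not special-cased).
    """
    D = pairwise_distances(Z)
    n, K = P.shape
    # Membership-weighted mean distance from point i to cluster k, excluding i.
    # D[i, i] == 0, so only the denominator needs the self-weight removed.
    num = D @ P                          # (n, K): sum_j D[i, j] * P[j, k]
    den = P.sum(axis=0)[None, :] - P     # (n, K): sum_{j != i} P[j, k]
    dist = num / np.maximum(den, eps)
    s = np.zeros((n, K))
    for k in range(K):
        a = dist[:, k]                                        # cohesion
        b = np.min(np.delete(dist, k, axis=1), axis=1)        # nearest other cluster
        s[:, k] = (b - a) / np.maximum(np.maximum(a, b), eps)
    # Weight each cluster-specific silhouette by the point's membership in it.
    return float(np.mean(np.sum(P * s, axis=1)))

def contrastive_loss(Z, labels, margin=1.0):
    """Pairwise margin contrastive loss on pseudo-labels (swapped-in choice)."""
    D = pairwise_distances(Z)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    pos = D[same & off_diag] ** 2                 # pull same-label pairs together
    neg = np.maximum(margin - D[~same], 0.0) ** 2  # push other pairs past the margin
    return float((pos.sum() + neg.sum()) / off_diag.sum())

def total_loss(X, X_rec, Z, P, labels, lam_clu=1.0, lam_con=1.0):
    """Reconstruction + clustering (1 - soft silhouette) + contrastive terms."""
    rec = float(np.mean((X - X_rec) ** 2))
    clu = 1.0 - soft_silhouette(Z, P)
    con = contrastive_loss(Z, labels)
    return rec + lam_clu * clu + lam_con * con
```

Using `1 - soft_silhouette` as the clustering term means that minimizing the loss maximizes cluster separation in the latent space, which matches the stated goal of grouping similar points while separating distinct clusters. With uniform assignments every cluster looks equally close, so the soft silhouette is exactly 0 and the clustering term contributes 1.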