Wireless Dataset Similarity: Measuring Distances in Supervised and Unsupervised Machine Learning

Published: January 3, 2026 | arXiv ID: 2601.01023v1

By: João Morais, Sadjad Alikhani, Akshay Malhotra, and more

Potential Business Impact:

Helps wireless devices learn from different data.

Business Areas:

Wireless Hardware, Mobile

This paper introduces a task- and model-aware framework for measuring similarity between wireless datasets. It enables applications such as dataset selection and augmentation, simulation-to-real (sim2real) comparison, task-specific synthetic data generation, and informing decisions about training or adapting models for new deployments. We evaluate candidate dataset distance metrics by how well they predict cross-dataset transferability: if two datasets have a small distance, a model trained on one should perform well on the other. We apply the framework to an unsupervised task, channel state information (CSI) compression with autoencoders. Using metrics based on UMAP embeddings combined with Wasserstein and Euclidean distances, we achieve Pearson correlations exceeding 0.85 between dataset distances and train-on-one/test-on-another task performance. We also apply the framework to a supervised task, downlink beam prediction with convolutional neural networks. For this task, we derive a label-aware distance by integrating supervised UMAP with penalties for dataset imbalance. Across both tasks, the resulting distances outperform traditional baselines and consistently correlate more strongly with model transferability, supporting task-relevant comparisons between wireless datasets.

Measuring Time-Series Dataset Similarity using Wasserstein Distance

Machine Learning (CS)

Finds similar patterns in data over time.

29 Jul 2025

Wasserstein distance based semi-supervised manifold learning and application to GNSS multi-path detection

Machine Learning (CS)

Teaches computers to find bad signals with few examples.

5 Dec 2025

Statistical Inference for Manifold Similarity and Alignability across Noisy High-Dimensional Datasets

Statistics Theory

Compares complex data by looking at its hidden shapes.

26 Nov 2025

Country of Origin

🇺🇸 United States

Page Count

16 pages
