Bias in the Shadows: Explore Shortcuts in Encrypted Network Traffic Classification
By: Chuyi Wang, Xiaohui Xie, Tongze Wang and more
Pre-trained models operating directly on raw bytes have achieved promising performance in encrypted network traffic classification (NTC), but they often suffer from shortcut learning, that is, reliance on spurious correlations that fail to generalize to real-world data. Existing solutions rely heavily on model-specific interpretation techniques, which lack adaptability and generality across different model architectures and deployment scenarios. In this paper, we propose BiasSeeker, the first semi-automated framework that is both model-agnostic and data-driven for detecting dataset-specific shortcut features in encrypted traffic. By performing statistical correlation analysis directly on raw binary traffic, BiasSeeker identifies spurious or environment-entangled features that may compromise generalization, independent of any classifier. To address the diverse nature of shortcut features, we introduce a systematic categorization and apply category-specific validation strategies that reduce bias while preserving meaningful information. We evaluate BiasSeeker on 19 public datasets across three NTC tasks. By emphasizing context-aware feature selection and dataset-specific diagnosis, BiasSeeker offers a novel perspective for understanding and addressing shortcut learning in encrypted network traffic classification, raising awareness that feature selection should be an intentional and scenario-sensitive step prior to model training.
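To make the idea of classifier-independent correlation analysis on raw bytes more concrete, the following is a minimal sketch and not BiasSeeker's actual procedure. The abstract does not say which statistic is used, so mutual information between the byte value at each offset and the class label is an assumption chosen for illustration. The function names `mutual_information` and `rank_byte_offsets`, the `max_len` parameter, and the toy packets are all hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits between two equal-length discrete sequences."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of X
    py = Counter(ys)            # marginal counts of Y
    pxy = Counter(zip(xs, ys))  # joint counts of (X, Y)
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def rank_byte_offsets(packets, labels, max_len=64):
    """Score each byte offset by how much its value reveals about the label.

    Assumption for this sketch: a high score flags an offset as a *candidate*
    shortcut that still needs manual or category-specific validation.
    """
    scores = []
    for off in range(max_len):
        # Packets shorter than the offset get the sentinel value -1.
        column = [p[off] if off < len(p) else -1 for p in packets]
        scores.append((off, mutual_information(column, labels)))
    # Highest-scoring offsets first; ties keep their original offset order.
    return sorted(scores, key=lambda t: t[1], reverse=True)


# Toy data (hypothetical): offset 0 tracks the label exactly,
# offset 1 is constant, offset 2 varies independently of the label.
packets = [bytes([10, 7, 1]), bytes([10, 7, 2]),
           bytes([20, 7, 1]), bytes([20, 7, 2])]
labels = ["vpn", "vpn", "nonvpn", "nonvpn"]

ranked = rank_byte_offsets(packets, labels, max_len=3)
for offset, score in ranked:
    print(f"offset {offset}: {score:.3f} bits")
```

In this toy setup, offset 0 carries one full bit about the label while the other offsets carry none. That is the kind of signal that would prompt a closer look at whether the field reflects the traffic class itself or merely the capture environment.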
Similar Papers
Efficient Unsupervised Shortcut Learning Detection and Mitigation in Transformers
Machine Learning (CS)
Fixes AI mistakes caused by easy answers.
Shortcut Learning Susceptibility in Vision Classifiers
Machine Learning (CS)
Teaches computers to learn real things, not tricks.
Network Traffic Classification Using Self-Supervised Learning and Confident Learning
Networking and Internet Architecture
Helps computers sort internet traffic faster.