Leveraging Foundational Models and Simple Fusion for Multi-modal Physiological Signal Analysis
By: Youssef Ghallab, Omar Iraqy, Mohamed Kandil and more
Physiological signals such as electrocardiograms (ECG) and electroencephalograms (EEG) provide complementary insights into human health and cognition, yet multi-modal integration is challenging because multi-modal labeled data is limited and the modalities differ in their characteristics. In this work, we adapt the CBraMod encoder for large-scale self-supervised ECG pretraining, introducing a dual-masking strategy to capture intra- and inter-lead dependencies. To overcome these challenges, we use a pre-trained CBraMod encoder for EEG and pre-train a symmetric ECG encoder, equipping each modality with a rich foundational representation. These representations are then fused via simple embedding concatenation, which lets the classification head learn cross-modal interactions and enables effective downstream learning despite limited multi-modal supervision. Evaluated on emotion recognition, our approach achieves near state-of-the-art performance, demonstrating that carefully designed physiological encoders, even with straightforward fusion, substantially improve downstream performance. These results highlight the potential of foundation-model approaches to harness the holistic nature of physiological signals, enabling scalable, label-efficient, and generalizable solutions for healthcare and affective computing.
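The fusion step described in the abstract, concatenating the two encoders' embeddings and passing the result to a classification head, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `concat_fusion` and `linear_head`, the embedding sizes, the three-class output, and the random stand-in embeddings are all assumptions made for illustration, and a single linear layer stands in for whatever classification head the paper actually uses.

```python
import random

def concat_fusion(eeg_emb, ecg_emb):
    """Fuse two per-sample embeddings by concatenating them (hypothetical helper)."""
    return list(eeg_emb) + list(ecg_emb)

def linear_head(fused, weights, bias):
    """Apply one linear layer: one logit per class (assumed stand-in for the head)."""
    return [sum(w * x for w, x in zip(row, fused)) + b
            for row, b in zip(weights, bias)]

random.seed(0)
D_EEG, D_ECG, N_CLASSES = 8, 8, 3  # assumed sizes, not taken from the paper

# Stand-ins for the outputs of the pre-trained EEG and ECG encoders.
eeg_emb = [random.gauss(0.0, 1.0) for _ in range(D_EEG)]
ecg_emb = [random.gauss(0.0, 1.0) for _ in range(D_ECG)]

# Randomly initialised head parameters; in practice these would be trained.
weights = [[random.gauss(0.0, 0.1) for _ in range(D_EEG + D_ECG)]
           for _ in range(N_CLASSES)]
bias = [0.0] * N_CLASSES

fused = concat_fusion(eeg_emb, ecg_emb)
logits = linear_head(fused, weights, bias)
print(len(fused), len(logits))
```

Because the head sees both embeddings side by side, its weights can combine EEG and ECG features, which is the sense in which a simple concatenation still allows cross-modal interactions to be learned.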