Mamba-CNN: A Hybrid Architecture for Efficient and Accurate Facial Beauty Prediction
By: Djamel Eddine Boukhari
Potential Business Impact:
Helps computers score how attractive a face looks.
The computational assessment of facial attractiveness, a challenging and subjective regression task, is dominated by architectures that force a trade-off: Convolutional Neural Networks (CNNs) are efficient but have limited receptive fields, while Vision Transformers (ViTs) model global context at a computational cost that grows quadratically with the number of image tokens. To address this, we propose Mamba-CNN, a novel and efficient hybrid architecture. Mamba-CNN integrates a lightweight, Mamba-inspired State Space Model (SSM) gating mechanism into a hierarchical convolutional backbone. This gating mechanism lets the network dynamically modulate feature maps and selectively emphasize salient facial features and their long-range spatial relationships, mirroring human holistic perception while remaining computationally efficient. We conducted extensive experiments on the widely used SCUT-FBP5500 benchmark, where our model sets a new state of the art: Mamba-CNN achieves a Pearson Correlation (PC) of 0.9187, a Mean Absolute Error (MAE) of 0.2022, and a Root Mean Square Error (RMSE) of 0.2610. Our findings validate the synergistic potential of combining CNNs with selective SSMs and present a powerful new architectural paradigm for nuanced visual understanding tasks.
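For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how PC, MAE, and RMSE are conventionally computed between human ratings and model predictions. It is not the authors' evaluation code: the function names and the four sample scores are hypothetical, and the scores are assumed to lie on a 1-to-5 rating scale.

```python
import math

def pearson_correlation(y_true, y_pred):
    """Pearson Correlation (PC): linear agreement, higher is better."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

def mean_absolute_error(y_true, y_pred):
    """Mean Absolute Error (MAE): average absolute gap, lower is better."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def root_mean_square_error(y_true, y_pred):
    """Root Mean Square Error (RMSE): penalizes large gaps more, lower is better."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

# Hypothetical ratings, assumed to be on a 1-to-5 scale (not data from the paper).
human_scores = [2.0, 3.0, 4.0, 5.0]
model_scores = [2.5, 3.0, 3.5, 5.0]

pc = pearson_correlation(human_scores, model_scores)
mae = mean_absolute_error(human_scores, model_scores)
rmse = root_mean_square_error(human_scores, model_scores)
print(f"PC={pc:.4f}  MAE={mae:.4f}  RMSE={rmse:.4f}")
```

PC rewards predictions that rank and scale faces the way human raters do, while MAE and RMSE measure the size of the error in rating units, so a strong model pairs a high PC with low MAE and RMSE.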
Similar Papers
VM-BeautyNet: A Synergistic Ensemble of Vision Transformer and Mamba for Facial Beauty Prediction
CV and Pattern Recognition
Makes computers judge faces as beautiful.
VCMamba: Bridging Convolutions with Multi-Directional Mamba for Efficient Visual Representation
CV and Pattern Recognition
Helps computers see details and the big picture.
Vision Mamba in Remote Sensing: A Comprehensive Survey of Techniques, Applications and Outlook
CV and Pattern Recognition
Lets computers see Earth better from space.