Achieving 3D Attention via Triplet Squeeze and Excitation Block
By: Maan Alhazmi, Abdulrahman Altahhan
Potential Business Impact:
Helps computers understand emotions from faces better.
The emergence of ConvNeXt and its variants has reaffirmed the conceptual and structural suitability of CNN-based models for vision tasks, re-establishing them as key players in image classification in general, and in facial expression recognition (FER) in particular. In this paper, we propose a new set of models that build on these advancements by incorporating a novel attention mechanism that combines Triplet attention with Squeeze-and-Excitation (TripSE), in four different variants. We demonstrate the effectiveness of these variants by applying them to the ResNet18, DenseNet, and ConvNeXt architectures to validate their versatility and impact. Our study shows that incorporating a TripSE block into these CNN models boosts their performance, particularly for the ConvNeXt architecture, indicating its utility. We evaluate the proposed mechanisms and associated models across four datasets, namely the CIFAR100, ImageNet, FER2013, and AffectNet datasets, where ConvNeXt with TripSE achieves state-of-the-art results with an accuracy of 78.27% on the popular FER2013 dataset, a new benchmark for this dataset.
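The abstract does not spell out how the four TripSE variants compose Triplet attention with Squeeze-and-Excitation, so the following is only a minimal illustrative sketch in PyTorch: it assumes a simple serial design in which the standard three-branch Triplet attention (Misra et al., 2021) is followed by a standard SE channel-reweighting stage. The class names (`TripSE`, `TripletAttention`, `SEBlock`) and the serial ordering are assumptions for illustration, not the paper's actual formulation.

```python
# Hypothetical sketch of a combined Triplet + Squeeze-and-Excitation ("TripSE") block.
# The paper's four variants may combine the two mechanisms differently; this assumes
# a serial composition purely for illustration.
import torch
import torch.nn as nn


class ZPool(nn.Module):
    """Concatenate max- and mean-pooled features along the channel axis."""
    def forward(self, x):
        return torch.cat([x.max(dim=1, keepdim=True)[0],
                          x.mean(dim=1, keepdim=True)], dim=1)


class AttentionGate(nn.Module):
    """One branch of Triplet attention: Z-pool -> 7x7 conv -> sigmoid -> scale."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.pool = ZPool()
        self.conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False),
            nn.BatchNorm2d(1),
        )

    def forward(self, x):
        return x * torch.sigmoid(self.conv(self.pool(x)))


class TripletAttention(nn.Module):
    """Three-branch cross-dimension attention (Misra et al., 2021)."""
    def __init__(self):
        super().__init__()
        self.cw = AttentionGate()  # channel-width interaction
        self.ch = AttentionGate()  # channel-height interaction
        self.hw = AttentionGate()  # spatial (height-width) interaction

    def forward(self, x):
        # Rotate the tensor so each gate attends over a different pair of dimensions.
        x_cw = self.cw(x.permute(0, 2, 1, 3)).permute(0, 2, 1, 3)
        x_ch = self.ch(x.permute(0, 3, 2, 1)).permute(0, 3, 2, 1)
        x_hw = self.hw(x)
        return (x_cw + x_ch + x_hw) / 3.0


class SEBlock(nn.Module):
    """Standard Squeeze-and-Excitation channel reweighting."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(x)


class TripSE(nn.Module):
    """Assumed serial TripSE variant: Triplet attention followed by SE."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.triplet = TripletAttention()
        self.se = SEBlock(channels, reduction)

    def forward(self, x):
        return self.se(self.triplet(x))


if __name__ == "__main__":
    block = TripSE(channels=64)
    out = block(torch.randn(2, 64, 32, 32))
    print(out.shape)  # torch.Size([2, 64, 32, 32])
```

In a ResNet-style backbone, such a block would typically be inserted after each residual stage so that the attention reweighting operates on stage-level feature maps; where exactly the paper places it in ResNet18, DenseNet, and ConvNeXt is not stated in the abstract.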
Similar Papers
Enhancing Deepfake Detection using SE Block Attention with CNN
CV and Pattern Recognition
Finds fake videos using smart, small computer programs.
Systematic Integration of Attention Modules into CNNs for Accurate and Generalizable Medical Image Diagnosis
CV and Pattern Recognition
Helps doctors spot sickness in medical pictures better.
Lightweight Channel Attention for Efficient CNNs
CV and Pattern Recognition
Makes computer vision faster and smarter.