Unveiling the Underwater World: CLIP Perception Model-Guided Underwater Image Enhancement
By: Jiangzhong Cao, Zekai Zeng, Xu Zhang, and more
Potential Business Impact:
Makes underwater pictures look clear and real.
High-quality underwater images are essential both for machine vision tasks and for human viewers, who value their aesthetic appeal. However, the quality of underwater images is severely degraded by light absorption and scattering. Deep learning-based methods for Underwater Image Enhancement (UIE) have achieved good performance, but they often fail to account for human perception and lack sufficient constraints on the solution space. Consequently, the enhanced images often suffer from diminished perceptual quality or poor content restoration. To address these issues, we propose a UIE method with a Contrastive Language-Image Pre-Training (CLIP) perception loss module and curriculum contrastive regularization. First, to build a perception model for underwater images that aligns more closely with human visual perception, we leverage the visual-semantic feature extraction capability of CLIP to learn an appropriate prompt pair that maps underwater images to quality scores. This CLIP perception model is then incorporated as a perception loss module into the enhancement network to improve the perceptual quality of enhanced images. Furthermore, the CLIP perception model is integrated with curriculum contrastive regularization to strengthen the constraints imposed on enhanced images within the CLIP perceptual space, mitigating the risk of both under-enhancement and over-enhancement. Specifically, the CLIP perception model assesses and categorizes the learning difficulty of negatives in the regularization process, ensuring comprehensive and nuanced use of distorted images and negatives of varied quality levels. Extensive experiments demonstrate that our method outperforms state-of-the-art methods in visual quality and generalization ability.
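To make the two CLIP-based components concrete, here is a minimal illustrative sketch in PyTorch, assuming OpenAI's `clip` package. The prompt texts, the 1 − score loss form, and the difficulty thresholds are assumptions for illustration only, not the paper's learned prompt pair or exact curriculum.

```python
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
model = model.float()  # keep fp32 so gradients flow cleanly through encode_image
for p in model.parameters():
    p.requires_grad_(False)  # CLIP stays frozen; only the enhancer is trained

# A fixed positive/negative prompt pair stands in for the paper's learned pair.
prompt_pair = clip.tokenize([
    "a high-quality, clear underwater photo",
    "a low-quality, hazy underwater photo",
]).to(device)
with torch.no_grad():
    text_feats = F.normalize(model.encode_text(prompt_pair), dim=-1)

def clip_quality_score(images: torch.Tensor) -> torch.Tensor:
    """Probability that CLIP matches each image to the 'high-quality' prompt.

    `images` is a preprocessed batch of shape (B, 3, 224, 224).
    Returns a (B,) tensor in [0, 1]; higher means better perceived quality.
    """
    img_feats = F.normalize(model.encode_image(images), dim=-1)
    logits = 100.0 * img_feats @ text_feats.t()  # 100 ≈ CLIP's learned logit scale
    return logits.softmax(dim=-1)[:, 0]

def clip_perception_loss(enhanced: torch.Tensor) -> torch.Tensor:
    """Perception loss: penalize outputs CLIP judges as low quality."""
    return (1.0 - clip_quality_score(enhanced)).mean()

def categorize_negatives(negatives: torch.Tensor,
                         easy_thr: float = 0.3, hard_thr: float = 0.6):
    """Bucket negatives by CLIP quality score for curriculum scheduling.

    Thresholds are illustrative placeholders; higher-scoring negatives look
    closer to a well-enhanced image and are therefore 'harder'.
    """
    with torch.no_grad():
        scores = clip_quality_score(negatives)
    easy = negatives[scores < easy_thr]
    medium = negatives[(scores >= easy_thr) & (scores < hard_thr)]
    hard = negatives[scores >= hard_thr]
    return easy, medium, hard
```

In training, `clip_perception_loss(enhanced)` would be added to the enhancer's objective alongside its fidelity terms, while `categorize_negatives` would feed the curriculum contrastive term with easy-to-hard negative tiers.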
Similar Papers
Don't Judge Before You CLIP: A Unified Approach for Perceptual Tasks
CV and Pattern Recognition
Lets computers understand how people feel about pictures.
Contrastive Language-Image Pre-Training Model based Semantic Communication Performance Optimization
Machine Learning (CS)
Lets computers share ideas without needing to train them.
Unified Coding for Both Human Perception and Generalized Machine Analytics with CLIP Supervision
CV and Pattern Recognition
Makes pictures useful for people and computers.