Siamese-Driven Optimization for Low-Resolution Image Latent Embedding in Image Captioning
By: Jing Jie Tan, Anissa Mokraoui, Ban-Hoe Kwan, and more
Potential Business Impact:
Makes computers describe blurry pictures better.
Image captioning is essential in many fields, including assisting visually impaired individuals, improving content management systems, and enhancing human-computer interaction. A recent challenge in this domain, however, is dealing with low-resolution images (LRIs). While performance can be improved by using larger models such as transformers for encoding, these models are typically heavyweight, demanding significant computational resources and memory, which makes retraining difficult. To address this, the proposed SOLI (Siamese-Driven Optimization for Low-Resolution Image Latent Embedding in Image Captioning) approach presents a solution designed specifically for lightweight, low-resolution image captioning. It employs a Siamese network architecture to optimize latent embeddings, enhancing the efficiency and accuracy of the image-to-text translation process. By relying on a dual-pathway neural network structure, SOLI minimizes computational overhead without sacrificing performance, making it well suited to training in resource-constrained scenarios.
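The paper's exact architecture and training objective are not reproduced here, so the following is only a minimal sketch of the general idea, assuming a Siamese setup in which a shared lightweight encoder maps a low-resolution image and its high-resolution counterpart into a common latent space, and a cosine-similarity loss pulls the low-resolution embedding toward the high-resolution one before it is passed to a caption decoder. All names (`SiameseLatentAligner`, `embedding_alignment_loss`) and layer sizes are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseLatentAligner(nn.Module):
    """Dual-pathway (weight-shared) encoder that maps low- and high-resolution
    views of the same image into a shared latent space."""

    def __init__(self, embed_dim: int = 256):
        super().__init__()
        # A single lightweight convolutional backbone is shared by both
        # pathways; Siamese weight sharing keeps the model small.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, low_res: torch.Tensor, high_res: torch.Tensor):
        # Both resolutions pass through the same backbone; adaptive pooling
        # makes the encoder agnostic to input size.
        z_low = F.normalize(self.backbone(low_res), dim=-1)
        z_high = F.normalize(self.backbone(high_res), dim=-1)
        return z_low, z_high

def embedding_alignment_loss(z_low: torch.Tensor, z_high: torch.Tensor) -> torch.Tensor:
    # Pull the low-resolution embedding toward its high-resolution
    # counterpart with a cosine-similarity objective.
    return (1.0 - F.cosine_similarity(z_low, z_high, dim=-1)).mean()

if __name__ == "__main__":
    model = SiameseLatentAligner()
    low = torch.randn(4, 3, 64, 64)     # low-resolution inputs
    high = torch.randn(4, 3, 224, 224)  # matching high-resolution inputs
    z_low, z_high = model(low, high)
    print(embedding_alignment_loss(z_low, z_high).item())
```

In such a setup, only the aligned low-resolution embedding would be fed to the downstream caption decoder at inference time, so the high-resolution pathway adds cost only during training.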
Similar Papers
Multilingual Training-Free Remote Sensing Image Captioning
CV and Pattern Recognition
Lets computers describe satellite pictures in any language.
Generating Accurate and Detailed Captions for High-Resolution Images
CV and Pattern Recognition
Makes computer pictures describe more details accurately.
One More Glance with Sharp Eyes: Rethinking Lightweight Captioning as a Practical Visual Specialist
CV and Pattern Recognition
Lets phones describe pictures without the internet.