Detecting Text Manipulation in Images using Vision Language Models
By: Vidit Vidit, Pavel Korshunov, Amir Mohammadi, and more
Potential Business Impact:
Detects forged or altered text in images more reliably.
Recent works have shown the effectiveness of Large Vision Language Models (VLMs or LVLMs) in image manipulation detection. However, text manipulation detection is largely absent from these studies. We bridge this knowledge gap by analyzing closed- and open-source VLMs on different text manipulation datasets. Our results suggest that open-source models are closing the gap but still lag behind closed-source ones such as GPT-4o. Additionally, we benchmark image manipulation detection-specific VLMs for text manipulation detection and show that they suffer from a generalization problem. We benchmark VLMs on manipulations of in-the-wild scene text and of fantasy ID cards, where the latter mimics a challenging real-world misuse scenario.
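To make the benchmarking setup concrete, below is a minimal sketch of how one might query a closed-source VLM such as GPT-4o to judge whether text in an image has been tampered with, using the OpenAI Python SDK. The prompt wording and the helper function detect_text_manipulation are illustrative assumptions; the paper's actual prompts and evaluation protocol are not given in the abstract.

import base64
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def detect_text_manipulation(image_path: str) -> str:
    """Ask a VLM whether the text visible in an image has been altered."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        # Hypothetical zero-shot prompt; the paper's exact
                        # prompts are not published in the abstract.
                        "text": (
                            "Look at the text in this image. Has any of it "
                            "been digitally altered or manipulated? Answer "
                            "'manipulated' or 'authentic', then briefly "
                            "justify your answer."
                        ),
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
                    },
                ],
            }
        ],
    )
    return response.choices[0].message.content

print(detect_text_manipulation("scene_text_sample.jpg"))

The same zero-shot question could in principle be posed to an open-source VLM (e.g., one loaded through Hugging Face Transformers), which is presumably how the open- vs. closed-source comparison in the paper is structured.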
Similar Papers
Zero-shot image privacy classification with Vision-Language Models
CV and Pattern Recognition
Helps computers flag images that contain private content, without task-specific training.
On the Limitations of Vision-Language Models in Understanding Image Transforms
CV and Pattern Recognition
Shows where computers fail to understand edits and transformations applied to images.
Object Detection with Multimodal Large Vision-Language Models: An In-depth Review
CV and Pattern Recognition
Reviews how computers find and identify objects in pictures.