OTR: Synthesizing Overlay Text Dataset for Text Removal
By: Jan Zdenek, Wataru Shimoda, Kota Yamaguchi
Potential Business Impact:
Removes unwanted text from images more cleanly.
Text removal is a crucial task in computer vision with applications such as privacy preservation, image editing, and media reuse. While existing research has primarily focused on scene text removal in natural images, limitations in current datasets hinder out-of-domain generalization and accurate evaluation. In particular, widely used benchmarks such as SCUT-EnsText suffer from ground-truth artifacts caused by manual editing, overly simplistic text backgrounds, and evaluation metrics that do not capture the quality of generated results. To address these issues, we introduce an approach to synthesizing a text removal benchmark that applies to domains beyond scene text. Our dataset features text rendered on complex backgrounds using object-aware placement and vision-language-model-generated content, ensuring clean ground truth and challenging text removal scenarios. The dataset is available at https://huggingface.co/datasets/cyberagent/OTR.
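To give a rough sense of how this kind of synthesis can produce artifact-free ground truth, here is a minimal Python sketch, not the authors' pipeline. It assumes `object_boxes` come from an off-the-shelf object detector and `caption` from a vision-language model; the function name `render_overlay_text` and the placement heuristic are illustrative assumptions.

```python
from PIL import Image, ImageDraw, ImageFont

def render_overlay_text(image_path, object_boxes, caption, out_prefix="sample"):
    """Overlay `caption` on top of a detected object region so the text sits on a
    complex background. The untouched source image serves as the clean ground
    truth; the rendered copy is the model input. (Illustrative sketch only.)"""
    clean = Image.open(image_path).convert("RGB")
    noisy = clean.copy()
    draw = ImageDraw.Draw(noisy)
    font = ImageFont.load_default()

    # Object-aware placement (assumed heuristic): target the largest detected box.
    x0, y0, x1, y1 = max(object_boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]))
    anchor = (x0 + 5, y0 + 5)
    draw.text(anchor, caption, fill=(255, 255, 255), font=font)

    # Binary mask of the rendered text, useful for mask-guided removal models.
    mask = Image.new("L", clean.size, 0)
    ImageDraw.Draw(mask).text(anchor, caption, fill=255, font=font)

    clean.save(f"{out_prefix}_gt.png")     # ground truth: never edited, no inpainting artifacts
    noisy.save(f"{out_prefix}_input.png")  # input: image with synthetic overlay text
    mask.save(f"{out_prefix}_mask.png")    # text-region mask

# Example usage with hypothetical detector/VLM outputs:
# render_overlay_text("photo.jpg", [(40, 60, 300, 280)], "Summer sale ends Friday")
```

A real pipeline would vary fonts, colors, and placement rules, but the property illustrated here matches the abstract's key point: the clean image is never manually edited, so the ground truth carries no retouching artifacts.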
Similar Papers
OmniText: A Training-Free Generalist for Controllable Text-Image Manipulation
CV and Pattern Recognition
Removes and changes text in pictures without extra training.
What Shape Is Optimal for Masks in Text Removal?
CV and Pattern Recognition
Finds the best mask shapes for erasing text from pictures.
TextEditBench: Evaluating Reasoning-aware Text Editing Beyond Rendering
CV and Pattern Recognition
Measures how well computers change words in pictures.