Text to Sketch Generation with Multi-Styles
By: Tengjie Li, Shikui Tu, Lei Xu
Potential Business Impact:
Draws pictures in any chosen art style.
Recent advances in vision-language models have facilitated progress in sketch generation. However, existing specialized methods focus primarily on generic synthesis and lack mechanisms for precise control over sketch styles. In this work, we propose M3S, a training-free framework based on diffusion models that enables explicit style guidance via textual prompts and reference style sketches. Unlike previous style transfer methods that overwrite the key and value matrices in self-attention, we incorporate the reference features as auxiliary information with linear smoothing and leverage a style-content guidance mechanism. This design effectively reduces content leakage from reference sketches and enhances synthesis quality, especially when structural similarity between the reference and target sketches is low. Furthermore, we extend the framework to controllable multi-style generation by integrating features from multiple reference sketches, coordinated through a joint AdaIN module. Extensive experiments demonstrate that our approach achieves high-quality sketch generation with accurate style alignment and improved flexibility in style control. The official implementation of M3S is available at https://github.com/CMACH508/M3S.
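The following is a minimal Python sketch, not the official M3S implementation, of what the mechanisms described in the abstract might look like: reference keys/values appended to self-attention as auxiliary tokens with a linear smoothing weight (rather than overwriting K/V), a guidance combination over content and style branches, and a joint AdaIN that pools statistics from several reference sketches. All function names, tensor shapes, and weights (lam, w_c, w_s) are illustrative assumptions; consult the linked repository for the actual method.

import torch

def style_augmented_attention(q, k, v, k_ref, v_ref, lam=0.5):
    """Self-attention where reference K/V are appended as auxiliary tokens.

    q, k, v      : (batch, tokens, dim) features of the sketch being generated
    k_ref, v_ref : (batch, ref_tokens, dim) features from the reference style sketch
    lam          : assumed linear smoothing weight that scales the reference
                   contribution instead of replacing the original K/V
    """
    d = q.shape[-1]
    # Keep the original keys/values and append the smoothed reference ones.
    k_aug = torch.cat([k, lam * k_ref], dim=1)
    v_aug = torch.cat([v, lam * v_ref], dim=1)
    attn = torch.softmax(q @ k_aug.transpose(-2, -1) / d ** 0.5, dim=-1)
    return attn @ v_aug

def style_content_guidance(eps_uncond, eps_content, eps_style, w_c=7.5, w_s=3.0):
    """Guidance-style combination of content and style noise predictions.

    eps_* are diffusion-model noise predictions under different conditions;
    w_c and w_s are assumed weights for content and style guidance.
    """
    return (eps_uncond
            + w_c * (eps_content - eps_uncond)
            + w_s * (eps_style - eps_content))

def joint_adain(content_feat, style_feats, weights=None, eps=1e-5):
    """Joint AdaIN: re-normalize content features (B, C, H, W) with per-channel
    statistics pooled from several reference-style feature maps."""
    if weights is None:
        weights = [1.0 / len(style_feats)] * len(style_feats)
    # Blend per-channel mean/std across all reference styles.
    mu_s = sum(w * f.mean(dim=(-2, -1), keepdim=True) for w, f in zip(weights, style_feats))
    std_s = sum(w * f.std(dim=(-2, -1), keepdim=True) for w, f in zip(weights, style_feats))
    mu_c = content_feat.mean(dim=(-2, -1), keepdim=True)
    std_c = content_feat.std(dim=(-2, -1), keepdim=True)
    return std_s * (content_feat - mu_c) / (std_c + eps) + mu_s

Appending lam-scaled reference tokens keeps the target sketch's own keys and values intact, which is one plausible way to limit content leakage compared with overwriting K/V outright; the joint_adain helper shows how statistics from multiple reference sketches could be blended with user-chosen weights for multi-style control.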
Similar Papers
Sketch-to-Layout: Sketch-Guided Multimodal Layout Generation
CV and Pattern Recognition
Draw a picture to design a page layout.
SceneTextStylizer: A Training-Free Scene Text Style Transfer Framework with Diffusion Model
CV and Pattern Recognition
Changes text style in pictures without losing meaning.
Leveraging Diffusion Models for Stylization using Multiple Style Images
CV and Pattern Recognition
Changes pictures to look like any art style.