Creative4U: MLLMs-based Advertising Creative Image Selector with Comparative Reasoning
By: Yukang Lin, Xiang Zhang, Shichang Jia and more
Potential Business Impact:
Helps pick the best online ads using AI.
Creative images are the heart and soul of advertising on e-commerce platforms. An eye-catching creative image can enhance the shopping experience for users, boosting income for advertisers and advertising revenue for platforms. With the advent of AIGC technology, advertisers can produce large quantities of creative images at minimal cost. However, they struggle to assess creative quality and decide which images to select. Existing methods primarily focus on creative ranking, which fails to address the need for explainable creative selection. In this work, we propose the first paradigm for explainable creative assessment and selection. Powered by multimodal large language models (MLLMs), our approach formulates the assessment and selection of creative images as a natural language generation task. To facilitate this research, we construct CreativePair, the first comparative reasoning-induced creative dataset, featuring 8k annotated image pairs, with each sample including a label indicating which image is superior. Additionally, we introduce Creative4U (pronounced Creative for You), an MLLM-based creative selector that takes users' interests into account. Through Reason-to-Select RFT, which includes supervised fine-tuning with Chain-of-Thought (CoT-SFT) and Group Relative Policy Optimization (GRPO) based reinforcement learning, Creative4U is able to evaluate and select creative images accurately. Both offline and online experiments demonstrate the effectiveness of our approach. Our code and dataset will be made public to advance research and industrial applications.
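The GRPO stage mentioned in the abstract can be illustrated with a minimal sketch of the group-relative advantage that GRPO is generally built on: several answers are sampled for the same image pair, each is scored, and each reward is normalized against the mean and standard deviation of its own group. The abstract does not describe the paper's actual reward design, so the `selection_reward` function below (1.0 when the chosen image matches the pair's label, 0.0 otherwise) and all names and sample values are assumptions made purely for illustration.

```python
from statistics import mean, pstdev


def selection_reward(predicted: str, label: str) -> float:
    """Hypothetical rule-based reward (an assumption, not the paper's design):
    1.0 if the model selected the image marked superior, else 0.0."""
    return 1.0 if predicted == label else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group.

    Population standard deviation is used here; some implementations
    use the sample standard deviation instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every answer gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# One image pair with label "A" and a group of four sampled answers
# (made-up values for illustration only).
label = "A"
sampled_choices = ["A", "B", "A", "A"]

rewards = [selection_reward(choice, label) for choice in sampled_choices]
advantages = group_relative_advantages(rewards)

for choice, adv in zip(sampled_choices, advantages):
    print(f"choice={choice} advantage={adv:+.3f}")
```

Under these assumptions, answers that pick the labeled image receive a positive advantage and the wrong pick receives a negative one, so a policy update would shift probability toward the correct selections without needing a separate value model.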
Similar Papers
Image Aesthetic Reasoning via HCM-GRPO: Empowering Compact Model for Superior Performance
CV and Pattern Recognition
Teaches computers to judge if pictures look good.
REASONEDIT: Towards Reasoning-Enhanced Image Editing Models
CV and Pattern Recognition
Makes AI better at changing pictures with words.
MM-R1: Unleashing the Power of Unified Multimodal Large Language Models for Personalized Image Generation
CV and Pattern Recognition
Creates personalized pictures from your descriptions.