Embedding-based Retrieval in Multimodal Content Moderation
By: Hanzhong Liang, Jinghao Shi, Xiang Shen and more
Potential Business Impact:
Finds bad videos faster and cheaper.
Video understanding plays a fundamental role in content moderation on short-video platforms, enabling the detection of inappropriate content. While classification remains the dominant approach to content moderation, it often struggles in scenarios that require rapid and cost-efficient responses, such as trend adaptation and urgent escalations. To address this issue, we introduce an Embedding-Based Retrieval (EBR) method designed to complement traditional classification approaches. We first leverage a Supervised Contrastive Learning (SCL) framework to train a suite of foundation embedding models, including both single-modal and multi-modal architectures. Our models outperform established contrastive learning methods such as CLIP and MoCo. Building on these embedding models, we design and implement an embedding-based retrieval system that integrates embedding generation and video retrieval to enable efficient and effective trend handling. Comprehensive offline experiments on 25 diverse emerging trends show that EBR improves ROC-AUC from 0.85 to 0.99 and PR-AUC from 0.35 to 0.95. Further online experiments show that EBR increases action rates by 10.32% and reduces operational costs by over 80%, while also improving interpretability and flexibility compared with classification-based solutions.
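To make the retrieval idea concrete, here is a minimal sketch of how embedding-based retrieval could flag videos that resemble a few known examples of a trend. It is an illustration under stated assumptions, not the authors' system: the function names (`l2_normalize`, `retrieve`), the three-dimensional toy vectors, and the 0.9 threshold are all hypothetical, and a production system would likely use learned high-dimensional embeddings with an approximate nearest-neighbor index rather than a brute-force scan.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def retrieve(
    seed_embeddings: Sequence[Sequence[float]],
    candidate_embeddings: Dict[str, Sequence[float]],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Return (video_id, score) pairs whose best cosine similarity to any
    seed meets the threshold, sorted from most to least similar.

    The seeds stand in for embeddings of a few known examples of a trend;
    the threshold is a hypothetical tuning knob.
    """
    if not seed_embeddings:
        raise ValueError("at least one seed embedding is required")
    seeds = [l2_normalize(s) for s in seed_embeddings]
    hits: List[Tuple[str, float]] = []
    for video_id, emb in candidate_embeddings.items():
        unit = l2_normalize(emb)
        # Score a candidate by its closest seed (maximum cosine similarity).
        score = max(sum(x * y for x, y in zip(unit, s)) for s in seeds)
        if score >= threshold:
            hits.append((video_id, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy data: two seed embeddings and three candidate videos (all made up).
seeds = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
candidates = {
    "video_a": [0.95, 0.05, 0.0],  # points almost the same way as the seeds
    "video_b": [0.0, 1.0, 0.0],    # orthogonal to the first seed
    "video_c": [0.7, 0.7, 0.0],    # partially similar
}
flagged = retrieve(seeds, candidates, threshold=0.9)
print([video_id for video_id, _ in flagged])
```

In this sketch, handling a new trend only means adding seed embeddings and choosing a threshold, with no classifier retraining, and each hit can be traced back to the seed it matched. Those properties are one plausible reading of the speed, cost, and interpretability benefits claimed above.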
Similar Papers
Unified Interactive Multimodal Moment Retrieval via Cascaded Embedding-Reranking and Temporal-Aware Score Fusion
CV and Pattern Recognition
Finds specific video moments using smart searching.
CLaMR: Contextualized Late-Interaction for Multimodal Content Retrieval
CV and Pattern Recognition
Find videos better using sound, text, and pictures.
RzenEmbed: Towards Comprehensive Multimodal Retrieval
CV and Pattern Recognition
Lets computers understand text, pictures, videos, and documents.