MatAnyone 2: Scaling Video Matting via a Learned Quality Evaluator
By: Peiqing Yang, Shangchen Zhou, Kai Hao, and more
Potential Business Impact:
Produces cleaner, more accurate cutouts of people in videos for editing and effects.
Video matting remains limited by the scale and realism of existing datasets. While leveraging segmentation data can enhance semantic stability, the lack of effective boundary supervision often leads to segmentation-like mattes that lack fine details. To this end, we introduce a learned Matting Quality Evaluator (MQE) that assesses the semantic and boundary quality of alpha mattes without ground truth. It produces a pixel-wise evaluation map that identifies reliable and erroneous regions, enabling fine-grained quality assessment. The MQE scales up video matting in two ways: (1) as an online matting-quality feedback signal during training that suppresses erroneous regions, providing comprehensive supervision, and (2) as an offline selection module for data curation, improving annotation quality by combining the strengths of leading video and image matting models. This process allows us to build a large-scale real-world video matting dataset, VMReal, containing 28K clips and 2.4M frames. To handle large appearance variations in long videos, we introduce a reference-frame training strategy that incorporates long-range frames beyond the local window for effective training. Our MatAnyone 2 achieves state-of-the-art performance on both synthetic and real-world benchmarks, surpassing prior methods across all metrics.
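The abstract names two uses of the MQE: an online per-pixel feedback signal that down-weights erroneous regions during training, and an offline selector that picks the best candidate annotation for data curation. Below is a minimal PyTorch sketch of how such usage could look. The paper does not publish this code; the function names (`mqe_weighted_loss`, `select_annotation`), the assumed MQE interface `mqe(frame, alpha)`, and the exact L1-based weighting scheme are all illustrative assumptions.

```python
# Hypothetical sketch of the two MQE uses described in the abstract.
# Not the authors' implementation; interfaces are assumed for illustration.

import torch
import torch.nn.functional as F


def mqe_weighted_loss(pred_alpha: torch.Tensor,
                      target_alpha: torch.Tensor,
                      quality_map: torch.Tensor) -> torch.Tensor:
    """Online use: down-weight pixels the MQE flags as erroneous.

    pred_alpha, target_alpha: (B, 1, H, W) alpha mattes in [0, 1].
    quality_map: (B, 1, H, W) per-pixel reliability from the MQE
                 (1 = reliable, 0 = erroneous).
    """
    # Per-pixel matting loss (L1 here as a stand-in for the paper's losses).
    per_pixel = F.l1_loss(pred_alpha, target_alpha, reduction="none")
    # Detach so gradients do not flow into the evaluator itself.
    weights = quality_map.detach()
    # Normalize by total weight so unreliable regions are suppressed,
    # not just scaled down globally.
    return (weights * per_pixel).sum() / weights.sum().clamp(min=1.0)


@torch.no_grad()
def select_annotation(frame: torch.Tensor,
                      candidates: list[torch.Tensor],
                      mqe: torch.nn.Module) -> torch.Tensor:
    """Offline use: keep whichever candidate matte (e.g., from a video
    matting model vs. an image matting model) the MQE scores highest,
    averaging its pixel-wise evaluation map into a single score."""
    scores = [mqe(frame, alpha).mean() for alpha in candidates]
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best]
```

In this sketch the same evaluator output serves both roles: as a dense weight map inside the training loss, and as a scalar score (its spatial mean) for ranking annotations during dataset construction.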
Similar Papers
Generative Video Matting
CV and Pattern Recognition
Makes videos look real by separating people from backgrounds.
Post-Training Quantization for Video Matting
CV and Pattern Recognition
Makes video editing work faster on phones.
Towards Unified Video Quality Assessment
CV and Pattern Recognition
Tells you why videos look bad.