Robustness and accuracy of mean opinion scores with hard and soft outlier detection
By: Dietmar Saupe, Tim Bleile
Potential Business Impact:
Find bad ratings to make picture quality fair.
In subjective assessment of image and video quality, observers rate or compare selected stimuli. Before calculating the mean opinion scores (MOS) for these stimuli from the ratings, it is recommended to identify and deal with outliers that may have given unreliable ratings. Several methods are available for this purpose, some of which have been standardized. These methods are typically based on statistics and sometimes tested by introducing synthetic ratings from artificial outliers, such as random clickers. However, a reliable and comprehensive approach is lacking for comparative performance analysis of outlier detection methods. To fill this gap, this work proposes and applies an empirical worst-case analysis as a general solution. Our method involves evolutionary optimization of an adversarial black-box attack on outlier detection algorithms, where the adversary maximizes the distortion of scale values with respect to ground truth. We apply our analysis to several hard and soft outlier detection methods for absolute category ratings and show their differing performance in this stress test. In addition, we propose two new outlier detection methods with low complexity and excellent worst-case performance. Software for adversarial attacks and data analysis is available.
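The worst-case analysis described in the abstract can be illustrated with a small sketch. This is not the authors' software or any of the detection methods they evaluate. It assumes a toy hard outlier detector (`robust_mos`, which drops observers whose mean absolute deviation from the per-stimulus median exceeds a hypothetical threshold `max_dev`) and a simple (1+1) evolutionary search standing in for the paper's evolutionary optimization. All function names, parameters, and the simulated ratings are assumptions made for illustration only.

```python
import random
from statistics import mean, median

SCALE = (1, 2, 3, 4, 5)  # absolute category rating (ACR) levels


def robust_mos(ratings, max_dev=1.0):
    """Toy hard outlier detection followed by MOS (hypothetical detector).

    ratings: list of observers, each a list with one rating per stimulus.
    Observers whose mean absolute deviation from the per-stimulus median
    exceeds max_dev are removed before averaging.
    """
    n = len(ratings[0])
    med = [median(r[j] for r in ratings) for j in range(n)]
    kept = [r for r in ratings
            if mean(abs(r[j] - med[j]) for j in range(n)) <= max_dev]
    kept = kept or ratings  # fall back to all observers if everyone is rejected
    return [mean(r[j] for r in kept) for j in range(n)]


def distortion(scores, truth):
    """Root mean squared error between computed scores and ground truth."""
    return (sum((s - t) ** 2 for s, t in zip(scores, truth)) / len(truth)) ** 0.5


def attack(honest, truth, n_adversaries=3, steps=2000, seed=0):
    """(1+1) evolutionary black-box attack (illustrative stand-in).

    The adversary only observes the output of robust_mos. It mutates one
    adversarial rating per step and keeps the mutation if the distortion
    with respect to ground truth does not decrease.
    """
    rng = random.Random(seed)
    n = len(truth)
    adv = [[rng.choice(SCALE) for _ in range(n)] for _ in range(n_adversaries)]
    best = distortion(robust_mos(honest + adv), truth)
    for _ in range(steps):
        cand = [row[:] for row in adv]
        i, j = rng.randrange(n_adversaries), rng.randrange(n)
        cand[i][j] = rng.choice(SCALE)
        d = distortion(robust_mos(honest + cand), truth)
        if d >= best:
            adv, best = cand, d
    return adv, best


if __name__ == "__main__":
    rng = random.Random(1)
    truth = [1.5, 2.5, 3.0, 3.5, 4.5, 2.0, 4.0, 3.0]  # made-up ground truth
    # Simulated honest observers: ground truth plus noise, clipped to the scale.
    honest = [[min(5, max(1, round(t + rng.gauss(0, 0.7)))) for t in truth]
              for _ in range(20)]
    adv, worst = attack(honest, truth)
    print("distortion without adversaries:", distortion(robust_mos(honest), truth))
    print("worst distortion found:", worst)
```

Running the same attack against several detectors and comparing the worst distortion each one admits is the kind of stress test the abstract describes. A detector that keeps this value small under attack would count as robust in this sense.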
Similar Papers
A method for outlier detection based on cluster analysis and visual expert criteria
Machine Learning (CS)
Finds weird data points hidden in big groups.
DOD: Detection of outliers in high dimensional data with distance of distances
Methodology
Finds strange data points in complex information.
Multi-Method Ensemble for Out-of-Distribution Detection
CV and Pattern Recognition
Helps computers spot fake or wrong information.