Statistical Inference of the Matthews Correlation Coefficient for Multiclass Classification
By: Jun Tamura, Yuki Itaya, Kenichi Hayashi and more
Potential Business Impact:
Helps show how far a classifier's performance score can be trusted, instead of relying on a single number.
Classification problems are essential statistical tasks that form the foundation of decision-making across various fields, including patient prognosis and treatment strategies for critical conditions. Consequently, evaluating the performance of classification models is of significant importance, and numerous evaluation metrics have been proposed. Among these, the Matthews correlation coefficient (MCC), also known as the phi coefficient, is widely recognized as a reliable metric that provides balanced measurements even in the presence of class imbalance. However, with the increasing prevalence of multiclass classification problems involving three or more classes, macro-averaged and micro-averaged extensions of MCC have been employed, despite a lack of clear definitions or established references for these extensions. In the present study, we propose a formal framework for MCC tailored to multiclass classification problems using macro-averaged and micro-averaged approaches. Moreover, discussions on the use of these extended MCCs for multiclass problems often rely solely on point estimates, potentially overlooking the statistical significance and reliability of the results. To address this gap, we introduce several methods for constructing asymptotic confidence intervals for the proposed metrics. Furthermore, we extend these methods to include the construction of asymptotic confidence intervals for differences in the proposed metrics, specifically for paired study designs. The utility of our methods is evaluated through comprehensive simulations and real-world data analyses.
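As a rough illustration of the macro-averaged and micro-averaged extensions mentioned in the abstract, the sketch below computes both from a confusion matrix. The abstract does not state the paper's formal definitions, so the code assumes one common convention: each class is reduced to a one-vs-rest 2x2 table, the macro version averages the per-class binary MCCs, and the micro version pools the one-vs-rest counts before applying the binary MCC formula. The function names (`binary_mcc`, `one_vs_rest_counts`, `macro_mcc`, `micro_mcc`) and the rule of returning 0.0 for a zero denominator are illustrative assumptions, not the authors' method.

```python
import math


def binary_mcc(tp, fp, fn, tn):
    """Binary MCC (phi coefficient) from 2x2 counts.

    Returns 0.0 when the denominator is zero (an assumed convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0


def one_vs_rest_counts(confusion, k):
    """Collapse a multiclass confusion matrix to (tp, fp, fn, tn) for class k.

    Rows are assumed to be true classes and columns predicted classes.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                        # true k, predicted other
    fp = sum(confusion[i][k] for i in range(n)) - tp   # true other, predicted k
    tn = total - tp - fn - fp
    return tp, fp, fn, tn


def macro_mcc(confusion):
    """Assumed macro average: mean of the per-class one-vs-rest MCCs."""
    r = len(confusion)
    return sum(binary_mcc(*one_vs_rest_counts(confusion, k)) for k in range(r)) / r


def micro_mcc(confusion):
    """Assumed micro average: binary MCC of the pooled one-vs-rest counts."""
    r = len(confusion)
    counts = [one_vs_rest_counts(confusion, k) for k in range(r)]
    tp, fp, fn, tn = (sum(c[j] for c in counts) for j in range(4))
    return binary_mcc(tp, fp, fn, tn)


if __name__ == "__main__":
    # Hypothetical 3-class confusion matrix, for illustration only.
    example = [[50, 3, 2], [4, 30, 6], [1, 5, 9]]
    print("macro MCC:", macro_mcc(example))
    print("micro MCC:", micro_mcc(example))
```

Both functions return point estimates only. The confidence intervals that the abstract describes would need the asymptotic variance results from the paper itself, which are not reproduced here.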
Similar Papers
A New Multiple Correlation Coefficient without Specifying the Dependent Variable
Methodology
Finds how groups of things relate without picking a main one.
Fiducial Confidence Intervals for Agreement Measures Among Raters Under a Generalized Linear Mixed Effects Model
Methodology
Measures how well different people agree.
Two new approaches to multiple canonical correlation analysis for repeated measures data
Methodology
Finds hidden connections in complex, changing data.