Goodness-of-fit testing of the distribution of posterior classification probabilities for validating model-based clustering

Published: November 6, 2025 | arXiv ID: 2511.04206v1

By: Salima El Kolei, Matthieu Marbac

Potential Business Impact:

Provides a statistical check that a model-based clustering algorithm has grouped the data in a way consistent with the fitted model.

Business Areas:
A/B Testing, Data and Analytics

We present the first method for assessing the relevance of a model-based clustering result in both parametric and non-parametric frameworks. The method directly aligns with the clustering objective by assessing how well the conditional probabilities of cluster memberships, as defined by the mixture model, fit the data. By focusing on these conditional probabilities, the procedure applies to any type and dimension of data and any mixture model. The testing procedure requires only a consistent estimator of the parameters and the associated conditional probabilities of classification for each observation. Its implementation is straightforward, as no additional estimator is needed. Under the null hypothesis, the method relies on the fact that any functional transformation of the posterior probabilities of classification has the same expectation under both the model being tested and the true model. This goodness-of-fit procedure is based on an empirical likelihood method with an increasing number of moment conditions to asymptotically detect any alternative. Data are split into blocks to account for the use of a parameter estimator, and the empirical log-likelihood ratio is computed for each block. By analyzing the deviation of the maximum of the empirical log-likelihood ratios, the exact asymptotic significance level of the goodness-of-fit procedure is obtained.
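The sketch below illustrates the general idea described in the abstract, not the authors' exact procedure: under the null hypothesis, functionals of the posterior classification probabilities have the same expectation under the data and under the fitted mixture, so one can form moment conditions from such functionals, split the data into blocks, and compute an empirical log-likelihood ratio in each block. The choice of functionals, the number of blocks, the Gaussian mixture, and the Bonferroni-style calibration of the maximum are all illustrative assumptions (the paper derives the exact asymptotic level).

```python
# Illustrative sketch of a block-wise empirical-likelihood goodness-of-fit check
# built on posterior classification probabilities (assumptions flagged inline).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

def moment_functions(tau):
    """Illustrative functionals of the posterior probabilities (assumption)."""
    return np.column_stack([tau.max(axis=1), (tau ** 2).sum(axis=1)])

def el_ratio(g):
    """Empirical log-likelihood ratio for testing E[g] = 0 (Owen's dual form)."""
    n, q = g.shape
    eps = 1.0 / n

    def log_star(x):
        # Quadratic extension of log below eps keeps the objective finite.
        return np.where(x > eps,
                        np.log(np.maximum(x, eps)),
                        np.log(eps) - 1.5 + 2 * x / eps - x ** 2 / (2 * eps ** 2))

    def neg_dual(lam):
        return -np.sum(log_star(1.0 + g @ lam))

    lam = minimize(neg_dual, np.zeros(q), method="BFGS").x
    return 2.0 * np.sum(log_star(1.0 + g @ lam))

# Fit the mixture model under test (here a 2-component Gaussian mixture).
X = rng.normal(size=(600, 2))
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
tau = gmm.predict_proba(X)   # posterior classification probabilities

# Model-implied expectation of the functionals, approximated by Monte Carlo.
X_sim, _ = gmm.sample(20000)
m_model = moment_functions(gmm.predict_proba(X_sim)).mean(axis=0)

# Split into blocks and compute the empirical log-likelihood ratio per block.
g_all = moment_functions(tau) - m_model
n_blocks = 5
stats = [el_ratio(block) for block in np.array_split(g_all, n_blocks)]

# Crude calibration of the maximum via a Bonferroni chi-square bound;
# a stand-in for the exact asymptotic calibration derived in the paper.
q = g_all.shape[1]
p_value = min(1.0, n_blocks * chi2.sf(max(stats), df=q))
print(f"max block EL ratio = {max(stats):.2f}, approximate p-value = {p_value:.3f}")
```

A large maximum block statistic (small p-value) signals that the expectations of the chosen functionals under the data disagree with those implied by the fitted mixture, i.e. that the clustering model does not fit.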

Page Count
30 pages

Category
Mathematics:
Statistics Theory