
Sleep pattern profiling using a finite mixture of contaminated multivariate skew-normal distributions on incomplete data

Published: December 13, 2025 | arXiv ID: 2512.12464v1

By: Jason Pillay, Cristina Tortora, Antonio Punzo, and others

Medical data often exhibit characteristics that make cluster analysis particularly challenging, such as missing values, outliers, and cluster features like skewness. Typically, such data must first be preprocessed, by removing outliers and handling missing values, before clustering can be performed. However, these preliminary steps rely on objective functions that differ from the one used in the clustering stage. In this paper, we propose a unified model-based clustering approach that simultaneously handles atypical observations, missing values, and cluster-wise skewness within a single framework. Each cluster is modelled using a contaminated multivariate skew-normal distribution, a convenient two-component mixture of multivariate skew-normal densities in which one component represents the main data (the "bulk") and the other captures potential outliers. For inference, we implement a variant of the EM algorithm to obtain maximum likelihood estimates of the model parameters. Simulation studies demonstrate that the proposed model outperforms existing approaches in both clustering accuracy and outlier detection, across low- and high-dimensional settings, even in the presence of substantial missingness. The method is further applied to the Cleveland Children's Sleep and Health Study (CCSHS), a dataset characterised by incomplete observations. Without any preprocessing, the proposed approach identifies five distinct groups of sleepers, revealing meaningful differences in sleeper typologies.
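To make the "bulk plus outliers" idea concrete, here is a minimal sketch of a single contaminated skew-normal cluster with fixed parameters. It assumes Azzalini's multivariate skew-normal parameterization (location `xi`, scale matrix `Omega`, shape vector `alpha`) and a contaminant component that shares the bulk's location and shape but has its scale inflated by a factor `eta > 1`, as in contaminated normal mixture models; the paper's exact parameterization may differ. The function names and the parameter `bulk_prob` (the bulk proportion) are hypothetical, and the missing-data handling and EM updates are not shown.

```python
import math
import sys

import numpy as np


def msn_logpdf(x, xi, Omega, alpha):
    """Log-density of Azzalini's multivariate skew-normal SN_d(xi, Omega, alpha):
    f(x) = 2 * phi_d(x; xi, Omega) * Phi(alpha' omega^{-1} (x - xi)),
    with omega = diag(Omega)^{1/2}."""
    x, xi, alpha = (np.asarray(a, dtype=float) for a in (x, xi, alpha))
    Omega = np.asarray(Omega, dtype=float)
    d = xi.shape[0]
    diff = x - xi
    # Gaussian part, evaluated through a Cholesky factor of Omega.
    L = np.linalg.cholesky(Omega)
    z = np.linalg.solve(L, diff)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    log_phi = -0.5 * (d * math.log(2.0 * math.pi) + log_det + z @ z)
    # Skewing factor: standard normal CDF, written with erfc for the lower tail.
    omega = np.sqrt(np.diag(Omega))
    w = float(alpha @ (diff / omega))
    cdf = 0.5 * math.erfc(-w / math.sqrt(2.0))
    # Floor the CDF so the log stays finite in the extreme lower tail.
    return math.log(2.0) + float(log_phi) + math.log(max(cdf, sys.float_info.min))


def _component_logs(x, xi, Omega, alpha, bulk_prob, eta):
    """Weighted log-densities of the bulk and contaminant components.
    ASSUMPTION: the contaminant shares xi and alpha and has scale eta * Omega."""
    Omega = np.asarray(Omega, dtype=float)
    log_bulk = math.log(bulk_prob) + msn_logpdf(x, xi, Omega, alpha)
    log_bad = math.log(1.0 - bulk_prob) + msn_logpdf(x, xi, eta * Omega, alpha)
    return log_bulk, log_bad


def cmsn_pdf(x, xi, Omega, alpha, bulk_prob, eta):
    """Density bulk_prob * SN(xi, Omega, alpha) + (1 - bulk_prob) * SN(xi, eta * Omega, alpha)."""
    log_bulk, log_bad = _component_logs(x, xi, Omega, alpha, bulk_prob, eta)
    return math.exp(log_bulk) + math.exp(log_bad)


def bulk_posterior(x, xi, Omega, alpha, bulk_prob, eta):
    """Posterior probability that x came from the bulk rather than the contaminant."""
    log_bulk, log_bad = _component_logs(x, xi, Omega, alpha, bulk_prob, eta)
    t = log_bad - log_bulk
    # Numerically stable evaluation of 1 / (1 + exp(t)).
    if t > 0:
        e = math.exp(-t)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(t))


if __name__ == "__main__":
    xi, Omega, alpha = np.zeros(2), np.eye(2), np.array([2.0, 0.0])
    for point in (np.array([0.5, 0.2]), np.array([8.0, 8.0])):
        v = bulk_posterior(point, xi, Omega, alpha, bulk_prob=0.95, eta=10.0)
        # Assumed rule: flag a point as an outlier when its bulk probability is below 0.5.
        print(point, round(v, 4), "outlier" if v < 0.5 else "bulk")
```

In a full mixture, these per-cluster quantities would be combined with the cluster weights and re-estimated inside the EM iterations; the sketch only evaluates one cluster with parameters held fixed. Note that `alpha` here is the skewness shape vector, which is why the bulk proportion gets the separate name `bulk_prob`.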

Category: Statistics (Methodology)