Accounting for missing data when modelling block maxima
By: Emma S. Simpson, Paul J. Northrop
Modelling block maxima using the generalised extreme value (GEV) distribution is a classical and widely used method for studying univariate extremes. It allows for theoretically motivated estimation of return levels, including extrapolation beyond the range of observed data. A frequently overlooked challenge in applying this methodology comes from handling datasets containing missing values. In this case, one cannot be sure whether the true maximum has been recorded in each block, and simply ignoring the issue can lead to biased parameter estimators and, crucially, underestimated return levels. We propose an extension of the standard block maxima approach to overcome such missing data issues. This is achieved by explicitly accounting for the proportion of missing values in each block within the GEV model. Inference is carried out using likelihood-based techniques, and we propose an update to commonly used diagnostic plots to assess model fit. We assess the performance of our method via a simulation study, with results that are competitive with the "ideal" case of having no missing values. The practical use of our methodology is demonstrated on sea surge data from Brest, France, and air pollution data from Plymouth, U.K.
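One way to make "accounting for the proportion of missing values in each block" concrete is sketched below. It is an illustration under stated assumptions, not necessarily the authors' exact formulation. It assumes that missingness is unrelated to the size of the values, so that if a full block maximum has GEV distribution function G, the maximum of a block in which only a proportion p of values was observed is approximately distributed as G^p. By max-stability, G^p is again a GEV with the same shape parameter and adjusted location and scale. The function names (`gev_cdf`, `adjust_for_missing`, `neg_log_lik`) and the parameter values in the usage example are hypothetical.

```python
import math


def gev_cdf(x, mu, sigma, xi):
    """GEV distribution function G(x), using the Gumbel limit when xi is ~0."""
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below the lower end point (xi > 0)
        # or above the upper end point (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def adjust_for_missing(mu, sigma, xi, p):
    """Parameters of G^p, for a block with a proportion p of values observed.

    Assumption (not taken from the abstract): missingness is unrelated to
    the magnitude of the underlying values.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if abs(xi) < 1e-12:
        return mu + sigma * math.log(p), sigma, xi
    return mu + sigma * (p ** xi - 1.0) / xi, sigma * p ** xi, xi


def neg_log_lik(params, maxima, props):
    """Negative log-likelihood with block-specific adjusted GEV parameters."""
    mu, sigma, xi = params
    if sigma <= 0.0:
        return math.inf
    total = 0.0
    for m, p in zip(maxima, props):
        mu_p, sigma_p, _ = adjust_for_missing(mu, sigma, xi, p)
        z = (m - mu_p) / sigma_p
        if abs(xi) < 1e-12:
            total += math.log(sigma_p) + z + math.exp(-z)
        else:
            t = 1.0 + xi * z
            if t <= 0.0:
                return math.inf  # observation outside the support
            total += math.log(sigma_p) + (1.0 + 1.0 / xi) * math.log(t) + t ** (-1.0 / xi)
    return total


# Usage with made-up numbers: three block maxima, the second block half observed.
maxima = [12.1, 10.4, 13.0]
props = [1.0, 0.5, 0.9]
nll = neg_log_lik((11.0, 1.5, 0.1), maxima, props)
```

Minimising `neg_log_lik` over (mu, sigma, xi) with a numerical optimiser would then give estimates on the scale of a complete block, so that return levels computed from them refer to full-block maxima rather than to maxima of partially observed blocks.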
Similar Papers
Estimating Extreme Wave Surges in the Presence of Missing Data
Methodology
Fixes wave data errors for better storm predictions.
Bayesian Mixture Models for Heterogeneous Extremes
Methodology
Better predicts rare, dangerous events.
Weighted Parameter Estimators of the Generalized Extreme Value Distribution in the Presence of Missing Observations
Methodology
Fixes broken data for better flood predictions.