Semiparametric Robust Estimation of Population Location
By: Ananyabrata Barua, Ayanendranath Basu
Potential Business Impact:
Finds the real signal hidden in noisy data.
Real-world measurements often comprise a dominant signal contaminated by a noisy background. Robustly estimating that dominant signal is a fundamental statistical problem. Classically, mixture models have been used to cluster a heterogeneous population into homogeneous components. Modeling such data with fully parametric models risks bias under misspecification, while fully nonparametric approaches can dissipate power and computational resources. We propose a middle path: a semiparametric method that models only the dominant component parametrically and leaves the background completely nonparametric, yet remains computationally scalable and statistically robust. Instead of downweighting outliers, as is traditional in the robust statistics literature, we maximize the observed likelihood so that the noisy background is absorbed by the nonparametric component. Computationally, we propose a new approximate, FFT-accelerated likelihood maximization algorithm. Empirically, this FFT plug-in achieves order-of-magnitude speedups over vanilla weighted EM while preserving statistical accuracy and large-sample properties.
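To make the idea in the abstract concrete, here is a minimal sketch of one way such an estimator could look. It is not the authors' algorithm. It assumes a Gaussian dominant component, a background estimated by a weighted kernel density estimate, and a binned FFT convolution standing in for the "FFT plug-in". The function names, the bandwidth rule, the grid size, and the starting values are all hypothetical choices made for illustration, and no identifiability constraint is imposed on the background.

```python
import numpy as np


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z * z) / (sigma * np.sqrt(2.0 * np.pi))


def fft_weighted_kde(x, weights, grid, bandwidth):
    """Weighted Gaussian KDE on a regular grid via binning + FFT convolution."""
    step = grid[1] - grid[0]
    # Bin edges centred on the grid points.
    edges = np.append(grid - step / 2.0, grid[-1] + step / 2.0)
    counts, _ = np.histogram(x, bins=edges, weights=weights)
    mass = counts / max(counts.sum(), 1e-300)
    # Gaussian kernel sampled at grid offsets, truncated at 4 bandwidths.
    half = int(np.ceil(4.0 * bandwidth / step))
    kernel = normal_pdf(np.arange(-half, half + 1) * step, 0.0, bandwidth)
    # Zero-padded FFT product gives the full linear convolution.
    n = len(mass) + len(kernel) - 1
    smooth = np.fft.irfft(np.fft.rfft(mass, n) * np.fft.rfft(kernel, n), n)
    # Keep the part aligned with the grid; clip tiny negative round-off.
    return np.clip(smooth[half:half + len(mass)], 0.0, None)


def fit_dominant_component(x, n_iter=50, grid_size=2048):
    """EM-style fit of p * N(mu, sigma^2) + (1 - p) * g with g nonparametric."""
    x = np.asarray(x, dtype=float)
    # Robust starting values (assumption): median and scaled MAD.
    mu = np.median(x)
    sigma = 1.4826 * np.median(np.abs(x - mu))
    p = 0.5
    # Rule-of-thumb bandwidth (assumption, not taken from the paper).
    bandwidth = 1.06 * x.std() * len(x) ** (-0.2)
    pad = 4.0 * bandwidth
    grid = np.linspace(x.min() - pad, x.max() + pad, grid_size)
    # Flat initial background density over the grid range.
    g = np.full_like(x, 1.0 / (grid[-1] - grid[0]))
    for _ in range(n_iter):
        # E-step: probability that each point belongs to the dominant component.
        num = p * normal_pdf(x, mu, sigma)
        w = num / (num + (1.0 - p) * g + 1e-300)
        # M-step for the parametric part: weighted Gaussian updates.
        p = w.mean()
        mu = np.sum(w * x) / np.sum(w)
        sigma = np.sqrt(np.sum(w * (x - mu) ** 2) / np.sum(w))
        # Background update: KDE weighted by 1 - w, evaluated back at the data.
        g_grid = fft_weighted_kde(x, 1.0 - w, grid, bandwidth)
        g = np.interp(x, grid, g_grid)
    return mu, sigma, p
```

Calling `fit_dominant_component(x)` on a one-dimensional sample returns the estimated location, scale, and mixing proportion of the dominant component. In this sketch, binning the data onto a grid lets each background update cost one FFT convolution rather than a kernel sum over all pairs of points, which is the kind of saving the abstract attributes to the FFT plug-in.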
Similar Papers
Semiparametric Robust Estimation of Population Location
Computation
Cleans up messy signals to find important information.
Density estimation for compositional data using nonparametric mixtures
Methodology
Helps computers understand data with zero values.
A Simple and Robust Multi-Fidelity Data Fusion Method for Effective Modeling of Citizen-Science Air Pollution Data
Methodology
Improves air pollution maps using many sensors.