Unified Distributed Estimation Framework for Sufficient Dimension Reduction Based on Conditional Moments
By: Hongying Li, Minyi Zhu, Yaqi Cao, and more
Potential Business Impact:
Lets computers learn from data spread everywhere.
Nowadays, massive datasets are typically dispersed across multiple locations and face the dual challenges of high dimensionality and huge sample size. It is therefore necessary to develop sufficient dimension reduction (SDR) methods for distributed data. In this paper, we first propose an exact distributed estimator for sliced inverse regression (SIR), which substantially improves computational efficiency while producing estimates identical to those obtained on the full sample. We then propose a unified distributed framework for general inverse regression methods based on conditional moments. This framework allows data at different locations to follow distinct population structures, thereby addressing heterogeneity. To assess the proposed methods, we conduct simulations under various data-generating mechanisms and examine scenarios in which samples are scattered across local nodes homogeneously with equal sizes, heterogeneously with equal sizes, and heterogeneously with unequal sizes. Our findings highlight the versatility and applicability of the unified framework. Meanwhile, the communication cost remains acceptable in practice and the computation cost is greatly reduced. A sensitivity analysis verifies the robustness of the algorithm under extreme conditions in which the SDR method fails locally on some nodes. A real data analysis further demonstrates the superior performance of the algorithm.
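The abstract does not spell out how an exact distributed SIR estimate is obtained. The sketch below shows one way such exactness can arise, under a stated assumption: slice boundaries on y are fixed in advance and shared by every node (the paper may construct slices differently). Because SIR depends on the data only through sample moments, each node can send additive summaries (count, sum of x, cross-product matrix, per-slice counts and sums), and the center can rebuild the full-sample covariance and SIR kernel from them. The names `local_stats`, `sir_from_stats`, `projection`, and `edges` are hypothetical, and the code illustrates the general idea rather than the authors' algorithm.

```python
import numpy as np


def local_stats(X, y, edges):
    """Summaries one node sends to the center (hypothetical protocol).

    `edges` holds interior slice boundaries on y, assumed fixed in advance
    and shared by every node so that all nodes slice y identically.
    """
    H = len(edges) + 1
    labels = np.digitize(y, edges)  # slice index in 0..H-1 for each response
    slice_sums = np.zeros((H, X.shape[1]))
    for h in range(H):
        slice_sums[h] = X[labels == h].sum(axis=0)
    return {
        "n": X.shape[0],
        "sum": X.sum(axis=0),                        # p-vector
        "cross": X.T @ X,                            # p x p
        "counts": np.bincount(labels, minlength=H),  # H slice counts
        "slice_sums": slice_sums,                    # H x p
    }


def sir_from_stats(stats_list, d):
    """Pool node summaries and run SIR; returns (directions, eigenvalues)."""
    n = sum(s["n"] for s in stats_list)
    mu = sum(s["sum"] for s in stats_list) / n
    sigma = sum(s["cross"] for s in stats_list) / n - np.outer(mu, mu)
    counts = sum(s["counts"] for s in stats_list)
    slice_sums = sum(s["slice_sums"] for s in stats_list)

    # SIR kernel: M = sum_h (n_h / n) (m_h - mu)(m_h - mu)^T
    M = np.zeros_like(sigma)
    for n_h, s_h in zip(counts, slice_sums):
        if n_h == 0:
            continue  # empty slices contribute nothing
        diff = s_h / n_h - mu
        M += (n_h / n) * np.outer(diff, diff)

    # Generalized eigenproblem M b = lambda * Sigma b, solved via Sigma^{-1/2}
    evals, evecs = np.linalg.eigh(sigma)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    lam, eta = np.linalg.eigh(inv_sqrt @ M @ inv_sqrt)
    top = np.argsort(lam)[::-1][:d]
    return inv_sqrt @ eta[:, top], lam[top]


def projection(B):
    """Projection onto span(B); invariant to the sign of eigenvectors."""
    return B @ np.linalg.pinv(B)


rng = np.random.default_rng(0)
n, p = 2000, 5
X = rng.standard_normal((n, p))
beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
y = X @ beta + 0.5 * rng.standard_normal(n)
edges = np.array([-1.0, 0.0, 1.0])  # 4 slices, chosen by hand for this demo

# Full-sample SIR versus three unequally sized "nodes"
B_full, lam_full = sir_from_stats([local_stats(X, y, edges)], d=1)
parts = zip(np.split(X, [300, 1100]), np.split(y, [300, 1100]))
B_dist, lam_dist = sir_from_stats(
    [local_stats(Xk, yk, edges) for Xk, yk in parts], d=1
)

print(np.allclose(projection(B_full), projection(B_dist)))
print(np.allclose(lam_full, lam_dist))
```

In this sketch each node communicates O(p² + Hp) numbers regardless of its local sample size, and the pooled result matches the single-machine fit up to floating-point rounding. It only covers the homogeneous, exact-SIR case; the abstract's unified framework for heterogeneous nodes and other conditional-moment methods is not reproduced here.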
Similar Papers
Distributional Random Forests for Complex Survey Designs on Reproducing Kernel Hilbert Spaces
Methodology
Helps understand health data from surveys better.
Efficient forward and inverse uncertainty quantification for dynamical systems based on dimension reduction and Kriging surrogate modeling in functional space
Dynamical Systems
Makes complex computer models work better with less data.
Subspace Ordering for Maximum Response Preservation in Sufficient Dimension Reduction
Methodology
Finds better ways to understand data for predictions.