Dimension Agnostic Testing of Survey Data Credibility through the Lens of Regression
By: Debabrota Basu, Sourav Chakraborty, Debarshi Chanda, and more
Potential Business Impact:
Checks whether survey data truly represents the population it was sampled from.
Assessing whether a sample survey credibly represents the population is a critical question for ensuring the validity of downstream research. Generally, this problem reduces to estimating the distance between two high-dimensional distributions, which typically requires a number of samples that grows exponentially with the dimension. However, depending on the model used for data analysis, the conclusions drawn from the data may remain consistent across different underlying distributions. In this context, we propose a task-based approach to assess the credibility of sampled surveys. Specifically, we introduce a model-specific distance metric to quantify this notion of credibility. We also design an algorithm to verify the credibility of survey data in the context of regression models. Notably, the sample complexity of our algorithm is independent of the data dimension. This efficiency stems from the fact that the algorithm focuses on verifying the credibility of the survey data rather than reconstructing the underlying regression model. Furthermore, we show that if one attempts to verify credibility by reconstructing the regression model, the sample complexity scales linearly with the dimensionality of the data. We prove the theoretical correctness of our algorithm and numerically demonstrate our algorithm's performance.
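The abstract contrasts its dimension-agnostic tester with the reconstruct-then-compare route. Below is a minimal sketch of that baseline for linear regression. It assumes ordinary least squares and uses a hypothetical prediction-gap distance (an illustrative choice, not the paper's model-specific metric): fit one model on the survey and one on a population sample, then compare their predictions on population covariates. The names `fit_ols`, `reconstruction_distance`, `is_credible`, and the threshold `eps` are hypothetical. Because it must estimate all d coefficients, this is the kind of reconstruction-based check whose sample cost the abstract says grows linearly with dimension. The authors' algorithm avoids this reconstruction and is not shown here.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations act on a and b together.
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(xs: Sequence[Sequence[float]], ys: Sequence[float]) -> List[float]:
    """Least-squares coefficients from the normal equations X^T X w = X^T y."""
    d = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(d)] for i in range(d)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(d)]
    return solve(xtx, xty)


def reconstruction_distance(survey_x, survey_y, pop_x, pop_y) -> float:
    """HYPOTHETICAL distance: mean squared gap between the survey-fitted and
    population-fitted models' predictions, evaluated on population covariates."""
    w_s = fit_ols(survey_x, survey_y)
    w_p = fit_ols(pop_x, pop_y)
    gaps = [sum((a - b) * xi for a, b, xi in zip(w_s, w_p, x)) ** 2 for x in pop_x]
    return sum(gaps) / len(gaps)


def is_credible(survey_x, survey_y, pop_x, pop_y, eps: float) -> bool:
    """Declare the survey credible if the (hypothetical) distance is at most eps."""
    return reconstruction_distance(survey_x, survey_y, pop_x, pop_y) <= eps


if __name__ == "__main__":
    # Toy population following y = 1 + 2x; the first column is an intercept feature.
    pop_x = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
    pop_y = [1.0, 3.0, 5.0, 7.0]
    survey_x = [[1.0, 0.0], [1.0, 2.0]]
    print(is_credible(survey_x, [1.0, 5.0], pop_x, pop_y, eps=0.1))  # consistent survey
    print(is_credible(survey_x, [1.0, 7.0], pop_x, pop_y, eps=0.1))  # biased survey
```

In the toy run, a survey consistent with the population trend gives a distance near zero, while a survey following y = 1 + 3x gives a mean squared prediction gap of 3.5 and fails the check. Note that the survey can differ from the population in its covariate distribution and still pass, which mirrors the task-based idea that conclusions may agree across different underlying distributions.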
Similar Papers
Data Reliability Scoring
Machine Learning (CS)
Measures data quality without knowing the real answers.
A Comprehensive Evaluation of the Sensitivity of Density-Ratio Estimation Based Fairness Measurement in Regression
Machine Learning (CS)
Fixes unfair computer decisions in predictions.
Evaluating the Quality of the Quantified Uncertainty for (Re)Calibration of Data-Driven Regression Models
Machine Learning (CS)
Makes AI predictions more trustworthy and reliable.