arXiv Analytics

arXiv:1901.03791 [stat.CO]

Optimization of Survey Weights under a Large Number of Conflicting Constraints

Matthew R. Williams, Terrance D. Savitsky

Published 2019-01-12 (Version 1)

In the analysis of survey data, sampling weights are needed for consistent estimation of population quantities. However, the original inverse-probability weights from the survey sample design are typically modified to account for non-response, to increase efficiency by incorporating auxiliary population information, and to reduce the variability in estimates due to extreme weights. Often no single set of weights can be found that successfully incorporates all of these modifications, because together they induce a large number of constraints and restrictions on the feasible solution space. For example, a unique combination of categorical variables may be absent from the sample data even when the corresponding population-level information is available. Additional requirements that weights fall within specified ranges may also limit how many population-level adjustments can be incorporated. We present a framework and accompanying computational methods that address this issue of constraint achievement, or selection within a restricted space, and produce revised weights with reasonable properties. By combining concepts from generalized raking, ridge and lasso regression, benchmarking of small area estimates, augmentation of state-space equations, path algorithms, and data cloning, the framework simultaneously selects constraints and provides diagnostics suggesting why a fully constrained solution is not possible. Combinatorial operations, such as brute-force evaluation of all possible combinations of constraints and restrictions, are avoided. We demonstrate the framework by applying alternative methods to post-stratification for the National Survey on Drug Use and Health, and we discuss strategies for scaling to even larger data sets. Computations were performed in R, and code is available from the authors.
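To make the ridge-relaxation idea in the abstract concrete, here is a minimal sketch of ridge-penalized calibration, not the authors' implementation (which is in R) and not their full framework. The base weights `d`, benchmark matrix `A`, targets `t`, and penalty `lam` are all illustrative. Rather than forcing the benchmark constraints `A.T @ w == t` to hold exactly, the quadratic penalty lets conflicting constraints be only approximately satisfied, so a solution exists even when exact calibration is infeasible; as `lam` shrinks toward zero, the solution approaches exact (GREG-style) calibration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3

# d: base design weights; A: 0/1 benchmark (post-stratification) indicators
d = rng.uniform(1.0, 5.0, size=n)
A = rng.integers(0, 2, size=(n, p)).astype(float)

# Illustrative population benchmark totals, perturbed so the
# constraints do not hold exactly at the base weights.
t = (A.T @ d) * rng.uniform(0.9, 1.1, size=p)

# Ridge-relaxed chi-square calibration:
#   minimize 0.5 * (w - d)' diag(d)^{-1} (w - d)
#          + (1 / (2 * lam)) * ||A' w - t||^2
# Closed form: w = d + D A (lam I + A' D A)^{-1} (t - A' d)
lam = 0.1  # lam -> 0 recovers exact calibration when feasible
D = np.diag(d)
M = lam * np.eye(p) + A.T @ D @ A
w = d + D @ A @ np.linalg.solve(M, t - A.T @ d)

# The revised weights move the benchmark residuals toward zero
# without requiring the conflicting constraints to hold exactly.
print(np.linalg.norm(A.T @ d - t), np.linalg.norm(A.T @ w - t))
```

A lasso-type penalty on the constraint residuals, as referenced in the abstract, would instead drive a subset of residuals exactly to zero, which is what allows the framework to *select* which constraints to satisfy; that variant has no closed form and is omitted here.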

Related articles:
arXiv:1903.05715 [stat.CO] (Published 2019-03-13)
HCmodelSets: An R package for specifying sets of well-fitting models in regression with a large number of potential explanatory variables
arXiv:2002.00413 [stat.CO] (Published 2020-02-02)
Fast Generating A Large Number of Gumbel-Max Variables
arXiv:1301.6282 [stat.CO] (Published 2013-01-26)
AABC: approximate approximate Bayesian computation when simulating a large number of data sets is computationally infeasible