Nonparametric Kernel Density Estimation Using Auxiliary Information from Complex Survey Data
Mostafa, Sayed A.
This dissertation presents new approaches to using auxiliary information effectively in kernel density estimation from complex survey data. Two approaches are proposed for developing kernel density estimators that use both complete auxiliary information and sample information in the framework of complex surveys. Both approaches involve two steps: a modeling step, followed by a step that uses the sample data and the model fits from the modeling step to build efficient kernel density estimators for the density function, $f$, of the study variable $Y$. The main distinction between the two approaches lies in the modeling step. In the first approach, we directly model the relationship between the study variable $Y$ and the auxiliary variable $X$ using both parametric and nonparametric regression models. In the second approach, we use nonparametric regression models to describe the relationship between a kernel-transformed study variable, say $Z$, and the auxiliary variable $X$. The first approach yields two model-assisted kernel density estimators for $f$, and the second approach yields a third. All three new estimators use the sampling weights to account for unequal probability sampling designs. The statistical properties of each estimator are studied under a combined design-model-based inference framework, which accounts for both the underlying model and the sampling scheme. A global error criterion, the mean integrated squared error, is used to determine the optimal smoothing parameter for each of the three new estimators. Direct plug-in techniques are then used to obtain data-driven bandwidth estimators for these smoothing parameters. Using Monte Carlo simulation methods, we examine the finite-sample properties of the proposed estimators under different finite populations and sampling plans.
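For orientation, the kind of design-weighted kernel density estimator that the abstract treats as a baseline can be sketched as follows. This is only an illustrative sketch, not the dissertation's model-assisted estimators: the Gaussian kernel, the normalisation of the weights by their sum, the fixed bandwidth, and the names `gaussian_kernel` and `weighted_kde` are all assumptions made for the example.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def weighted_kde(y, sample, weights, bandwidth):
    """Weighted kernel density estimate at the point y.

    sample    : observed values of the study variable
    weights   : sampling weights, e.g. inverse inclusion probabilities
    bandwidth : smoothing parameter h > 0 (fixed by hand in this sketch)
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    if len(sample) != len(weights):
        raise ValueError("sample and weights must have the same length")
    total_weight = sum(weights)
    smoothed = sum(
        w * gaussian_kernel((y - y_i) / bandwidth)
        for y_i, w in zip(sample, weights)
    )
    # Dividing by the weight total makes the estimate integrate to one.
    return smoothed / (total_weight * bandwidth)


# Hypothetical sample drawn with unequal inclusion probabilities.
sample = [1.2, 2.5, 2.9, 4.1, 5.0]
inclusion_probs = [0.10, 0.25, 0.25, 0.20, 0.40]
weights = [1.0 / p for p in inclusion_probs]

density_at_3 = weighted_kde(3.0, sample, weights, bandwidth=0.8)
```

With equal weights this reduces to the ordinary kernel density estimator, which is a convenient sanity check before auxiliary information is brought in.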
Additionally, the performance of the new estimators relative to standard estimators that ignore the auxiliary information is assessed. On a separate track, the problem of estimating density and regression functions from samples of random size is considered. This problem is studied for sampling with replacement from finite populations. In this case, the effective sample size, i.e., the number of distinct sample units, is random. Based on the set of distinct sample units, kernel estimators for both density and regression functions are introduced and their statistical properties are investigated.
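The random effective sample size described above can be illustrated with a small simulation. The sketch below draws a simple random sample with replacement, reduces it to its distinct units, and evaluates an equally weighted Gaussian kernel density estimate on those units. The population, the bandwidth, the equal weighting, and the names `distinct_units` and `kde_distinct` are assumptions made for illustration; the estimators studied in the dissertation may differ.

```python
import math
import random


def distinct_units(draws):
    """Return the distinct unit labels in order of first appearance."""
    return list(dict.fromkeys(draws))


def kde_distinct(y, values, bandwidth):
    """Equally weighted Gaussian kernel density estimate at y from distinct units."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    nu = len(values)  # effective sample size, random under with-replacement sampling
    total = sum(
        math.exp(-0.5 * ((y - v) / bandwidth) ** 2) / math.sqrt(2.0 * math.pi)
        for v in values
    )
    return total / (nu * bandwidth)


rng = random.Random(0)

# Hypothetical finite population of 200 units.
population = [rng.gauss(50.0, 10.0) for _ in range(200)]

# 60 draws with replacement; some units may be selected more than once.
draws = [rng.randrange(len(population)) for _ in range(60)]

units = distinct_units(draws)
effective_size = len(units)  # at most 60, varies from sample to sample
values = [population[i] for i in units]

estimate_at_50 = kde_distinct(50.0, values, bandwidth=4.0)
```

Rerunning with a different seed changes `effective_size`, which is the sense in which the sample size is random in this setting.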
- OSU Dissertations