Handling the curse of dimensionality in multivariate kernel density estimation
Abstract
Kernel density estimation (KDE) is the most widely used practical method for accurate nonparametric density estimation. Much work has been done on both the univariate and multivariate cases, showing the efficacy, practicality and applicability of the method. Although multivariate kernel density estimation is an important technique in multivariate data analysis with a wide range of applications, its performance worsens exponentially on high-dimensional data sets. This phenomenon is called the "curse of dimensionality", where there is exponential growth in combinatorial optimization as the dimension of the data set increases. Scott and Wand (1991) demonstrated a progressive deterioration of multivariate kernel density estimation as the dimension p increases by showing that an increase in sample size is required to attain an equivalent amount of accuracy. This work proposes a new multivariate kernel density estimation approach based on the sample means. The method works for self-revolving densities, or ellipsoidally symmetric distributions. It also works for spherical distributions, since they can be transformed to ellipsoidally symmetric distributions by an affine transformation. The univariate normal, multivariate normal and Cauchy distributions, to mention a few, possess this self-revolving or ellipsoidally symmetric property. In addition, this work proposes a second new multivariate kernel density estimate that handles the curse of dimensionality better. We applied the new method to the probability density function, the distribution function and nonparametric multivariate regression. In all these cases, our multivariate kernel density estimation approach based on the sample means performs better than the regular multivariate kernel density estimation based on the sample data.
We also observed that the proposed multivariate kernel density method breaks the "curse of dimensionality" and remedies the deficiency of high-dimensional bandwidth selection. Moreover, its performance is consistent across most bandwidth selection methodologies. The second proposed multivariate density estimate does not completely break the curse of dimensionality, but the effect of the curse on it is minimal compared to the regular multivariate kernel density estimate.
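The deterioration of the regular estimator with dimension can be illustrated with a small numerical sketch. The code below is not the sample-means method proposed in this work, which the abstract does not specify. It implements only the regular multivariate KDE on the sample data, assuming a Gaussian product kernel and a Scott-type rule-of-thumb bandwidth, and measures the relative error of the estimate at the origin for standard normal data. The function names are illustrative.

```python
import numpy as np

def gaussian_kde_at(point, data, bandwidths):
    """Regular KDE with a Gaussian product kernel, evaluated at one point.

    data has shape (n, d); bandwidths has shape (d,), one per coordinate.
    """
    n, d = data.shape
    z = (point - data) / bandwidths  # (n, d) standardized differences
    # Log of each sample's kernel contribution, for numerical stability.
    log_kernels = (-0.5 * np.sum(z ** 2, axis=1)
                   - 0.5 * d * np.log(2.0 * np.pi)
                   - np.sum(np.log(bandwidths)))
    return np.exp(log_kernels).mean()

def scott_bandwidths(data):
    """Rule-of-thumb bandwidths h_j = s_j * n^(-1/(d+4)) (an assumed choice)."""
    n, d = data.shape
    return data.std(axis=0, ddof=1) * n ** (-1.0 / (d + 4))

def relative_error_at_origin(n, d, rng):
    """Relative error of the KDE at 0 for n draws from a d-variate standard normal."""
    data = rng.standard_normal((n, d))
    estimate = gaussian_kde_at(np.zeros(d), data, scott_bandwidths(data))
    truth = (2.0 * np.pi) ** (-d / 2.0)  # standard normal density at the origin
    return abs(estimate - truth) / truth

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for d in (1, 2, 5, 10):
        print(d, relative_error_at_origin(500, d, rng))
```

With the sample size held fixed, the relative error in this setup tends to grow as d increases, which is the behavior Scott and Wand (1991) describe: more observations are needed in higher dimensions to reach the same accuracy.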
Collections
- OSU Dissertations [11222]