Variable selection in discriminant analysis based on the location model for mixed variables

Author: Mahat Nor  

Publisher: Springer Publishing Company

ISSN: 1862-5347

Source: Advances in Data Analysis and Classification, Vol. 1, Iss. 2, 2007-08, pp. 105-122

Abstract

Non-parametric smoothing of the location model is a potential basis for discriminating between groups of objects using mixtures of continuous and categorical variables simultaneously. However, it may lead to unreliable parameter estimates when too many variables are involved. This paper proposes a method for performing variable selection on the basis of the distance between groups, as measured by a smoothed Kullback-Leibler divergence. Search strategies using forward, backward and stepwise selection are outlined, and corresponding stopping rules derived from asymptotic distributional results are proposed. Results from a Monte Carlo study demonstrate the feasibility of the method. Examples on real data show that the method is generally competitive with, and sometimes better than, other existing classification methods.
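
As a rough illustration of the forward search described in the abstract, the following minimal sketch greedily adds the variable that most increases a divergence between two groups. It is not the paper's method: it assumes continuous variables only, Gaussian groups with a common covariance matrix (in which case the symmetric Kullback-Leibler divergence reduces to the squared Mahalanobis distance), and no smoothing over categorical cells. The function names `sym_kl_gaussian` and `forward_select`, and the fixed threshold `min_gain`, are hypothetical; `min_gain` is a placeholder for the asymptotic stopping rules the abstract mentions, which are not reproduced here.

```python
import numpy as np


def sym_kl_gaussian(x1, x2, cols):
    """Symmetric KL divergence between two groups on the columns `cols`.

    Assumes both groups are Gaussian with a common covariance matrix, so the
    divergence equals the squared Mahalanobis distance between group means.
    """
    a, b = x1[:, cols], x2[:, cols]
    n1, n2 = len(a), len(b)
    diff = a.mean(axis=0) - b.mean(axis=0)
    # Pooled within-group covariance; atleast_2d handles the one-column case.
    s1 = np.atleast_2d(np.cov(a, rowvar=False))
    s2 = np.atleast_2d(np.cov(b, rowvar=False))
    pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    return float(diff @ np.linalg.solve(pooled, diff))


def forward_select(x1, x2, min_gain=0.5):
    """Greedy forward selection on the divergence criterion.

    `min_gain` is an illustrative fixed threshold (an assumption), standing in
    for a proper distribution-based stopping rule.
    """
    remaining = list(range(x1.shape[1]))
    selected, current = [], 0.0
    while remaining:
        # Score each candidate by the divergence of the enlarged subset.
        scores = {j: sym_kl_gaussian(x1, x2, selected + [j]) for j in remaining}
        best = max(scores, key=scores.get)
        if scores[best] - current < min_gain:
            break  # no remaining variable adds enough separation
        selected.append(best)
        remaining.remove(best)
        current = scores[best]
    return selected, current


# Synthetic data for illustration only: columns 0 and 2 separate the groups,
# columns 1 and 3 are pure noise.
rng = np.random.default_rng(0)
group1 = rng.normal(size=(500, 4))
group2 = rng.normal(size=(500, 4)) + np.array([3.0, 0.0, 1.5, 0.0])
chosen, divergence = forward_select(group1, group2)
print(chosen, round(divergence, 2))
```

Backward elimination would follow the same pattern in reverse, starting from all variables and dropping the one whose removal costs the least divergence; a stepwise search alternates the two moves.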