| Contents |
k-Nearest Neighbors (k-NN) Prediction
|
Introduction In k-nearest-neighbor prediction, the training data set is used to predict the value of a variable of interest for each member of a "target" data set. The structure of the data is that there is a variable of interest ("amount purchased," for example), and a number of additional predictor variables (age, income, location...). Generally speaking, the algorithm is as follows:
Of course the computing time goes up as k goes up, but the advantage is that higher values of k provide smoothing that reduces vulnerability to noise in the training data. In practical applications, typically, k is in units or tens rather than in hundreds or thousands. See Also:
|