## Preface

### Training set

#### kNN

kNN stands for k-nearest neighbors. The algorithm is similar to bin smoothing, but kNN adapts more easily to multidimensional data. In short, given any point $x$ at which we want an estimate, we look for the $k$ nearest points and take the average of their values. This yields an estimate of $f(x_1, x_2)$, just as bin smoothing yields an estimated curve. We now control the flexibility of the estimate through $k$. Let us compare the results for $k=1$ and $k=100$:
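The code that produced this comparison is not reproduced in these notes; the following is a minimal sketch of the idea using the `class` package, where the training data `train_x`, the labels `train_y`, and the prediction grid are all made up for illustration:

```r
library(class)

## Hypothetical two-dimensional training data with a binary outcome
set.seed(1)
train_x <- matrix(rnorm(400), ncol = 2)
train_y <- factor(ifelse(train_x[, 1] + train_x[, 2] + rnorm(200, sd = 0.5) > 0, "1", "0"))

## Grid of points (x1, x2) at which we want an estimate
grid <- expand.grid(x1 = seq(-3, 3, length.out = 50),
                    x2 = seq(-3, 3, length.out = 50))

## k = 1: each estimate copies its single nearest training point (very flexible)
pred_k1   <- knn(train = train_x, test = grid, cl = train_y, k = 1)

## k = 100: each estimate averages over 100 neighbours (much smoother)
pred_k100 <- knn(train = train_x, test = grid, cl = train_y, k = 100)
```

With `k = 1` the prediction at each grid point is dictated by a single training observation, so the fit is maximally flexible and tends to overfit; with `k = 100` each prediction averages over half of this hypothetical training set and is far smoother.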

## Cross-validation

The folds are returned as a list of numeric indices. The first fold of data is therefore:
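The original fold-creation code is not included in these notes; a sketch of the same idea with `caret::createFolds` (which by default returns the held-out indices as a list), assuming a hypothetical feature matrix `X` and label factor `y`, could look like:

```r
library(caret)
library(class)

set.seed(1)
folds <- createFolds(y, k = 10)   # list of 10 numeric index vectors (the held-out sets)
str(folds[[1]])                   # the first fold is just a vector of row numbers

## Train on everything outside the first fold, then predict the first fold
test_idx <- folds[[1]]
pred <- knn(train = X[-test_idx, ], test = X[test_idx, ],
            cl = y[-test_idx], k = 5)
table(pred, truth = y[test_idx])  # misclassifications appear off the diagonal
```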

Now we have some misclassifications. How well do we do for the rest of the folds?
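Continuing the sketch, the same computation can be repeated for every fold (still assuming the hypothetical `X`, `y`, and `folds` from above):

```r
fold_err <- vapply(folds, function(test_idx) {
  pred <- knn(train = X[-test_idx, ], test = X[test_idx, ],
              cl = y[-test_idx], k = 5)
  mean(pred != y[test_idx])       # misclassification rate on this fold
}, numeric(1))

fold_err        # one error estimate per fold
mean(fold_err)  # overall cross-validated error
```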

However, in a real machine learning application, reducing dimension on the unlabeled full dataset before splitting may lead to an underestimate of the test set error, especially for small sample sizes, where that unlabeled boost in performance matters most. A safer choice would have been to transform the data separately for each fold: calculate the rotation and dimension reduction using the training set only, then apply it to the test set.
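The safer scheme is only described in words above; as a sketch of what transforming each fold separately might look like, again with the hypothetical `X`, `y`, and `folds` (and assuming `X` has more than two columns), one could learn the PCA rotation on the training fold and project the held-out fold with it:

```r
fold_err_pca <- vapply(folds, function(test_idx) {
  ## Learn the rotation (and centring/scaling) on the training fold only
  pca <- prcomp(X[-test_idx, ], center = TRUE, scale. = TRUE)

  ## Keep, say, the first two components; project the test fold with the same rotation
  train_scores <- pca$x[, 1:2]
  test_scores  <- predict(pca, newdata = X[test_idx, ])[, 1:2]

  pred <- knn(train = train_scores, test = test_scores,
              cl = y[-test_idx], k = 5)
  mean(pred != y[test_idx])
}, numeric(1))

fold_err_pca
```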
