Day44 ML Review - K-Nearest Neighbors (1)
Basic Concepts, How It Works, and Parametric & Non-Parametric Models
The K-Nearest Neighbor (KNN) classifier is especially interesting because it fundamentally differs from the learning algorithms we’ve discussed. KNN is a typical example of a “lazy learner.” It is called “lazy” not because of its apparent simplicity but because it doesn’t learn a discriminative function from the training data; instead, it memorizes the training dataset.
Key Concepts
- Instance-Based Learning: KNN is an instance-based learning algorithm that memorizes the training dataset and makes predictions by comparing new data points directly to the stored instances.
- Non-Parametric: KNN does not make any assumptions about the underlying data distribution. This makes it flexible but potentially sensitive to irrelevant features and noise in the data.
- Lazy Learning: KNN is called a lazy learner because it doesn’t learn an explicit model during the training phase. Lazy learning defers the generalization or model-building step until a query is made: when we “train” a KNN model, it doesn’t create a generalized model from the data, it simply stores the training set, and all of the real computation happens in the prediction phase (the short timing sketch below illustrates this).
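To see the “lazy” behavior directly, here is a small timing sketch (my own illustration on synthetic data, not from the original post): with a brute-force KNN, fit() returns almost immediately because it only stores the data, while predict() pays the full cost of the distance computations.

```python
import time

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data: 20,000 points with 20 features, two classes
rng = np.random.default_rng(0)
X = rng.normal(size=(20_000, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# algorithm='brute' forces exhaustive distance computation at prediction time
knn = KNeighborsClassifier(n_neighbors=5, algorithm='brute')

start = time.perf_counter()
knn.fit(X, y)              # "training" is essentially storing the dataset
fit_time = time.perf_counter() - start

start = time.perf_counter()
knn.predict(X[:1_000])     # distances to all 20,000 stored points are computed here
predict_time = time.perf_counter() - start

print(f"fit: {fit_time:.4f} s, predict: {predict_time:.4f} s")
```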
How KNN Works
- Choosing K: The algorithm first selects the number of neighbors $K$ (a positive integer). The value of $K$ is a critical hyperparameter that controls the model’s balance between bias and variance.
- Distance Metric: KNN typically uses a distance metric (e.g., Euclidean distance) to measure the similarity between instances. The distance between two points $A = (x_1, y_1)$ and $B = (x_2, y_2)$ in a 2D space using Euclidean distance is calculated as:
$\text{Distance} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$
Other metrics, such as Manhattan distance, Minkowski distance, or Hamming distance, can also be used depending on the problem.
- Finding Neighbors: For a given test instance, the algorithm computes the distance to all training data points and identifies the $K$ closest instances (neighbors).
- Prediction (a minimal from-scratch sketch of these steps follows this list):
- Classification: The most common class among the $K$ nearest neighbors is assigned as the predicted class for the test instance. This is typically done using a majority vote.
- Regression: The predicted value is the average (or sometimes the weighted average) of the values of the $K$ nearest neighbors.
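To make the four steps above concrete, here is a minimal from-scratch sketch (my own illustration using NumPy and made-up data, not code from the original): it computes Euclidean distances, picks the $K$ closest training points, and predicts by majority vote for classification or by averaging for regression.

```python
from collections import Counter

import numpy as np

def knn_predict(X_train, y_train, x_query, k=5, task="classification"):
    """Predict the label (or value) of a single query point with plain KNN."""
    # Step 2: Euclidean distance from the query point to every training point
    distances = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))

    # Step 3: indices of the K closest training points
    nearest = np.argsort(distances)[:k]
    neighbor_labels = y_train[nearest]

    # Step 4: majority vote for classification, average for regression
    if task == "classification":
        return Counter(neighbor_labels).most_common(1)[0][0]
    return neighbor_labels.mean()

# Tiny usage example with made-up 2D points and two classes
X_train = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [6.0, 7.0], [7.0, 8.0]])
y_train = np.array([0, 0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([2.5, 3.0]), k=3))  # -> 0
```

In practice a library implementation such as scikit-learn’s KNeighborsClassifier (shown later in this post) is preferable, since it adds efficient neighbor-search structures like KD-trees and ball trees.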
Advantages of KNN
- Simplicity: KNN is easy to understand and implement.
- No Training Phase: Since KNN is a lazy learner, training amounts to little more than storing the data, making it quick to set up.
- Flexibility: It can be used for both classification and regression tasks.
Disadvantages of KNN
- Computational Complexity: KNN can be computationally expensive, especially with large datasets, since it requires calculating the distance between the test point and all training points for each prediction.
- Memory Intensive: KNN stores all the training data, which can be memory-intensive for large datasets.
- Sensitivity to Irrelevant Features: KNN can be significantly affected by irrelevant or redundant features, which can distort distance measurements.
- Choice of $K$: The choice of $K$ is crucial. A small $K$ can lead to overfitting (high variance), while a large $K$ can lead to underfitting (high bias); cross-validation is a common way to tune it, as in the sketch below.
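Because of this bias-variance trade-off, $K$ is usually tuned empirically. The sketch below is my own example (assuming scikit-learn and the Iris dataset, which the original does not mention): it scores a range of candidate values with 5-fold cross-validation and keeps the best one.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Try a range of K values: small K -> high variance, large K -> high bias
scores = {}
for k in range(1, 22, 2):
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(f"best K = {best_k}, mean CV accuracy = {scores[best_k]:.3f}")
```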
Example of KNN in Scikit-learn
Here’s how we can implement KNN for a classification task using Scikit-learn:
```python
from sklearn.neighbors import KNeighborsClassifier

# 5 nearest neighbors, Minkowski metric with p=2 (i.e., Euclidean distance)
knn = KNeighborsClassifier(n_neighbors=5, p=2, metric='minkowski')

# Train the model (fit the model) on the standardized training features
knn.fit(X_train_std, y_train)

# Predict on the test set (standardized with the same scaler as the training data)
y_pred = knn.predict(X_test_std)
```
In the context of K-Nearest Neighbors (KNN), setting p=2 selects the Minkowski distance with parameter $p = 2$, which corresponds exactly to the Euclidean distance; $p = 1$ would give the Manhattan distance instead.
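As a quick sanity check (my own snippet using SciPy’s distance functions, not part of the original), Minkowski distance with $p = 2$ matches the Euclidean distance, and $p = 1$ matches the Manhattan distance:

```python
from scipy.spatial.distance import cityblock, euclidean, minkowski

a = [1.0, 2.0, 3.0]
b = [4.0, 6.0, 8.0]

print(euclidean(a, b))        # 7.0710... (sqrt(3^2 + 4^2 + 5^2))
print(minkowski(a, b, p=2))   # identical to the Euclidean distance
print(cityblock(a, b))        # 12.0 (Manhattan distance: 3 + 4 + 5)
print(minkowski(a, b, p=1))   # identical to the Manhattan distance
```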
Side note
In machine learning, algorithms can be broadly classified into parametric and non-parametric models based on their approaches to learning and prediction.
| Feature | Parametric Models | Non-Parametric Models |
|---|---|---|
| Number of Parameters | Fixed, determined before training | Flexible, can grow with the size of the data |
| Assumptions | Strong assumptions about the data distribution | Few or no assumptions about the data distribution |
| Flexibility | Less flexible, may underfit complex data | More flexible, can capture complex patterns |
| Data Requirement | Requires less data | Requires more data to avoid overfitting |
| Computation | Generally faster and more efficient | More computationally intensive |
| Risk of Overfitting | Lower, but may underfit | Higher, but can be controlled with regularization techniques |
Choosing Between Parametric and Non-Parametric Models
- Parametric Models are typically chosen when you have substantial prior knowledge about the form of the data distribution or when computational efficiency and simplicity are essential.
- Non-Parametric Models are preferred when flexibility is crucial and you want the model to learn complex patterns from the data without imposing rigid assumptions (the sketch below makes this contrast concrete).
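To make the contrast concrete, here is a small sketch (my own example with scikit-learn and the Iris dataset, not from the original): logistic regression, a parametric model, compresses the training set into a fixed number of coefficients, while KNN, a non-parametric model, effectively keeps the whole training set as its “model”.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Parametric: the number of learned parameters is fixed by the feature/class count,
# not by the number of training samples
logreg = LogisticRegression(max_iter=1000).fit(X, y)
n_params = logreg.coef_.size + logreg.intercept_.size
print(f"Logistic regression stores {n_params} parameters")  # 15 (3 classes x 4 features + 3 intercepts)

# Non-parametric: the stored "parameters" are the training samples themselves
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(f"KNN stores all {X.shape[0]} training samples")       # 150
```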