Day45 ML Review - K-Nearest Neighbors (2)
Distance Metrics: Euclidean, Manhattan, Minkowski & Chebyshev Distance, and Cosine Similarity
Distance Metrics
From the previous posting, here’s how we can implement KNN for a classification task using Scikit-learn:
from sklearn.neighbors import KNeighborsClassifier
# Minkowski distance with p=2 corresponds to the Euclidean distance
knn = KNeighborsClassifier(n_neighbors=5, p=2, metric='minkowski')
# Train the model (fit the model)
knn.fit(X_train_std, y_train)
# Predict on the (standardized) test set
y_pred = knn.predict(X_test_std)
In the context of K-Nearest Neighbors (KNN), setting p=2 refers to choosing the Minkowski distance with parameter $p=2$, which corresponds to the Euclidean distance.
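To see this equivalence concretely, here is a minimal sketch using SciPy's distance helpers (assuming SciPy is available; it is not used in the snippet above):
from scipy.spatial.distance import euclidean, minkowski

a, b = [3, 4], [7, 1]
# Minkowski distance with p=2 reduces to the plain Euclidean distance
print(minkowski(a, b, p=2))  # 5.0
print(euclidean(a, b))       # 5.0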
Let’s explore various distance metrics such as Minkowski, Euclidean, Manhattan, and Chebyshev distances. These metrics calculate the “distance” or “similarity” between two points in a space. The choice of distance metric can have a significant impact on the performance of machine learning algorithms like K-Nearest Neighbors (KNN).
1. Euclidean Distance
Euclidean distance is the most commonly used distance metric. It represents the straight-line distance between two points in Euclidean space.
- Formula: For two points $A = (x_1, x_2, \dots, x_n)$ and $B = (y_1, y_2, \dots, y_n)$ in an $n$-dimensional space, the Euclidean distance $D$ is given by:
  $D(A,B) = \sqrt{\sum_{i=1}^{n}(x_i - y_i)^2}$
- Example (see the code sketch after this list): For points $A=(3,4)$ and $B=(7,1)$:
  $D(A,B) = \sqrt{(7-3)^2+(1-4)^2} = \sqrt{16+9} = \sqrt{25} = 5$
- Applications:
- K-Nearest Neighbors (KNN)
- Clustering (e.g., K-Means)
- Principal Component Analysis (PCA)
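To make the formula concrete, here is a minimal NumPy sketch (assuming NumPy is available) that reproduces the worked example above:
import numpy as np

A = np.array([3, 4])
B = np.array([7, 1])

# Square root of the sum of squared coordinate differences (L2 norm of A - B)
euclidean_dist = np.sqrt(np.sum((A - B) ** 2))
print(euclidean_dist)          # 5.0
print(np.linalg.norm(A - B))   # equivalent: 5.0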
2. Manhattan Distance
Manhattan distance (also known as the L1 norm or taxicab distance) is the sum of the absolute differences of the coordinates. It measures how far apart two points are if we can only move along grid lines (like streets in a city).
- Formula: For two points $A=(x_1, x_2, \dots, x_n)$ and $B=(y_1, y_2, \dots, y_n)$ in an $n$-dimensional space, the Manhattan distance $D$ is given by:
  $D(A, B) = \sum_{i=1}^{n} |x_i - y_i|$
- Example (see the code sketch after this list): For points $A=(3,4)$ and $B=(7,1)$:
  $D(A, B) = |7-3| + |1-4| = 4 + 3 = 7$
- Applications:
- Urban planning (movement along grid-like paths)
- Robustness to outliers in high-dimensional spaces
- Sparse data where many coordinates are zero
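As a quick check of the example above, a minimal NumPy sketch:
import numpy as np

A = np.array([3, 4])
B = np.array([7, 1])

# Sum of absolute coordinate differences (L1 / taxicab distance)
manhattan_dist = np.sum(np.abs(A - B))
print(manhattan_dist)  # 7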
3. Minkowski Distance
Minkowski distance is a generalization of both Euclidean and Manhattan distances. It includes these and other distances as special cases based on the parameter $p$.
- Formula: The Minkowski distance $D$ between two points $A$ and $B$ is given by:
  $D(A, B) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$
  where:
- $A = (x_1, x_2, \dots, x_n)$ and $B = (y_1, y_2, \dots, y_n)$ are two points in $n$-dimensional space.
- $p$ is a parameter that determines the type of distance:
- $p=1$: Manhattan distance
- $p=2$: Euclidean distance
- $p=\infty$: Chebyshev distance (maximum difference)
- Example (see the code sketch after this list): For points $A=(3,4)$ and $B=(7,1)$, the Minkowski distance gives the Manhattan result ($D = 7$) when $p=1$ and the Euclidean result ($D = 5$) when $p=2$; for $p=3$, $D = (|7-3|^3 + |1-4|^3)^{1/3} = (64 + 27)^{1/3} \approx 4.50$.
- Applications:
- Custom distance metrics
- Flexibility to adjust sensitivity to differences between points
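The following sketch implements the general Minkowski formula with NumPy and shows how $p$ interpolates between the metrics above (the helper function name is just for illustration):
import numpy as np

def minkowski_distance(a, b, p):
    # (sum of |x_i - y_i|^p)^(1/p); p=1 -> Manhattan, p=2 -> Euclidean
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

A = np.array([3, 4])
B = np.array([7, 1])

print(minkowski_distance(A, B, p=1))  # 7.0  (Manhattan)
print(minkowski_distance(A, B, p=2))  # 5.0  (Euclidean)
print(minkowski_distance(A, B, p=3))  # ~4.498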
4. Chebyshev Distance
Chebyshev distance (also known as the L$\infty$ norm or maximum metric) measures the greatest of the absolute differences between the coordinates of a pair of points. It represents the minimum number of moves required to reach one point from another in a grid where diagonal movement is allowed.
- Formula: For two points $A = (x_1, x_2, \dots, x_n)$ and $B = (y_1, y_2, \dots, y_n)$ in an $n$-dimensional space, the Chebyshev distance $D$ is given by:
  $D(A, B) = \max_{1 \le i \le n} |x_i - y_i|$
- Example (see the code sketch after this list): For points $A = (3, 4)$ and $B = (7, 1)$:
  $D(A, B) = \max(|7-3|, |1-4|) = \max(4, 3) = 4$
- Applications:
- Chess (King’s move): The distance a king moves between two squares on a chessboard.
- Board games and grid-based algorithms: Where diagonal movement is possible.
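A minimal NumPy sketch reproducing the example above:
import numpy as np

A = np.array([3, 4])
B = np.array([7, 1])

# Largest absolute coordinate difference (L-infinity norm)
chebyshev_dist = np.max(np.abs(A - B))
print(chebyshev_dist)  # 4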
5. Cosine Similarity
Cosine similarity is not a traditional distance metric, but it is commonly used to measure the cosine of the angle between two non-zero vectors in an inner product space. It is frequently used in high-dimensional spaces to assess the similarity between two vectors. Cosine distance can then be derived as $1 -$ cosine similarity.
- Formula: For two vectors $A = (x_1, x_2, \dots, x_n)$ and $B = (y_1, y_2, \dots, y_n)$, the cosine similarity $S$ is given by:
  $S(A, B) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2} \cdot \sqrt{\sum_{i=1}^{n} y_i^2}}$
Cosine Distance is then:
$\text{Cosine Distance} = 1 - S(A, B)$
- Example (see the code sketch after this list): For vectors $A = (1, 2, 3)$ and $B = (4, 5, 6)$:
$S(A, B) = \frac{(1 \times 4) + (2 \times 5) + (3 \times 6)}{\sqrt{1^2 + 2^2 + 3^2} \cdot \sqrt{4^2 + 5^2 + 6^2}} = \frac{4 + 10 + 18}{\sqrt{14} \cdot \sqrt{77}} = \frac{32}{\sqrt{1078}} \approx 0.9746$
Cosine Distance is then:
$\text{Cosine Distance} = 1 - 0.9746 = 0.0254 $
- Applications:
- Text mining and document similarity: Measuring similarity between documents based on word frequency vectors.
- Recommendation systems: Similarity between user preferences.
- High-dimensional data: Where Euclidean distance may be less meaningful due to the curse of dimensionality.
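A minimal NumPy sketch reproducing the cosine example above:
import numpy as np

A = np.array([1, 2, 3])
B = np.array([4, 5, 6])

# Cosine similarity: dot product divided by the product of the vector norms
cos_sim = np.dot(A, B) / (np.linalg.norm(A) * np.linalg.norm(B))
cos_dist = 1 - cos_sim

print(round(cos_sim, 4))   # 0.9746
print(round(cos_dist, 4))  # 0.0254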