Day32 ML Review - Support Vector Machine (1)
Basic Concepts and Mathematical Formulations
Another powerful and widely used learning algorithm is the support vector machine (SVM), which can be considered an extension of the perceptron.
Basic Concepts
Using a support vector machine, we aim to maximize the margin, the distance between the separating hyperplane (decision boundary) and the training examples closest to this hyperplane, known as support vectors.
1. Hyperplane and Decision Boundary
- In a binary classification problem, SVM aims to find the optimal hyperplane that separates the data points of different classes.
- A hyperplane in an $n$-dimensional space is a flat affine subspace of dimension $n-1$. (For a 2D space, a hyperplane is a line; for a 3D space, it is a plane.)
2. Support Vectors
- Support vectors are the data points closest to the hyperplane; they influence its position and orientation and lie on the edges of the margin.
- The optimal hyperplane maximizes the margin, which is the distance between it and the nearest support vectors from both classes.
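To make these ideas concrete, here is a minimal sketch (assuming scikit-learn; the toy dataset from `make_blobs` and the very large `C`, used to approximate a hard margin, are illustrative choices) that fits a linear SVM and reads off the support vectors and the margin width:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters; a very large C approximates a hard-margin SVM.
X, y = make_blobs(n_samples=40, centers=2, cluster_std=0.6, random_state=0)

clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]        # normal vector of the separating hyperplane
w0 = clf.intercept_[0]  # bias term
margin = 2 / np.linalg.norm(w)  # margin width, 2 / ||w||

print("Hyperplane: w =", w, " w0 =", w0)
print("Support vectors (points on the margin):\n", clf.support_vectors_)
print("Margin width:", margin)
```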
Mathematical Formulation - Maximum Margin Intuition
The rationale for having decision boundaries with large margins is that they are likely to have lower generalization errors, while models with small margins are more prone to overfitting.
Consider the positive and negative hyperplanes that are parallel to the decision boundary; they can be expressed as follows:
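Writing $w_0$ for the bias term and $\mathbf{x}_{\text{pos}}$, $\mathbf{x}_{\text{neg}}$ for points lying on the positive and negative hyperplane respectively (the standard notation for this derivation), the two hyperplanes are:

$$w_0 + \mathbf{w}^T\mathbf{x}_{\text{pos}} = 1 \quad (1)$$

$$w_0 + \mathbf{w}^T\mathbf{x}_{\text{neg}} = -1 \quad (2)$$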
If we subtract equation (2) from equation (1), we get:
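$$\mathbf{w}^T\left(\mathbf{x}_{\text{pos}} - \mathbf{x}_{\text{neg}}\right) = 2$$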
We can normalize this equation by the length of the vector $\mathbf{w}$, which is defined as follows:
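$$\lVert \mathbf{w} \rVert = \sqrt{\sum_{j=1}^{m} w_j^2},$$

where $m$ is the number of features.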
So we arrive at the following equation:
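$$\frac{\mathbf{w}^T\left(\mathbf{x}_{\text{pos}} - \mathbf{x}_{\text{neg}}\right)}{\lVert \mathbf{w} \rVert} = \frac{2}{\lVert \mathbf{w} \rVert}$$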
The left side of the preceding equation is the distance between the positive and negative hyperplanes, known as the margin, which we aim to maximize.
Now, the objective function of the SVM becomes the maximization of this margin by maximizing $\frac{2}{\lVert \mathbf{w} \rVert}$ under the constraint that the examples are classified correctly, which can be written as:
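$$w_0 + \mathbf{w}^T\mathbf{x}^{(i)} \ge 1 \ \text{ if } \ y^{(i)} = 1$$

$$w_0 + \mathbf{w}^T\mathbf{x}^{(i)} \le -1 \ \text{ if } \ y^{(i)} = -1$$

for $i = 1, \dots, N$, or more compactly, $y^{(i)}\left(w_0 + \mathbf{w}^T\mathbf{x}^{(i)}\right) \ge 1$ for all $i$.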
$N$ is the number of examples in our dataset.
In practice, it is easier to minimize the reciprocal term, $\frac{1}{2}\lVert \mathbf{w} \rVert^2$.
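Minimizing $\frac{1}{2}\lVert \mathbf{w} \rVert^2$ has the same solution as maximizing $\frac{2}{\lVert \mathbf{w} \rVert}$, and the squared norm gives a convex, differentiable objective that can be solved with standard quadratic programming:

$$\min_{\mathbf{w},\, w_0} \ \frac{1}{2}\lVert \mathbf{w} \rVert^2 \quad \text{subject to} \quad y^{(i)}\left(w_0 + \mathbf{w}^T\mathbf{x}^{(i)}\right) \ge 1, \quad i = 1, \dots, N.$$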