Day12 ML Review - Gradient Descent (2)
Mathematical Explanation
from https://angeloyeo.github.io/2020/08/16/gradient_descent.html
Derivation of the formula for gradient descent
At its core, gradient descent is an optimization technique, widely used in machine learning, for finding the value of the independent variable that minimizes a function. It does this by iteratively adjusting that variable in the direction that decreases the function value. Think of it as a way to "descend" to the lowest point of a function, much like a hiker navigating mountainous terrain.
When the data is large, finding the solution with an iterative method such as gradient descent can be more computationally efficient than computing a closed-form solution directly.
Gradient descent uses the function's gradient (slope) to determine in which direction $x$ should be adjusted so that the function value decreases toward its minimum.
- If the slope is positive, the function value increases as $x$ increases.
- Conversely, if the slope is negative, the function value decreases as $x$ increases.
Also, a large slope magnitude indicates a steep incline, and it also signifies that $x$ is far from the $x$ coordinate corresponding to the minimum/maximum value.
If the function value increases as $x$ increases at a specific point (the slope is positive), we need to move $x$ in the negative direction. Conversely, if the function value decreases as $x$ increases at that point (the slope is negative), we move $x$ in the positive direction.
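To make this sign rule concrete, here is a minimal Python sketch on the toy function $f(x) = x^2$; the function, step size, and starting points are assumptions chosen purely for illustration.

```python
# Minimal sketch of the sign rule, assuming f(x) = x^2 (so f'(x) = 2x).
def grad(x):
    return 2 * x              # derivative of f(x) = x^2

alpha = 0.1                   # step size (assumed value)

x = 3.0                       # slope here is +6, so we step in the negative direction
print(x - alpha * grad(x))    # 2.4 -> moved left, toward the minimum at 0

x = -3.0                      # slope here is -6, so we step in the positive direction
print(x - alpha * grad(x))    # -2.4 -> moved right, toward the minimum at 0
```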
Reference (translated from Korean)
Gradient descent can be described as a method that uses the slope of the function (i.e., the gradient) to find out where the value of $x$ should be moved so that the function reaches its minimum.
A positive slope means that the function value increases as $x$ increases; conversely, a negative slope means that the function value decreases as $x$ increases.
Also, a large slope magnitude means the function is steep at that point, but on the other hand it also means that the current $x$ is far from the $x$ coordinate corresponding to the minimum/maximum value.
Direction Component of the Gradient
In gradient descent, the direction of the update is crucial to effectively minimizing a function. If the function's value increases with an increase in $x$ (i.e., the slope is positive), the optimal strategy is to move $x$ in the opposite direction to reduce the function's value. Conversely, if the function's value decreases as $x$ increases (i.e., the slope is negative), $x$ should be moved in the positive direction to further decrease the function value.
This concept is mathematically represented by the update rule

$$x_{i+1} = x_i - \alpha \nabla f(x_i)$$
where $x_i$ and $x_{i+1}$ represent the current and updated values of $x$, respectively, $\nabla f(x_i)$ denotes the gradient of the function at $x_i$, and $\alpha$ is the learning rate or step size.
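A minimal sketch of this update rule in Python, assuming the example function $f(x) = (x - 2)^2$ and an arbitrarily chosen learning rate and starting point:

```python
# Sketch of the update rule x_{i+1} = x_i - alpha * f'(x_i),
# assuming f(x) = (x - 2)^2 with derivative f'(x) = 2 * (x - 2).
def grad(x):
    return 2 * (x - 2)

alpha = 0.1        # learning rate (assumed value)
x = 10.0           # initial guess (assumed value)

for i in range(50):
    x = x - alpha * grad(x)   # move against the slope

print(x)  # converges toward the minimizer x = 2
```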
Step Size in Gradient Descent
The step size, $\alpha$, controls how much we adjust $x$ at each iteration of gradient descent. It's essential to set $\alpha$ appropriately to ensure efficient and effective convergence to the minimum. If $\alpha$ is too large, the algorithm might overshoot the minimum; if it's too small, convergence could be unnecessarily slow.
The magnitude of the gradient often informs the choice of step size. A larger gradient magnitude indicates that $x$ is far from the minimum, suggesting a potentially larger step. Conversely, a smaller gradient suggests that $x$ is closer to the minimum, and a smaller step might be preferable.
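The sketch below illustrates this trade-off on the toy function $f(x) = x^2$; the specific $\alpha$ values are assumptions chosen to show a reasonable step size, one that is too small, and one that is too large.

```python
# Comparing step sizes on f(x) = x^2, whose derivative is f'(x) = 2x.
def run(alpha, steps=20, x=5.0):
    for _ in range(steps):
        x = x - alpha * 2 * x
    return x

print(run(0.4))    # well-chosen alpha: converges quickly toward 0
print(run(0.01))   # too small: still far from 0 after the same number of steps
print(run(1.1))    # too large: overshoots and diverges (|x| grows each step)
```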
In practice, $\alpha$ can be kept constant or adjusted dynamically based on criteria such as the iteration count or changes in the function value. Advanced versions of gradient descent, such as Adam or RMSprop, incorporate mechanisms to adaptively adjust $\alpha$ based on past gradients.
Dynamically adjusting the step size in this way allows large moves when far from the minimum and finer adjustments as the target is approached, which improves the convergence behavior of gradient descent.
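As a rough illustration of this adaptive idea, here is a one-dimensional RMSprop-style sketch; the test function, decay rate, and learning rate are assumptions, and real implementations in deep-learning libraries operate on whole parameter vectors.

```python
import math

# Minimal sketch of an RMSprop-style adaptive step in one dimension.
# The function, decay rate rho, and learning rate alpha are assumed for illustration.
def grad(x):
    return 2 * (x - 2)                        # derivative of f(x) = (x - 2)^2

alpha, rho, eps = 0.1, 0.9, 1e-8
x, s = 10.0, 0.0

for _ in range(200):
    g = grad(x)
    s = rho * s + (1 - rho) * g * g           # running average of squared gradients
    x = x - alpha * g / (math.sqrt(s) + eps)  # step scaled by the inverse RMS of recent gradients

print(x)  # ends up close to the minimizer x = 2
```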