Day08 ML Review - Linear Regression (2)
Cost Function and MSE (Mean Squared Error)
Every day is a new day. Spring has just bloomed in upstate New York.
As I sit here in the Science Library on campus, I can’t help but feel a sense of accomplishment. The summer semester has just started, and while others are enjoying their time, I’m here, learning and growing. It’s so productive and fills me with joy and wisdom. This journey of learning and growth is a testament to the power of perseverance and dedication. I genuinely appreciate my Lord Jesus with all my heart.
Cost Function
Overview
Imagine trying to fit a line to a scatter plot of data points. How do you know if your line is a good fit? This is where the cost function (also called the loss or objective function) comes in. It measures the difference between the observed values and the values predicted by the model, telling you how well the line fits the data. The choice of cost function depends on the type of model being trained and the specific objectives of the learning algorithm.
Purpose of Cost Functions
- Model Evaluation: Cost functions measure the discrepancy between the model’s predicted outputs and the actual target values in the dataset. This measure helps determine the model’s accuracy and efficacy.
- Model Optimization: By defining a cost function, we establish a criterion that can be minimized (or, in some formulations, maximized) during training. This optimization drives the learning algorithm to adjust the model parameters (like weights in neural networks) towards values that reduce the cost, thus improving the model’s predictions.
- Guide Learning: Different cost functions can prioritize different aspects of the model’s behavior, such as penalizing more complex models to avoid overfitting or being more sensitive to outliers in the data.
Common Cost Functions
Several standard cost functions are used across various types of machine learning models:
- Mean Squared Error (MSE): Used predominantly in regression tasks, MSE calculates the average of the squares of the differences between predicted and actual values. As it squares the error terms, it is sensitive to outliers.
- Cross-Entropy Loss: Widely used in classification tasks, particularly in logistic regression and neural networks for binary or multi-class classification. This loss function measures the performance of a classification model whose output is a probability value between 0 and 1. Cross-entropy loss increases as the predicted probability diverges from the actual label.
- Hinge Loss: This loss is often used for “maximum-margin” classification, most notably with Support Vector Machines (SVMs). It is used for binary classification tasks and aims to ensure that the correct class is not only predicted correctly but also separated from the incorrect class by a clear margin of confidence. (All three losses are sketched in code below.)
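To make these definitions concrete, here is a minimal NumPy sketch of the three losses. The function names and toy arrays are my own illustration, not any particular library’s API.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: average of the squared residuals."""
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Cross-entropy for binary labels; p_pred are predicted probabilities in (0, 1)."""
    p = np.clip(p_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def hinge_loss(y_true, scores):
    """Hinge loss; y_true in {-1, +1}, scores are raw margins such as w·x + b."""
    return np.mean(np.maximum(0.0, 1.0 - y_true * scores))

# Toy usage
print(mse(np.array([3.0, -0.5, 2.0, 7.0]), np.array([2.5, 0.0, 2.0, 8.0])))   # 0.375
print(binary_cross_entropy(np.array([1, 0, 1]), np.array([0.9, 0.2, 0.7])))   # small when probabilities match labels
print(hinge_loss(np.array([1, -1, 1]), np.array([0.8, -1.2, 2.0])))           # only the first sample violates the margin
```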
Optimization Methods
We will dive deeper into these later.
To minimize or optimize these cost functions, various methods can be employed:
- Gradient Descent: The most common optimization technique, especially useful for large datasets. It iteratively adjusts the parameters, reducing the cost function until it reaches a minimum.
- Stochastic Gradient Descent (SGD): A variant of gradient descent that updates the model’s parameters using only a single sample or a small batch of samples at a time. This can lead to faster convergence on large datasets. (A rough sketch of both appears below.)
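As a rough sketch (not tied to any particular library), here is how both variants might look when fitting a one-feature linear model by minimizing MSE. The synthetic data, learning rates, and epoch counts are arbitrary choices for illustration.

```python
import numpy as np

# Synthetic data: y = 2x + 1 plus noise (values chosen only for this sketch)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=100)
y = 2.0 * X + 1.0 + rng.normal(scale=1.0, size=100)

def batch_gradient_descent(X, y, lr=0.01, epochs=2000):
    """Full-batch gradient descent on the MSE of a one-feature linear model."""
    w, b = 0.0, 0.0
    n = len(X)
    for _ in range(epochs):
        y_hat = w * X + b
        dw = (2 / n) * np.sum((y_hat - y) * X)  # dMSE/dw
        db = (2 / n) * np.sum(y_hat - y)        # dMSE/db
        w -= lr * dw
        b -= lr * db
    return w, b

def stochastic_gradient_descent(X, y, lr=0.001, epochs=50):
    """SGD: update the parameters after every single sample."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            err = (w * X[i] + b) - y[i]
            w -= lr * 2 * err * X[i]
            b -= lr * 2 * err
    return w, b

print(batch_gradient_descent(X, y))       # should land near the true (2.0, 1.0)
print(stochastic_gradient_descent(X, y))  # likewise, with a bit more noise
```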
Cost Function of Linear Regression
The cost function is critical in evaluating the model’s performance in linear regression. Linear regression aims to find the best-fitting line through the data points. This line is defined by its parameters (coefficients): the slope and the intercept. The cost function helps determine how well the line fits the data by measuring the difference between the observed values and the values predicted by the model. The most commonly used cost function in linear regression is the Mean Squared Error (MSE).
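For a single feature, the fitted line can be written as follows (using $\beta_0$ for the intercept and $\beta_1$ for the slope; the symbols are just one common convention):

$$\hat{y}_i = \beta_0 + \beta_1 x_i$$

Choosing the parameters amounts to choosing the line, and the cost function scores each such choice.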
Mean Squared Error (MSE)
(https://en.wikipedia.org/wiki/Mean_squared_error)
The MSE is calculated by taking the average of the squared differences between the observed values (the actual y-values of the data points) and the predicted values (the y-values obtained from the regression line). Mathematically, it can be expressed as:

$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

Where:
- $n$ is the number of data points,
- $y_i$ is the actual value at the $i^{th}$ point,
- $\hat{y}_i$ is the predicted value at the $i^{th}$ point.

A quick numeric check of this formula follows.
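As a sanity check, the formula can be computed directly with NumPy and, purely as an illustration, compared against scikit-learn's `mean_squared_error`; the toy numbers are made up.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, 5.0, 7.0, 9.0])   # observed values
y_pred = np.array([2.8, 5.3, 6.9, 9.4])   # values predicted by the regression line

# Direct implementation of the formula above
mse_manual = np.mean((y_true - y_pred) ** 2)

# Library implementation, for comparison only
mse_sklearn = mean_squared_error(y_true, y_pred)

print(mse_manual, mse_sklearn)  # both ≈ 0.075
```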
Purpose of the MSE
- Performance Measurement: MSE quantifies how far the predictions deviate from the actual outcomes, directly reflecting the model’s performance.
- Model Fitting: Linear regression aims to minimize the MSE to achieve the best fit. The model finds the line that best fits the data by adjusting the coefficients (slope and intercept) to reduce the MSE.
- Gradient Descent: MSE is particularly useful because it is differentiable, allowing techniques like gradient descent to be used for optimization. Gradient descent is an algorithm that iteratively adjusts the parameters to minimize the MSE; the partial derivatives it follows are written out below.
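For the single-feature model $\hat{y}_i = \beta_0 + \beta_1 x_i$ introduced above (again, the $\beta$ notation is just one convention), the partial derivatives that gradient descent follows are:

$$\frac{\partial\,\text{MSE}}{\partial \beta_0} = -\frac{2}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right), \qquad \frac{\partial\,\text{MSE}}{\partial \beta_1} = -\frac{2}{n}\sum_{i=1}^{n} x_i\left(y_i - \hat{y}_i\right)$$

Each update step moves $\beta_0$ and $\beta_1$ a small amount in the direction opposite to these derivatives.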
Characteristics of MSE
- Sensitive to Outliers: MSE increases significantly when there are outliers in the data, as it squares the differences. This sensitivity can be beneficial for detecting outliers but can also skew the model if outliers are not addressed (see the short example after this list).
- Always Non-negative: MSE is always non-negative because it involves the squares of differences, and the square of any real number is non-negative. The best value (minimum) is 0 when all predictions perfectly match the actual values.
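A minimal sketch of the outlier effect, with made-up numbers: a single badly predicted point dominates the average once its error is squared.

```python
import numpy as np

y_true = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y_good = np.array([2.1, 3.9, 6.2, 7.8, 10.1])   # small errors everywhere
y_out  = np.array([2.1, 3.9, 6.2, 7.8, 20.0])   # one large miss

mse = lambda a, b: np.mean((a - b) ** 2)
print(mse(y_true, y_good))  # 0.022
print(mse(y_true, y_out))   # ≈ 20.02 (the single squared error of 100 dominates)
```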