Day31 ML Review - Logistic Regression (3)
How to Train in Scikit-Learn, and Regularization with the LR Model
(Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow 2, 3rd Edition)
Training a Logistic Regression Model with Scikit-learn
Let’s look at how to use scikit-learn’s more optimized implementation of logistic regression, which also supports multiclass settings off the shelf. We will use the sklearn.linear_model.LogisticRegression class and the familiar fit method to train the model on all three classes in the standardized flower training dataset. We will also set multi_class='ovr' for illustration purposes, noting that 'multinomial' is usually recommended for mutually exclusive classes.
from sklearn.linear_model import LogisticRegression

# C is the inverse regularization strength; 'lbfgs' is a quasi-Newton solver
lr = LogisticRegression(C=100.0, random_state=1, solver='lbfgs', multi_class='ovr')
lr.fit(X_train_std, y_train)    # train on the standardized training set
lr.predict(X_test_std)          # predict class labels for the standardized test set
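As a quick sanity check, we can also look at the predicted class-membership probabilities and the resulting test accuracy. The snippet below is a minimal sketch that assumes the same standardized split as above, plus the corresponding test labels y_test.

from sklearn.metrics import accuracy_score

# class-membership probabilities for the first three test samples
print(lr.predict_proba(X_test_std[:3, :]))

# accuracy of the predicted class labels on the held-out test set
y_pred = lr.predict(X_test_std)
print('Test accuracy: %.3f' % accuracy_score(y_test, y_pred))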
Note that many different optimization algorithms exist for fitting such a model. To minimize convex loss functions, such as the logistic regression loss, it is recommended to use more advanced approaches than regular stochastic gradient descent (SGD).
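As a rough illustration of that point, the sketch below (reusing the standardized split and test labels assumed above) refits the same model with a few of the solvers that LogisticRegression accepts and compares their test accuracy.

# compare a few of the solvers available for the convex logistic regression loss
for solver in ('lbfgs', 'newton-cg', 'liblinear'):
    lr_s = LogisticRegression(C=100.0, solver=solver, random_state=1)
    lr_s.fit(X_train_std, y_train)
    print('%-10s test accuracy: %.3f' % (solver, lr_s.score(X_test_std, y_test)))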
Tackling Overfitting Via Regularization
Overfitting is a common problem in machine learning, in which a model performs well on the training data but does not generalize to unseen (test) data. We also say that such a model has high variance, which can be caused by having too many parameters, leading to a model that is too complex given the underlying data.
If, instead, the model has an underfitting problem (high bias), it is not complex enough to capture the pattern in the training data well and therefore also suffers from low performance on unseen data.
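In practice, a quick way to spot either problem is to compare training and test performance; the sketch below assumes the fitted model and standardized split from the previous section.

# a large gap between training and test accuracy suggests overfitting (high variance);
# low accuracy on both suggests underfitting (high bias)
print('Training accuracy: %.3f' % lr.score(X_train_std, y_train))
print('Test accuracy:     %.3f' % lr.score(X_test_std, y_test))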
The parameter C, implemented for the LogisticRegression class in scikit-learn, comes from a convention in support vector machines. It is the inverse of the regularization parameter $\lambda$, i.e., $C = \frac{1}{\lambda}$, so decreasing the value of the inverse regularization parameter C increases the regularization strength.
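To see this effect directly, we can refit the model over a range of C values and watch the weight coefficients shrink as C decreases (i.e., as the regularization strength grows). This is a sketch along those lines, reusing the standardized training data assumed above.

import numpy as np

weights, params = [], []
for c in np.arange(-5, 5):
    lr_c = LogisticRegression(C=10.**c, random_state=1)
    lr_c.fit(X_train_std, y_train)
    weights.append(lr_c.coef_[1])   # weight vector of the second class
    params.append(10.**c)

# with small C (strong regularization), the weights are driven toward zero
print(np.round(np.array(weights), 3))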
Let’s dive deeper into regularization in logistic regression here.
The bias-variance tradeoff
In general, we can say that "high variance" correlates with overfitting, while "high bias" correlates with underfitting. Variance measures the consistency (or variability) of the model’s prediction for a particular example when the model is retrained multiple times on different subsets of the training dataset; it shows how sensitive the model is to randomness in the training data. Bias, on the other hand, measures how far off the predictions are from the correct values in general when the model is rebuilt multiple times on different training datasets; bias is a measure of the systematic error that is not due to randomness.
One way of finding a good bias-variance tradeoff is to tune the complexity of the model via regularization.
Regularization is a useful method for managing collinearity (high correlation among features), filtering out noise from data, and preventing overfitting. It introduces additional information (bias) to penalize extreme parameter (weight) values.
(Side note: regularization is another essential reason to apply feature scaling, such as standardization: all our features need to be on comparable scales for the penalty to affect them evenly.)
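For reference, this is roughly how the standardized arrays used above (X_train_std and X_test_std) would be produced; the raw X_train and X_test splits are assumed to come from an earlier train/test split. The scaler is fit on the training data only and then applied to both splits.

from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
sc.fit(X_train)                      # estimate mean and standard deviation on the training set only
X_train_std = sc.transform(X_train)  # standardize the training features
X_test_std = sc.transform(X_test)    # apply the same transformation to the test features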
Via the regularization parameter, $\lambda$, we can then control how well we fit the training data while keeping the weights small. By increasing the value of $\lambda$, we increase the regularization strength. The most common form of regularization is L2 regularization, which can be written as follows:
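$$\frac{\lambda}{2}\lVert \mathbf{w} \rVert^2 = \frac{\lambda}{2}\sum_{j=1}^{m} w_j^2$$

where $\mathbf{w}$ is the weight vector and $m$ is the number of features.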
The cost function for logistic regression can be regularized by adding a simple regularization term, which will shrink the weights during model training.
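Writing $\phi(z^{(i)})$ for the sigmoid activation of the net input of the $i$-th training example, the L2-regularized cost takes the form:

$$J(\mathbf{w}) = \sum_{i=1}^{n}\left[-y^{(i)}\log\big(\phi(z^{(i)})\big) - \big(1 - y^{(i)}\big)\log\big(1 - \phi(z^{(i)})\big)\right] + \frac{\lambda}{2}\lVert \mathbf{w} \rVert^2$$

where the first sum is the usual logistic (cross-entropy) loss over the $n$ training examples and the added term shrinks the weights during training.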