Day133 - STAT Review: Classification (3)
Practical Statistics for Data Scientists: Logistic Regression (1) (Mathematical Foundation: Odds, Logit Function, Formula, and Examples)
Logistic Regression
Logistic Regression is a widely used classification algorithm that predicts a binary outcome (e.g., 0 or 1, default or non-default, spam or not spam). It is similar to linear regression, but instead of predicting continuous values, it predicts probabilities using the logistic function (sigmoid function).
Logistic regression is analogous to multiple linear regression, except that the outcome is binary; transformations convert the problem into one where a linear model can be fit. Like discriminant analysis, and unlike K-Nearest Neighbor and naive Bayes, it is a structured modeling approach rather than a data-driven one. Its fast computation and the ease of scoring new data make it a popular method.
Key Terms for Logistic Regression
- Logit
- The function that maps class membership probability to a range from $-\infty$ to $+\infty$ (instead of 0 to 1).
- = Log odds
- Odds
- The ratio of “success” (1) to “not success” (0).
- Log Odds
- The response from the linearized model is converted into a probability.
Logistic Response Function and Logit
Instead of modeling the probability $p$ directly, we model the log-odds. The log-odds scale is linear in the predictors, making it easier to work with, and the model is interpretable: $\beta_1$ represents the change in log-odds for a one-unit increase in $x_1$.
Why Do We Need a Transformation?
In linear regression, we assume:
$p = \beta_0 + \beta_1 x_1 + \dots + \beta_q x_q$
However, this does not guarantee that $p$ will stay between 0 and 1, which makes modeling $p$ directly problematic, since probabilities must always lie in this range. Therefore, instead of modeling $p$ directly, we transform it using the logistic (sigmoid) function.
The Logistic Response Function
The logistic response function (inverse logit function) transforms a linear combination of predictors into a probability:
$p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_q x_q)}}$
We utilize the logistic function to keep predicted probabilities within the (0,1) range, allowing a smooth transition between class labels. This approach proves effective for classification problems.
Interpreting the Function
- When $\beta_0 + \beta_1 x_1 + \dots$ is large and positive, $p$ approaches 1.
- When $\beta_0 + \beta_1 x_1 + \dots$ is large and negative, $p$ approaches 0.
- When $\beta_0 + \beta_1 x_1 + \dots = 0$, $p = 0.5$ (the decision boundary), as illustrated in the sketch below.
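As a minimal sketch of this behavior (assuming only NumPy; the function name `logistic_response` is just for illustration), we can evaluate the logistic response function at a few values of the linear predictor:

```python
import numpy as np

def logistic_response(eta):
    """Inverse logit: map a linear predictor value to a probability in (0, 1)."""
    return 1 / (1 + np.exp(-eta))

# Large positive predictor -> p near 1; zero -> 0.5; large negative -> p near 0
for eta in [-5, 0, 5]:
    print(eta, round(logistic_response(eta), 3))  # -5 0.007, 0 0.5, 5 0.993
```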
From Probability to Odds
Instead of working directly with probabilities, we convert them into odds. Odds represent the ratio of the probability of success to the probability of failure:
$\text{Odds} = \frac{p}{1-p}$
For example,
- If a horse has a 50% chance of winning ($p = 0.5$), the odds are $\frac{0.5}{1-0.5} = 1$ (the event is equally likely to happen or not happen).
- If the probability increases to 75%, the odds are $\frac{0.75}{1-0.75} = 3$ (the event is 3 times more likely to happen than not).
Unlike probabilities, odds are not limited to (0, 1); they range from 0 to $\infty$. Odds also make multiplicative effects easier to interpret.
From Odds to Log-Odds (Logit)
To simplify calculations, we take the logarithm of the odds:
$\text{logit}(p) = \log\left(\frac{p}{1-p}\right)$
This log-odds transformation (called the logit function) maps probabilities in (0,1) to values in ($-\infty , +\infty$), making it suitable for linear modeling.
By taking the log, we can convert a multiplicative relationship into an additive one (easier to work with) and remove the constraint on probabilities, allowing for a linear equation.
For example,
- If odds $= 3$, then $\log(3) \approx 1.1$
- If odds $= 0.5$, then $\log(0.5) \approx -0.69$ (see the quick numerical check below)
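A quick numerical check of these conversions (a sketch assuming NumPy and natural logarithms, as used in the text) could look like this:

```python
import numpy as np

def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)

def log_odds(p):
    """Convert a probability to log-odds (the logit)."""
    return np.log(odds(p))

print(odds(0.5), odds(0.75))       # 1.0 3.0
print(round(log_odds(0.75), 2))    # log(3)   ~  1.1
print(round(log_odds(1 / 3), 2))   # log(0.5) ~ -0.69
```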
The Logistic Regression Formula
Bringing everything together,
$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_q x_q, \quad \text{equivalently} \quad p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_q x_q)}}$
The transformation comes full circle: we fit a linear model on the log-odds scale, convert the result to a probability, and map that probability to a class label using a cutoff rule: any record with a probability above the cutoff is classified as a 1. The graph below of the logit function maps a probability to a scale suitable for a linear model.

Mapping Probabilities to Class Labels
After estimating $p$, we use a cutoff threshold to classify records: with the default cutoff of 0.5, a record is classified as 1 if $p > 0.5$ and as 0 otherwise.
We can customize the cutoff:
- If false negatives are costly (e.g., disease detection), we lower the threshold (e.g., classify as 1 if $p > 0.3$).
- If false positives are costly (e.g., fraud detection), we raise the threshold (see the sketch below).
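A minimal sketch of the cutoff rule (assuming NumPy and a made-up array of predicted probabilities):

```python
import numpy as np

# Hypothetical predicted probabilities for five records
probs = np.array([0.12, 0.47, 0.55, 0.83, 0.29])

# Default cutoff of 0.5
labels = (probs > 0.5).astype(int)      # [0 0 1 1 0]

# Lower the cutoff when false negatives are costly: more records are flagged as 1
labels_low = (probs > 0.3).astype(int)  # [0 1 1 1 0]
```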
Logistic Regression vs. Linear Regression
Feature | Linear Regression | Logistic Regression |
---|---|---|
Outcome Type | Continuous (e.g., house price) | Binary (0 or 1) |
Equation | $y=\beta_0+\beta_1X_1 + \dots$ | $p=\frac{1}{1+e^{-(\beta_0 + \beta_1X_1 + \dots)}}$ |
Method | Ordinary Least Squares (OLS) | Maximum Likelihood Estimation (MLE) |
Interpretation | Predicts actual values | Predicts probabilities |
Output | Any real number | Probability (0 to 1) |
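To make the contrast concrete, here is a toy sketch (assuming scikit-learn and a made-up one-predictor binary dataset) showing that linear regression can produce fitted values outside (0, 1), while logistic regression always returns probabilities:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Made-up data: the outcome becomes more likely as x grows
x = np.arange(1, 11).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])

lin = LinearRegression().fit(x, y)    # OLS fit
log = LogisticRegression().fit(x, y)  # MLE fit

x_new = np.array([[0], [12]])
print(lin.predict(x_new))              # fitted values fall below 0 / above 1
print(log.predict_proba(x_new)[:, 1])  # probabilities stay within (0, 1)
```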
Fitting Logistic Regression
- In R, the `glm` function is used to fit a logistic regression, with the `family` parameter set to `binomial`.

```r
logistic_model <- glm(outcome ~ payment_inc_ratio + purpose_ + home_ +
                        emp_len_ + borrower_score,
                      data=loan_data, family='binomial')
logistic_model
---
Call:  glm(formula = outcome ~ payment_inc_ratio + purpose_ + home_ +
    emp_len_ + borrower_score, family = "binomial", data = loan_data)

Coefficients:
               (Intercept)           payment_inc_ratio
                   1.63809                     0.07974
purpose_debt_consolidation    purpose_home_improvement
                   0.24937                     0.40774
    purpose_major_purchase             purpose_medical
                   0.22963                     0.51048
             purpose_other      purpose_small_business
                   0.62066                     1.21526
                  home_OWN                   home_RENT
                   0.04833                     0.15732
         emp_len_ > 1 Year              borrower_score
                  -0.35673                    -4.61264

Degrees of Freedom: 45341 Total (i.e. Null);  45330 Residual
Null Deviance:      62860
Residual Deviance: 57510        AIC: 57540
```
The response variable is `outcome`: it takes the value $0$ if the loan is paid off and $1$ if it defaults. The variables `purpose_` and `home_` represent the loan's purpose and the borrower's homeownership status. As in linear regression, a factor variable with $P$ levels is represented using $P - 1$ columns; the reference levels for these factors are `credit_card` and `MORTGAGE`. The variable `borrower_score`, ranging from $0$ to $1$, indicates the borrower's creditworthiness, from poor to excellent. This score was generated using K-Nearest Neighbor analysis based on various other variables.
- In Python, the `LogisticRegression` class from `sklearn.linear_model` is used. The `penalty` and `C` parameters control L1 or L2 regularization, which is enabled by default; to fit the model without regularization, set `C` to a very large value. The `solver` parameter specifies the minimizer, with `liblinear` being the default option.

```python
predictors = ['payment_inc_ratio', 'purpose_', 'home_', 'emp_len_',
              'borrower_score']
outcome = 'outcome'
X = pd.get_dummies(loan_data[predictors], prefix='', prefix_sep='',
                   drop_first=True)
y = loan_data[outcome]

logit_reg = LogisticRegression(penalty='l2', C=1e42, solver='liblinear')
logit_reg.fit(X, y)
```
Unlike R, `scikit-learn` determines the classes from the distinct values in `y` (paid off and default). Internally, the classes are sorted alphabetically, which results in coefficients that are the reverse of those in R. The `predict` method returns the class label, while `predict_proba` returns the probabilities in the order given by `logit_reg.classes_` (see the short example below).
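As a brief usage sketch (assuming the `logit_reg` model and the feature matrix `X` defined above are still in memory), the two methods can be compared side by side:

```python
import pandas as pd

# Column order of predict_proba follows logit_reg.classes_
print(logit_reg.classes_)

# Class labels for the first five records
print(logit_reg.predict(X.iloc[:5]))

# Probabilities for the first five records, one column per class
pred_probs = pd.DataFrame(logit_reg.predict_proba(X.iloc[:5]),
                          columns=logit_reg.classes_)
print(pred_probs)
```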