
Information Gain (2) - Entropy & Classification Error



Maximizing IG: Getting the Best Possible Value with the Best Efficiency

To split nodes effectively on the most informative features, we need an objective function that the tree learning algorithm can optimize.

2. Entropy ($I_H$)

Here is the definition of entropy, where the sum runs over all non-empty classes, i.e., classes with $p(i \mid t) \neq 0$:

$I_H(t)=-\sum_{i=1}^{c}p(i \mid t)\log_2 p(i \mid t)$

In the context of decision trees, the term $p(i\mid t)$ represents the proportion of examples belonging to the class $i$ at a specific node, denoted as $t$. The concept of entropy measures the impurity or uncertainty at a node. It is 0 when all examples at a node belong to the same class and increases as the class distribution becomes more uniform.

Specifically, in a binary class scenario,

  • the entropy is 0 when $p(i=1 \mid t)=1$ or $p(i=1 \mid t)=0$, i.e., when all examples belong to a single class.
  • Conversely, if the classes are evenly distributed with $p(i=1 \mid t)=0.5$ and $p(i=0 \mid t)=0.5$, the entropy is 1, indicating maximum uncertainty.

In summary, the entropy criterion in decision tree construction aims to maximize the mutual information between each split and the class labels. Every such split reduces uncertainty about the class and enhances the tree’s overall effectiveness in classifying new instances.
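
A minimal Python sketch of this formula (the `entropy` helper below is our own, written directly from the definition, not taken from any library) confirms the two binary cases just described:

```python
import math

def entropy(probs):
    """I_H(t) = -sum_i p(i|t) * log2 p(i|t), skipping empty classes."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

print(entropy([1.0, 0.0]))  # 0.0 -> pure node, no uncertainty
print(entropy([0.5, 0.5]))  # 1.0 -> evenly split binary node, maximum uncertainty
```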

Calculation Example

Let’s assume we have a node with the following class distribution:

  • Class 0: 30 instances
  • Class 1: 70 instances

The probabilities are:

$p_0= \frac{30}{100} = 0.3 $, $\hspace{20 mm}$ $p_1=\frac{70}{100} = 0.7$

Then, the entropy is:

$I_H = -(0.3\log_2 0.3 + 0.7\log_2 0.7) \approx 0.521 + 0.360 \approx 0.881$
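
As a quick numerical check (again just a sketch, with our own variable names), the same value falls out of the formula directly:

```python
import math

p0, p1 = 0.3, 0.7
entropy = -(p0 * math.log2(p0) + p1 * math.log2(p1))
print(round(entropy, 3))  # 0.881
```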



3. Classification Error

The classification error is measured as follows:

$I_E(t) = 1 - \max_i \{ p(i \mid t) \}$


This criterion is useful for pruning but not recommended for growing a decision tree because it is less sensitive to changes in the class probabilities of the nodes.

Calculation Example

Again, using the same node as before:

  • Class 0: 30 instances
  • Class 1: 70 instances

The probabilities are:

$p_0 = \frac{30}{100} = 0.3$, $\hspace{20 mm}$ $p_1 = \frac{70}{100} = 0.7$

The classification error is:

$ \text{Error} = 1 - \max(0.3, 0.7) = 1 - 0.7 = 0.3 $
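
The same kind of one-line check works here (the variable names are ours):

```python
probs = [0.3, 0.7]               # class proportions at the node
error = 1 - max(probs)           # I_E(t) = 1 - max_i p(i|t)
print(round(error, 3))           # 0.3
```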


Summary of Metrics

  • Gini Impurity: Measures the impurity of a node, with lower values indicating purer nodes.
  • Entropy: Measures the randomness or impurity of a node, with lower values indicating purer nodes.
  • Classification Error: Measures the misclassification rate at a node, with lower values indicating better classification.

Each of these measures can guide the decision tree algorithm in selecting the best feature and threshold to split the data at each node, aiming to create pure or homogeneous nodes, leading to an effective decision tree.
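
To illustrate how the three criteria behave, here is a minimal sketch (the helper names are ours, not from any particular library; `gini` implements the standard Gini impurity $1-\sum_i p(i \mid t)^2$) that evaluates all three measures on a binary node as the class balance shifts. All three are minimized at a pure node, while Gini and entropy react more smoothly to small changes in the class probabilities than the classification error does.

```python
import math

def gini(probs):
    return 1.0 - sum(p ** 2 for p in probs)

def entropy(probs):
    return sum(-p * math.log2(p) for p in probs if p > 0)

def classification_error(probs):
    return 1.0 - max(probs)

# Evaluate all three impurity measures as the binary class balance shifts.
for p1 in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    probs = (1.0 - p1, p1)
    print(f"p1={p1:.1f}  gini={gini(probs):.3f}  "
          f"entropy={entropy(probs):.3f}  error={classification_error(probs):.3f}")
```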


