Day63 ML Review - Ensemble Method (2)
Code Structure of Combining Classifiers via Majority Vote
Combining Classifiers via Majority Vote
Implementing a Simple Majority Vote Classifier
The algorithm we plan to implement lets us combine different classification algorithms, each weighted according to our confidence in it. The goal is a more robust meta-classifier that balances out the individual classifiers’ weaknesses on a particular dataset. In more precise mathematical terms, the weighted majority vote can be expressed as follows.
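$\hat{y} = \text{arg} \ \text{max}_i \sum_{j=1}^{m} w_j \chi_A (C_j(x)=i)$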
In this formula, $w_j$ represents the weight associated with a base classifier, $C_j$, while $\hat{y}$ stands for the predicted class label of the ensemble. $A$ denotes the set of unique class labels, and $\chi_A$ is the characteristic (indicator) function, which returns 1 if the predicted class of the $j$th classifier matches $i$ ($C_j(x)=i$). For equal weights, the equation simplifies as follows.
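$\hat{y} = \text{mode} \{ C_1(x), C_2(x), \dots, C_m(x) \}$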
Let’s assume we have three base classifiers to predict the class label of a given example. Two of the three base classifiers predict class 0, and one predicts class 1. When we weigh the predictions of each base classifier equally, the majority vote predicts that the example belongs to class 0.
$\hat{y} = \text{mode} \{ 0,0,1 \} = 0$
Let’s assign a weight of 0.6 to $C_3$ and let’s weight $C_1$ and $C_2$ by a coefficient of 0.2:
$\hat{y} = \text{arg} \ \text{max}_i[0.2 \times i_0 + 0.2 \times i_0 + 0.6 \times i_1] = 1$
More simply, since $3 \times 0.2 = 0.6$, we can say that the prediction made by $C_3$ has three times more weight than the predictions by $C_1$ or $C_2$. This can be expressed as follows.
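$\hat{y} = \text{mode} \{ 0,0,1,1,1 \} = 1$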
We can use NumPy’s `argmax` and `bincount` functions to perform this calculation more conveniently:
>>> import numpy as np
>>> np.argmax(np.bincount([0, 0, 1], weights=[0.2, 0.2, 0.6]))
1
If we would like to make the prediction based on the predicted class probabilities instead of the class labels, we can generalize the weighted majority vote as follows.
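$\hat{y} = \text{arg} \ \text{max}_i \sum_{j=1}^{m} w_j p_{ij}$
Here, $p_{ij}$ is the predicted probability of the $j$th classifier for class label $i$.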
Let’s assume that we have a binary classification problem with class labels $i \in \{0, 1\}$ and an ensemble of three classifiers, $C_j \ (j \in \{1, 2, 3\})$, and that the classifiers $C_j$ return the following class membership probabilities for a particular example, $x$:
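$C_1(x) \rightarrow [0.9, 0.1], \quad C_2(x) \rightarrow [0.8, 0.2], \quad C_3(x) \rightarrow [0.4, 0.6]$
Each vector lists the probabilities for class 0 and class 1, respectively.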
Using the same weights as before (0.2, 0.2, and 0.6), we can compute the weighted probability of each individual class as follows.
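$p(i_0 \vert x) = 0.2 \times 0.9 + 0.2 \times 0.8 + 0.6 \times 0.4 = 0.58$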
$p(i_1 \vert x) = 0.2 \times 0.1 + 0.2 \times 0.2 + 0.6 \times 0.6 = 0.42$
$\hat{y} = \text{arg} \ \text{max}_i [p(i_0 \vert x), p(i_1 \vert x)] = 0$
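We can check this soft-voting calculation with NumPy as well. The following is a small sketch using `np.average` and `np.argmax` with the class membership probabilities assumed above:
>>> probas = np.array([[0.9, 0.1],   # C1
...                    [0.8, 0.2],   # C2
...                    [0.4, 0.6]])  # C3
>>> avg = np.average(probas, axis=0, weights=[0.2, 0.2, 0.6])  # -> [0.58, 0.42]
>>> np.argmax(avg)
0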
Putting everything together, let's now implement the `MajorityVoteClassifier` in Python.
Overview
The `MajorityVoteClassifier` is an ensemble classifier that aggregates predictions from multiple classifiers. It supports two voting strategies:
- Hard Voting (`'classlabel'`): Each classifier votes for a class label, and the final prediction is the class with the majority of votes.
- Soft Voting (`'probability'`): The class probabilities from each classifier are averaged, and the final prediction is the class with the highest average probability.
from sklearn.base import BaseEstimator
from sklearn.base import ClassifierMixin
from sklearn.preprocessing import LabelEncoder
from sklearn.base import clone
from sklearn.pipeline import _name_estimators
import numpy as np
import operator
class MajorityVoteClassifier(BaseEstimator, ClassifierMixin):
    """ A majority vote ensemble classifier

    Parameters
    ----------
    classifiers : array-like, shape = [n_classifiers]
        Different classifiers for the ensemble

    vote : str, {'classlabel', 'probability'}
        Default: 'classlabel'
        If 'classlabel', the prediction is based on the argmax of class labels.
        Else if 'probability', the argmax of the sum of probabilities is used
        to predict the class label (recommended for calibrated classifiers).

    weights : array-like, shape = [n_classifiers]
        Optional, default: None
        If a list of `int` or `float` values is provided, the classifiers are
        weighted by importance; uses uniform weights if `weights=None`.
    """
    def __init__(self, classifiers, vote='classlabel', weights=None):
        self.classifiers = classifiers
        self.named_classifiers = {key: value for key, value
                                  in _name_estimators(classifiers)}
        self.vote = vote
        self.weights = weights

    def fit(self, X, y):
        """ Fit classifiers.

        Parameters
        ----------
        X : {array-like, sparse matrix}, shape = [n_examples, n_features]
            Matrix of training examples.

        y : array-like, shape = [n_examples]
            Vector of target class labels.

        Returns
        -------
        self : object
        """
        if self.vote not in ('probability', 'classlabel'):
            raise ValueError("vote must be 'probability' "
                             "or 'classlabel'; got (vote=%r)"
                             % self.vote)
        if self.weights and len(self.weights) != len(self.classifiers):
            raise ValueError("Number of classifiers and weights "
                             "must be equal; got %d weights, "
                             "%d classifiers"
                             % (len(self.weights), len(self.classifiers)))
        # Use LabelEncoder to ensure class labels start with 0,
        # which is important for the np.argmax call in self.predict
        self.lablenc_ = LabelEncoder()
        self.lablenc_.fit(y)
        self.classes_ = self.lablenc_.classes_
        self.classifiers_ = []
        for clf in self.classifiers:
            fitted_clf = clone(clf).fit(X, self.lablenc_.transform(y))
            self.classifiers_.append(fitted_clf)
        return self
Explanations
1. Package Explanations
- `BaseEstimator`: Provides base methods like `get_params` and `set_params` for parameter tuning.
- `ClassifierMixin`: Mixin class that adds a `score` method to classifiers.
- `LabelEncoder`: Encodes target labels with values between 0 and `n_classes - 1`.
- `clone`: Creates a deep copy of the estimator with the same parameters.
- `_name_estimators`: Utility function to generate names for estimators in a pipeline.
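As a quick, illustrative aside (not part of the original implementation), `_name_estimators` is a private scikit-learn utility that pairs each estimator with a lowercased version of its class name; the exact output may vary slightly between scikit-learn versions:
>>> from sklearn.pipeline import _name_estimators
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.tree import DecisionTreeClassifier
>>> _name_estimators([LogisticRegression(), DecisionTreeClassifier()])
[('logisticregression', LogisticRegression()), ('decisiontreeclassifier', DecisionTreeClassifier())]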
2. Class `MajorityVoteClassifier` Parameters
- `vote`: Voting strategy, either `'classlabel'` for hard voting or `'probability'` for soft voting.
- `weights`: Weights for each classifier; higher weights increase a classifier’s influence.
We utilized the `BaseEstimator` and `ClassifierMixin` parent classes to access basic functionality, which includes the `get_params` and `set_params` methods for retrieving and setting the classifier's parameters, as well as the `score` method for calculating prediction accuracy.
Furthermore, we incorporate the `predict` method to predict the class label via a majority vote of the class labels when a new `MajorityVoteClassifier` object is initialized with `vote='classlabel'`. Alternatively, when the ensemble classifier is initialized with `vote='probability'`, it predicts the class label based on the class membership probabilities. Additionally, we include a `predict_proba` method that returns the averaged probabilities, which is useful for computing the area under the receiver operating characteristic curve (ROC AUC).
    def predict(self, X):
        """ Predict class labels for X.

        Parameters
        ----------
        X : {array-like, sparse matrix}, shape = [n_examples, n_features]
            Matrix of training examples.

        Returns
        -------
        maj_vote : array-like, shape = [n_examples]
            Predicted class labels.
        """
        if self.vote == 'probability':
            maj_vote = np.argmax(self.predict_proba(X), axis=1)
        else:  # 'classlabel' vote
            # Collect results from clf.predict calls
            predictions = np.asarray([clf.predict(X)
                                      for clf in self.classifiers_]).T
            maj_vote = np.apply_along_axis(
                lambda x: np.argmax(np.bincount(x, weights=self.weights)),
                axis=1, arr=predictions)
        maj_vote = self.lablenc_.inverse_transform(maj_vote)
        return maj_vote
    def predict_proba(self, X):
        """ Predict class probabilities for X.

        Parameters
        ----------
        X : {array-like, sparse matrix}, shape = [n_examples, n_features]
            Training vectors, where n_examples is the number of examples
            and n_features is the number of features.

        Returns
        -------
        avg_proba : array-like, shape = [n_examples, n_classes]
            Weighted average probability for each class per example.
        """
        probas = np.asarray([clf.predict_proba(X)
                             for clf in self.classifiers_])
        avg_proba = np.average(probas, axis=0, weights=self.weights)
        return avg_proba
    def get_params(self, deep=True):
        """ Get classifier parameter names for Grid-Search"""
        if not deep:
            return super(MajorityVoteClassifier, self).get_params(deep=False)
        else:
            out = self.named_classifiers.copy()
            for name, step in self.named_classifiers.items():
                for key, value in step.get_params(deep=True).items():
                    out['%s__%s' % (name, key)] = value
            return out
Explanations
1. `predict` Method
The method handles two voting mechanisms: soft voting (based on probabilities) and hard voting (based on class labels):
- Soft Voting (`self.vote == 'probability'`):
  - Calls `predict_proba()` to get the class probabilities for each classifier.
  - Uses `np.argmax` to predict the class with the highest average probability for each example.
- Hard Voting (`'classlabel'`), illustrated in the sketch after this list:
  - Collects the predictions from each classifier using `clf.predict()`.
  - Applies `np.bincount()` to count the votes for each class.
  - If `weights` are provided, each classifier’s vote is weighted according to its importance.
  - Uses `np.argmax` to determine the class with the most votes.
  - Finally, `LabelEncoder` is used to inverse-transform the predicted class labels back to their original format.
- Returns:
  - `maj_vote`: Array of predicted class labels, one for each training example.
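Here is a small sketch of the hard-voting step with a hypothetical `predictions` array for three classifiers and two examples (the weights 0.2, 0.2, and 0.6 are assumed, as in the earlier example):
>>> predictions = np.array([[0, 0, 1],   # votes of C1, C2, C3 for example 1
...                         [1, 0, 1]])  # votes of C1, C2, C3 for example 2
>>> np.apply_along_axis(
...     lambda x: np.argmax(np.bincount(x, weights=[0.2, 0.2, 0.6])),
...     axis=1, arr=predictions)
array([1, 1])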
2. `predict_proba` Method
- Collect Class Probabilities: The method calls `clf.predict_proba()` for each classifier to collect the predicted probabilities for all classes.
- Compute Weighted Average: The `np.average()` function is used to compute the weighted average of the predicted probabilities across classifiers.
- The result is a 2D array of probabilities, with one row for each example and one column for each class.
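For intuition, here is a minimal sketch of that averaging step with made-up probabilities from three classifiers for two examples; the stacked array has shape `[n_classifiers, n_examples, n_classes]`, and averaging over `axis=0` collapses the classifier dimension:
>>> probas = np.asarray([[[0.9, 0.1], [0.8, 0.2]],   # classifier 1
...                      [[0.8, 0.2], [0.7, 0.3]],   # classifier 2
...                      [[0.4, 0.6], [0.3, 0.7]]])  # classifier 3
>>> probas.shape
(3, 2, 2)
>>> np.average(probas, axis=0, weights=[0.2, 0.2, 0.6]).shape
(2, 2)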
3. `get_params` Method
- The `get_params` method is necessary for enabling `GridSearchCV` and similar hyperparameter optimization tools to access the parameters of each classifier within the ensemble.
- Returns a dictionary of parameters, which is compatible with scikit-learn’s hyperparameter search utilities (like `GridSearchCV`).
It’s important to remember that we created a modified version of the `get_params` method to access the parameters of individual classifiers in the ensemble using the `_name_estimators` function. This might seem complicated initially, but it will make perfect sense when we implement a grid search for hyperparameter tuning in later sections.
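As a rough usage sketch (the base classifiers and parameter names below are illustrative assumptions, not part of the original code), the ensemble behaves like any scikit-learn estimator, and the keys exposed by `get_params` follow the `<estimatorname>__<parameter>` pattern that `GridSearchCV` expects in its parameter grid:
>>> from sklearn.tree import DecisionTreeClassifier
>>> from sklearn.neighbors import KNeighborsClassifier
>>> from sklearn.model_selection import GridSearchCV
>>> mv_clf = MajorityVoteClassifier(
...     classifiers=[DecisionTreeClassifier(max_depth=1),
...                  KNeighborsClassifier(n_neighbors=1)],
...     vote='classlabel')
>>> # _name_estimators should yield keys such as 'decisiontreeclassifier__max_depth'
>>> param_grid = {'decisiontreeclassifier__max_depth': [1, 2],
...               'kneighborsclassifier__n_neighbors': [1, 5]}
>>> grid = GridSearchCV(estimator=mv_clf, param_grid=param_grid, cv=3)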
Side Note
<Class Membership Probabilities from Decision Trees>
In scikit-learn, the ROC AUC score is computed using the `predict_proba` method, if applicable. With decision trees, probabilities are calculated from a frequency vector created for each node during training. This vector collects the frequency of each class label based on the class distribution at that node and is then normalized to sum to 1. Similarly, in the k-nearest neighbors algorithm, the class labels of the nearest neighbors are aggregated to return normalized class label frequencies. Although the normalized probabilities from decision trees and k-nearest neighbors may look similar to those obtained from a logistic regression model, it’s important to note that they are not derived from probability mass functions.
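To see this frequency-based behavior in action, here is a small illustrative sketch (the dataset and tree depth are arbitrary choices): a depth-1 decision tree on the Iris dataset isolates the 50 setosa examples in one leaf, while the other leaf holds 50 versicolor and 50 virginica examples, so the returned "probabilities" are simply the class frequencies at those leaves:
>>> from sklearn.datasets import load_iris
>>> from sklearn.tree import DecisionTreeClassifier
>>> X, y = load_iris(return_X_y=True)
>>> tree = DecisionTreeClassifier(max_depth=1, random_state=1).fit(X, y)
>>> tree.predict_proba(X[[0, 100]])   # one setosa example, one virginica example
array([[1. , 0. , 0. ],
       [0. , 0.5, 0.5]])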