Day53 ML Review - Dimensionality Reduction (4)
Implementing a Kernel Principal Component Analysis (KPCA) in Python
As an example, we are going to implement an RBF kernel PCA in Python.
from scipy.spatial.distance import pdist, squareform
from scipy.linalg import eigh
import numpy as np

def rbf_kernel_pca(X, gamma, n_components):
    """
    RBF kernel PCA implementation.

    Parameters
    ----------
    X: {NumPy ndarray}, shape = [n_examples, n_features]
    gamma (float): Tuning parameter of the RBF kernel
    n_components (int): Number of principal components to return

    Returns
    -------
    X_pc: {NumPy ndarray}, shape = [n_examples, k_features]
        Projected dataset
    """
    # Calculate pairwise squared Euclidean distances
    # in the MxN dimensional dataset.
    sq_dists = pdist(X, 'sqeuclidean')

    # Convert pairwise distances into a square matrix.
    mat_sq_dists = squareform(sq_dists)

    # Compute the symmetric kernel matrix.
    K = np.exp(-gamma * mat_sq_dists)

    # Center the kernel matrix: the mapped features are not guaranteed
    # to be zero-centered in feature space, so we center K directly.
    N = K.shape[0]
    one_n = np.ones((N, N)) / N
    K = K - one_n.dot(K) - K.dot(one_n) + one_n.dot(K).dot(one_n)

    # Obtain eigenpairs from the centered kernel matrix;
    # scipy.linalg.eigh returns them in ascending order.
    eigvals, eigvecs = eigh(K)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

    # Collect the top k eigenvectors (projected examples).
    X_pc = np.column_stack([eigvecs[:, i] for i in range(n_components)])

    return X_pc
Example 1 - separating half-moon shapes
Let's use our rbf_kernel_pca on various nonlinear example datasets. First, we'll generate a 2D dataset of 100 sample points that depict two half-moon shapes.
from sklearn.datasets import make_moons
X, y = make_moons(n_samples=100, random_state=123)
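Before transforming anything, it helps to look at the raw data. A quick plotting snippet (my own addition, assuming matplotlib is available):
import matplotlib.pyplot as plt

# Visualize the two half-moon classes (triangles vs. circles).
plt.scatter(X[y == 0, 0], X[y == 0, 1], color='red', marker='^', alpha=0.5)
plt.scatter(X[y == 1, 0], X[y == 1, 1], color='blue', marker='o', alpha=0.5)
plt.tight_layout()
plt.show()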

In the following diagram, you can see how the dataset is transformed by standard PCA.

The original half-moon shapes appear slightly sheared and flipped across the vertical center. This transformation would not assist a linear classifier in distinguishing between circles and triangles. Similarly, if we project the dataset onto a one-dimensional feature axis, the circles and triangles corresponding to the two half-moon shapes are not linearly separable, as shown in the right subplot.
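For reference, here is a minimal sketch of how that comparison could be produced with scikit-learn's PCA (my own reconstruction of the plot described above; with 100 samples there are 50 points per class):
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
import numpy as np

scikit_pca = PCA(n_components=2)
X_spca = scikit_pca.fit_transform(X)

fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(7, 3))
# Left subplot: first two principal components.
ax[0].scatter(X_spca[y == 0, 0], X_spca[y == 0, 1], color='red', marker='^', alpha=0.5)
ax[0].scatter(X_spca[y == 1, 0], X_spca[y == 1, 1], color='blue', marker='o', alpha=0.5)
# Right subplot: projection onto the first component only
# (small vertical offsets keep the two classes visible).
ax[1].scatter(X_spca[y == 0, 0], np.zeros((50, 1)) + 0.02, color='red', marker='^', alpha=0.5)
ax[1].scatter(X_spca[y == 1, 0], np.zeros((50, 1)) - 0.02, color='blue', marker='o', alpha=0.5)
ax[0].set_xlabel('PC1')
ax[0].set_ylabel('PC2')
ax[1].set_ylim([-1, 1])
ax[1].set_yticks([])
ax[1].set_xlabel('PC1')
plt.tight_layout()
plt.show()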
Now let's evaluate the rbf_kernel_pca function we implemented by running the following line.
X_kpca = rbf_kernel_pca(X, gamma=15, n_components=2)
The two classes (circles and triangles) are now linearly well separated, so the transformed data is a suitable training dataset for linear classifiers.
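To back up that claim, a quick check (my own addition, not part of the original walkthrough): fit a plain logistic regression on the first kernel principal component and look at the training accuracy, which should be close to 1.0 if the classes really are linearly separable along that axis.
from sklearn.linear_model import LogisticRegression

# Train a linear classifier on the first kernel principal component only.
lr = LogisticRegression()
lr.fit(X_kpca[:, 0:1], y)
print(lr.score(X_kpca[:, 0:1], y))  # training accuracy on the projected data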

Example 2 - separating concentric circles
We will look at another example, concentric circles, generated with the following code.
from sklearn.datasets import make_circles
import matplotlib.pyplot as plt

X, y = make_circles(n_samples=1000, random_state=123, noise=0.1, factor=0.2)

# Plot the raw concentric-circle dataset.
plt.scatter(X[y == 0, 0], X[y == 0, 1], color='red', marker='^', alpha=0.5)
plt.scatter(X[y == 1, 0], X[y == 1, 1], color='blue', marker='o', alpha=0.5)
plt.tight_layout()
plt.show()
The resulting dataset is shown below.

We start with the standard PCA approach to contrast it with the result of the RBF kernel PCA.
from sklearn.decomposition import PCA

scikit_pca = PCA(n_components=2)
X_spca = scikit_pca.fit_transform(X)
Once again, we can see that standard PCA does not produce a representation suitable for training a linear classifier.
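For completeness, a minimal sketch of how that standard-PCA plot could be reproduced (my own addition; it simply scatters the first two principal components by class):
import matplotlib.pyplot as plt

plt.scatter(X_spca[y == 0, 0], X_spca[y == 0, 1], color='red', marker='^', alpha=0.5)
plt.scatter(X_spca[y == 1, 0], X_spca[y == 1, 1], color='blue', marker='o', alpha=0.5)
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.tight_layout()
plt.show()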

When we apply our rbf_kernel_pca function instead, we can see that the data has been transformed into a new subspace in which the two classes become linearly separable.
X_kpca = rbf_kernel_pca(X, gamma=15, n_components=2)

fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(7, 3))
ax[0].scatter(X_kpca[y == 0, 0], X_kpca[y == 0, 1], color='red', marker='^', alpha=0.5)
ax[0].scatter(X_kpca[y == 1, 0], X_kpca[y == 1, 1], color='blue', marker='o', alpha=0.5)
ax[1].scatter(X_kpca[y == 0, 0], np.zeros((500, 1)) + 0.02, color='red', marker='^', alpha=0.5)
ax[1].scatter(X_kpca[y == 1, 0], np.zeros((500, 1)) - 0.02, color='blue', marker='o', alpha=0.5)
plt.tight_layout()
plt.show()
