The last experiment I did was about dimensionality reduction with a Principal Component and Linear Discriminant Analysis. The datasets I used were rather easy, but what happens if the data was generated by a non-linear process?
You can decide to either follow this tutorial or to try out an interactive octave script. The script includes more kernel functions and all the plots and subplots used in this tutorial. It's available at files/octave/kpca.m and starting is as easy as:
philipp@mango:~/octave$ chmod +x kpca.m philipp@mango:~/octave$ ./kpca.m
Fire up GNU Octave and let's make up an experiment! The data comes from three circles (plus some noise):
function [x,y] = gen_circle(num_points, radius, sigma)
t = linspace(0,2*pi,num_points)';
x = radius*cos(t)+randn(size(t))*sigma;
y = radius*sin(t)+randn(size(t))*sigma;
endfunction
% points per class
num_datapoints = 50;
% generate some data
[x1, y1] = gen_circle(num_datapoints, 15, 2);
[x2, y2] = gen_circle(num_datapoints, 8, 1);
[x3, y3] = gen_circle(num_datapoints, 2, 1);
% plot original data
figure;
plot(x1,y1,'ro'); hold on;
plot(x2,y2,'go');
plot(x3,y3,'bo');
title("Original Data");
% classes
y = [
repmat(0,num_datapoints,1);
repmat(1,num_datapoints,1);
repmat(2,num_datapoints,1)
];
data = [x1 y1; x2 y2; x3 y3];
And you will acknowledge that it's hard to separate between the classes with a line:
What does a plain old PCA say? By building the covariance matrix and finding it's orthogonal eigenvectors as in the last experiment, we can project the data on the first principal component:
% Principal Component Analysis mu = mean(data); data_centered = data - repmat(mu, rows(data),1); C = cov(data_centered) [evec, eval] = eig(C); % sort by eval [eval, idx] = sort(diag(eval), 'descend'); evec = evec(:, idx); % project z = data_centered*evec;
A PCA doesn't really help. Training a linear classifier on one of these projections is hard:
So if you are stuck in 2D… Go to 3D! By applying the so called Kernel trick we can transform our data into a higher dimensional feature space, where it can be separated with linear methods again. There are a bunch of cool videos on youtube explaining it, just like this one.
I've hacked this after reading Wikipedia on Kernel Principal Component Analysis, so it's probably the best page to start with. Cesar Souza also gives a great introduction and C# implementation in his blog.
Note that the KPCA is unsupervised (as is the PCA), so class labels aren't necessary. We only use them to see the results. The transformation into 3D is done by using a polynomial kernel with a degree of 2. If you wonder what the parameters are for, then here is a great summary of kernel functions: http://crsouza.blogspot.com/2010/03/kernel-functions-for-machine-learning.html.
[n, d] = size(data);
% build kernel
for i=1:n,
for j=i:n,
% polynomial kernel
K(i,j) = (data(i,:) * data(j,:)')^2;
%% gaussian perhaps?
K(j,i) = K(i,j);
end
end
The PCA requires a dataset with zero mean. So centering the dataset in the higher dimensional feature space is necessary:
% center the kernel dataset unit = ones(n, n) / n; Kc = (K - unit*K - K*unit + unit*K*unit) / n;
And finally perform a PCA:
% perform pca on the kernel [evec, eval] = eig(Kc); % sort by eval [eval, idx] = sort(real(diag(eval)), 'descend'); evec = evec(:, idx);
Don't forget to normalize the eigenvectors:
% normalize
for i = 1:n,
evec(:,i) = evec(:,i)/sqrt(eval(i));
end
And by projecting the dataset onto the first three components:
p = Kc*evec(:,1:3);
… and plotting:
% plot
p1 = p(find(y==0),:);
p2 = p(find(y==1),:);
p3 = p(find(y==2),:);
figure;
i = 1;
for ev1 = 1:3,
for ev2 = 1:3,
subplot(3,3, i); cla; hold on;
plot(p1(:,ev1),p1(:,ev2), "ro");
plot(p2(:,ev1),p2(:,ev2), "go");
plot(p3(:,ev1),p3(:,ev2), "bo");
hold off;
title(strcat("PC(", num2str(ev1),",", num2str(ev2),")"));
axis("tight")
i = i + 1;
end
end
We can separate the classes way better than before:
And if you like a cool 3D plot use:
figure;
plot3(p1(:,1),p1(:,2),p1(:,3), "ro"); hold on;
plot3(p2(:,1),p2(:,2),p2(:,3), "go");
plot3(p3(:,1),p3(:,2),p3(:,3), "bo");
title("Kernel PCA projection")
xlabel('1st component');
ylabel('2nd component');
zlabel('3rd component');
… you will see the non-linear nature of the subspace we have found:
You have seen how to do a Kernel PCA with GNU Octave. Download the script and experiment with the different Kernel functions!
The Kernel PCA has some drawbacks of course. One thing you'll notice when researching is, that the selection of the kernel function lacks a theoretic scheme (… or I didn't find one). It appears to be the same black magic that is involved in designing Neural Networks, so I am still sceptic about Kernel methods. The second problem is the construction of the Kernel matrix, which scales with the number of data points, making it infeasible to store and compute the Kernel for large-scale data sets. You can overcome this problem by using an iterative version of the algorithm, see Iterative Kernel Principal Component Analysis for Image Modeling (Paper, Project website).
Discussion
Impressing! Love from Mum and Aunt