I've recently pushed some code to perform face recognition with OpenCV2 into my github repository. Now if you have ever researched on face recognition you've noticed there's a gigantic number of publications, but source code is very sparse. Since I've got some positive feedback on Machine Learning with OpenCV2, I thought I write a document on:
It's the kind of guide I've wished for, when I was working myself into face recognition. It helps you with installing OpenCV2 on your machine and explains you how to build a project on Windows and Linux. The Eigenfaces and Fisherfaces method are explained in detail, prototyped with Python, GNU Octave/MATLAB and implemented with OpenCV2's C++ API. A basic knowledge of C++ is assumed and administrative rights for your machine are a plus. I've decided to use MinGW (the GCC port for Windows) as C/C++ compiler for Windows, because it works great with OpenCV2 and comes under terms of a public license (please see mingw.org/license for details). If someone checks the install guide for Visual Studio 2008/2010, I would be happy to add it to the document. I guess you only have to generate a Visual Studio project with CMake, open and compile it; but I don't have Visual Studio right now to check it.
The code and document is released under a BSD license, so feel free to use it for your commercial and academic projects. Note: the latest version of the document and code can be obtained from the projects github repository:
It's a fact, that human perception isn't built for observing fine changes in grayscale images. Human sensors (eyes and stuff) are more sensitive to observing changes between colors, so you often need to recolor your grayscale images to get a clue about them. Yesterday I needed to visualize some data in OpenCV. It's easy to apply a colormap in MATLAB, but OpenCV doesn't come with predefined colormaps. I don't have much knowledge about colormaps and I am totally off when it comes to choosing colors anyway (my former art teacher will acknowledge), but there are people with a lot more experience. One page I've found is the blog of Matteo Niccoli, who shares some colormaps on his FEX page, along with interesting links to latest research.
So I simply took his colormaps (Thanks!), some GNU Octave colormaps and wrapped them into classes. The source code is now on my github account:
Using a colormap is then as easy as writing:
#include <cv.h> #include "colormap.hpp" using namespace cv; int main(int argc, const char *argv[]) { Mat img = imread("image.jpg",0); colormap::Jet jet; // default: 256-levels Mat colored = jet(img); //... }
Here are the colormaps included in the code. You can have a look into the main.cpp to see how I've created the color scales below:
| Name | Scale |
|---|---|
| Autumn | |
| Bone | |
| Cool | |
| Hot | |
| HSV | |
| Jet | |
| MKPJ1 | |
| MKPJ2 | |
| Ocean | |
| Pink | |
| Rainbow | |
| Spring | |
| Summer | |
| Winter | |
Please note, that the ColorMap class returns a CV_32FC3 matrix (float values from (0.0,…,1.0)). In order to display an image you have to convert the scale:
// ... Mat colored = jet(img); colored.convertTo(colored, CV_8UC3, 255.0);
I hope this saves someone some time. I've put my code under a BSD License, so feel free to use it and enhance it with your own colormaps!
I've found some minutes to implement the Fisherfaces method with OpenCV2's C++ API. The code has been pushed into my github repository at:
As some kind of documentation, I am pasting the README.markdown coming with the project.
If you still have any troubles using it, just comment below or drop me a mail.
This project implements the Fisherfaces method as described in: P. Belhumeur, J. Hespanha, and D. Kriegman, “Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):711–720, 1997.
This project now comes with the Eigen3 headers, so compiling the project is as easy as writing (assuming you are in this folder):
philipp@mango:~/some/dir/lda$ mkdir build philipp@mango:~/some/dir/lda$ cd build philipp@mango:~/some/dir/lda/build$ cmake .. philipp@mango:~/some/dir/lda/build$ make philipp@mango:~/some/dir/lda/build$ ./lda filename.ext
And if you are in Windows using MinGW it may look like this:
C:\some\dir\lda> mkdir build C:\some\dir\lda> cd build C:\some\dir\lda\build> cmake -G "MinGW Makefiles" .. C:\some\dir\lda\build> mingw32-make C:\some\dir\lda\build> lda.exe filename.ext
You probably have to set the OpenCV_DIR variable if it wasn't added by your installation, see Line 5 in the CMakeLists.txt how to do this. If you have problems working with CMake or installing OpenCV, you probably want to read my guide on Face Recognition with OpenCV2.
The project comes with an example, please have a look at the main.cpp on how to use the classes. You need some data to make the examples work, sorry but I really can't include those face databases in my repository. I have thoroughly commented the code and reworked it lately, to make its usage simpler. So if anything regarding the classes is unclear, please read the comments.
In the example I use a CSV file to read in the data, it's the easiest solution I can think of right now. However, if you know a simpler solution please ping me about it. Basically all the CSV file needs to contain are lines composed of a filename followed by a ; followed by the label (as integer number), making up a line like this: /path/to/image.ext;0.
Think of the _label_ as the subject (the person) this image belongs to, so same subjects (persons) should have the same label. An example CSV file for the AT&T Facedatabase is given here, which looks like this (assuming I've extracted the database to /home/philipp/facerec/data/at, file is without … of course):
/home/philipp/facerec/data/at/s1/1.pgm;0 /home/philipp/facerec/data/at/s1/2.pgm;0 ... /home/philipp/facerec/data/at/s2/1.pgm;1 /home/philipp/facerec/data/at/s2/2.pgm;1 ... /home/philipp/facerec/data/at/s40/1.pgm;39 /home/philipp/facerec/data/at/s40/2.pgm;39
Once you have a CSV file with valid filenames and labels, you can run the demo by simply starting the demo with the path to the CSV file as parameter:
./lda /path/to/your/csvfile.ext
Or if you are in Windows:
lda.exe /path/to/your/csvfile.ext
All code is put under a BSD license, so feel free to use it for your projects.
There was a question on stackoverflow on how to calculate the Fisherfaces with OpenCV. So I wrote a class to perform a Linear Discriminant Analysis with OpenCV, which is available at:
Note: This post has been updated to fisherfaces_in_opencv.
My website recently saw a lot of hits for a tiny wiki page on Local Binary Patterns, so I thought it might be useful to share some code. I am preparing a long blog post on Local Binary Patterns (LBP), but you know it… 80% done and 20% left to do… Damn Pareto strikes again! For now I'll make the long story a bit shorter.
All source code in this post is available at https://github.com/bytefish/opencv/tree/master/lbp. It comes as a CMake project, so to build and run it simply type:
philipp@mango:~/some/dir$ mkdir build; cd build philipp@mango:~/some/dir/build$ cmake .. philipp@mango:~/some/dir/build$ make philipp@mango:~/some/dir/build$ ./lbp <device id (default 0)>
Note that I am using the classic OpenCV type system in my code snippets, because the new type system is not widespread at time of writing this. If you are in desperate need of plain C or Python versions, then just comment below or drop me a mail. This should give you an idea for speeding up the Python implementations. And if I forgot anything important, I am sure the great Scholarpedia page on Local Binary Patterns answers your questions.
Oh one final important edit. Thanks to my 4 loyal Google Reader subscribers and all the anonymous RSS feed readers!
If you have read my blog posts on Fisherfaces or Eigenfaces I've left you with the impression everything works just fine. You should have been sceptic. If you've ever developed algorithms for real life you know one thing for sure: never expect a perfect world. Let's revisit the algorithms we've discussed. Both the Fisherfaces and Eigenfaces method take a somewhat holistic approach to face recognition. You treat your data as a vector somewhere in a high-dimensional image space. We all know high-dimensionality is crap, so a lower-dimensional subspace is identified, where (probably) useful information is preserved. We've already seen that the Eigenfaces approach is likely to find the wrong components on images with a lot of variation in illumination, because components with a maximum variance over all classes aren't necessarily useful for classification. So to preserve some discriminative information we applied a Linear Discriminant Analysis (with the Fisher criterion) and optimized as described in the Fisherfaces method. The Fisherfaces method worked great… at least for the very constrained scenario we've assumed in our model.
Now real life isn't perfect. You simply can't guarantee perfect light settings in your images or 10 images of a person. So what if there's only one image for each person? Our covariance estimates for the subspace will be horribly wrong, something also known as the Small Sample Size Problem. Remember the Eigenfaces method had a 96% recognition rate on the AT&T Facedatabase? How many images do we actually need to get such useful estimates? I've put a tiny script for you into the appendix, feel free to experiment with. Running the script on the AT&T Facedatabase, which is a fairly easy image database, shows:
So in order to get good recognition rates you'll need at least 8(+-1) images for each person and the Fisherfaces method doesn't really help here; somewhat logical when the subspace we project our data into is identified by a PCA. You can find similar figures in often cited papers.
All this is known for ages and you can find tons of literature discussing these problems. Some papers try to regulate the Discriminant Analysis or use the pseudo-inverse within-scatter matrix, just search Google Scholar on Small Sample Size. Now this is just my personal and totally biased1) take on this: I think the error you should expect with this kind of models is way too high (and I haven't seen an implementation yet). Other solutions instead assume a face model and synthesize images from a given sample. One thing all solutions have in common is, that they are complicated.
So some research concentrated on extracting local features from images. The idea is to not look at the whole image as a high-dimensional vector, but describe only local features of an object. The features you extract this way will have a low-dimensionality implicitly. A fine idea! But you'll soon observe the image representation we are given doesn't only suffer from illumination variations. Think of things like scale, translation or rotation in images - your local description has to be at least a bit robust against those things.
You've seen those cool SIFT Features in OpenCV? The algorithm extracts local keypoints in your image that doesn't mind the scale, one of the reasons why it's called Scale-invariant feature transform. Just like SIFT, the Local Binary Patterns methodology has its roots in 2D texture analysis. The basic idea is to summarize the local structure in an image by comparing each pixel with its neighborhood. Take a pixel as center and threshold its neighbors against. If the intensity of the center pixel is greater-equal its neighbor, then denote it with 1 and 0 if not. You'll end up with a binary number for each pixel, just like 11001111. With 8 surrounding pixels you'll end up with 2^8 possible combinations, which are called Local Binary Patterns or sometimes LBP codes. The first LBP operator actually used a fixed 3 x 3 neighborhood just like this:
So why is this description so great? Because it's dead simple to implement and fast as hell. I'll just name it Original LBP in code:
template <typename _Tp> void lbp::OLBP_(const Mat& src, Mat& dst) { dst = Mat::zeros(src.rows-2, src.cols-2, CV_8UC1); for(int i=1;i<src.rows-1;i++) { for(int j=1;j<src.cols-1;j++) { _Tp center = src.at<_Tp>(i,j); unsigned char code = 0; code |= (src.at<_Tp>(i-1,j-1) > center) << 7; code |= (src.at<_Tp>(i-1,j) > center) << 6; code |= (src.at<_Tp>(i-1,j+1) > center) << 5; code |= (src.at<_Tp>(i,j+1) > center) << 4; code |= (src.at<_Tp>(i+1,j+1) > center) << 3; code |= (src.at<_Tp>(i+1,j) > center) << 2; code |= (src.at<_Tp>(i+1,j-1) > center) << 1; code |= (src.at<_Tp>(i,j-1) > center) << 0; dst.at<unsigned char>(i-1,j-1) = code; } } }
This description enables you to capture very fine grained details in images. In fact the authors were able to compete with state of the art results for texture classification. Soon after the operator was published it was noted, that a fixed neighborhood fails to encode details differing in scale. So the operator was extended to use a variable neighborhood. The idea is to align an abritrary number of neighbors on a circle with a variable radius, which enables to capture the following neighborhoods:
The operator is an extension to the original LBP codes, so it gets called the Extended LBP (sometimes also referred to as Circular LBP) . If a points coordinate on the circle doesn't correspond to image coordinates, the point get's interpolated. Computer science has a bunch of clever interpolation schemes, I'll simply do a bilinear interpolation:
template <typename _Tp> void lbp::ELBP_(const Mat& src, Mat& dst, int radius, int neighbors) { neighbors = max(min(neighbors,31),1); // set bounds... // Note: alternatively you can switch to the new OpenCV Mat_ // type system to define an unsigned int matrix... I am probably // mistaken here, but I didn't see an unsigned int representation // in OpenCV's classic typesystem... dst = Mat::zeros(src.rows-2*radius, src.cols-2*radius, CV_32SC1); for(int n=0; n<neighbors; n++) { // sample points float x = static_cast<float>(radius) * cos(2.0*M_PI*n/static_cast<float>(neighbors)); float y = static_cast<float>(radius) * -sin(2.0*M_PI*n/static_cast<float>(neighbors)); // relative indices int fx = static_cast<int>(floor(x)); int fy = static_cast<int>(floor(y)); int cx = static_cast<int>(ceil(x)); int cy = static_cast<int>(ceil(y)); // fractional part float ty = y - fy; float tx = x - fx; // set interpolation weights float w1 = (1 - tx) * (1 - ty); float w2 = tx * (1 - ty); float w3 = (1 - tx) * ty; float w4 = tx * ty; // iterate through your data for(int i=radius; i < src.rows-radius;i++) { for(int j=radius;j < src.cols-radius;j++) { float t = w1*src.at<_Tp>(i+fy,j+fx) + w2*src.at<_Tp>(i+fy,j+cx) + w3*src.at<_Tp>(i+cy,j+fx) + w4*src.at<_Tp>(i+cy,j+cx); // we are dealing with floating point precision, so add some little tolerance dst.at<unsigned int>(i-radius,j-radius) += ((t > src.at<_Tp>(i,j)) && (abs(t-src.at<_Tp>(i,j)) > std::numeric_limits<float>::epsilon())) << n; } } } }
Now using an abritrary radius and sample points has two effects. With an educated guess I would say… The more sampling points you take, the more patterns you can encode, the more discriminative power you have, but the higher the computational effort. Instead the larger the radius, the smoother the image, the larger details can be captured, the less discriminative power the description has (if you don't increase the sampling points at the same time). Let's see what the LBP codes look like given this sample frame. My webcam isn't really high-resolution, please don't laugh at my hardware!
| Radius | Sampling Points | LBP Image |
|---|---|---|
| 1 | 4 | |
| 1 | 8 | |
| 1 | 16 | |
| 2 | 4 | |
| 2 | 8 | |
| 2 | 16 | |
| 4 | 4 | |
| 4 | 8 | |
| 4 | 16 | |
Now what's left is how to classify an object. One idea is to create a histogram over the entire image and search for the nearest neighbor in a database, (numPatterns is the amount of possible LBP codes for your current operator - hey… feel free to make all this more OOP-ish):
template <typename _Tp> void lbp::histogram_(const Mat& src, Mat& hist, int numPatterns) { hist = Mat::zeros(1, numPatterns, CV_32SC1); for(int i = 0; i < src.rows; i++) { for(int j = 0; j < src.cols; j++) { int bin = src.at<_Tp>(i,j); hist.at<int>(0,bin) += 1; } } } void lbp::histogram(const Mat& src, Mat& hist, int numPatterns) { switch(src.type()) { case CV_8SC1: histogram_<char>(src, hist, numPatterns); break; case CV_8UC1: histogram_<unsigned char>(src, hist, numPatterns); break; case CV_16SC1: histogram_<short>(src, hist, numPatterns); break; case CV_16UC1: histogram_<unsigned short>(src, hist, numPatterns); break; case CV_32SC1: histogram_<int>(src, hist, numPatterns); break; } }
To perform the actual classification a simple k-nearest neighbor query can be employed. A common distance measure between histograms is given with the Chi-Square Distance:
template <typename _Tp> double lbp::chi_square_(const Mat& histogram0, const Mat& histogram1) { if(histogram0.type() != histogram1.type()) CV_Error(CV_StsBadArg, "Histograms must be of equal type."); if(histogram0.rows != 1 || histogram0.rows != histogram1.rows || histogram0.cols != histogram1.cols) CV_Error(CV_StsBadArg, "Histograms must be of equal dimension."); double result = 0.0; for(int i=0; i < histogram0.cols; i++) { double a = histogram0.at<_Tp>(0,i) - histogram1.at<_Tp>(0,i); double b = histogram0.at<_Tp>(0,i) + histogram1.at<_Tp>(0,i); if(abs(b) > numeric_limits<double>::epsilon()) { result+=(a*a)/b; } } return result; } double lbp::chi_square(const Mat& histogram0, const Mat& histogram1) { switch(histogram0.type()) { case CV_8SC1: return chi_square_<char>(histogram0,histogram1); break; case CV_8UC1: return chi_square_<unsigned char>(histogram0,histogram1); break; case CV_16SC1: return chi_square_<short>(histogram0, histogram1); break; case CV_16UC1: return chi_square_<unsigned short>(histogram0,histogram1); break; case CV_32SC1: return chi_square_<int>(histogram0,histogram1); break; } }
By throwing everything into a single histogram all spatial information is discarded. In tasks like face detection (and a lot of other pattern recognition problems) spatial information is very useful, so it has to be incorporated into the histogram somehow. The representation proposed by Ahonen et al. in Face Recognition with Local Binary Patterns is to divide the LBP image into grids and build a histogram of each cell seperately. Then by concatenating the histograms the spatial information is encoded (not merging them), just like this:
This informal description translates to OpenCV as:
void lbp::spatial_histogram(const Mat& src, Mat& hist, int numPatterns, const Size& window, int overlap) { int width = src.cols; int height = src.rows; vector<Mat> histograms; for(int x=0; x < width - window.width; x+=(window.width-overlap)) { for(int y=0; y < height-window.height; y+=(window.height-overlap)) { Mat cell = Mat(src, Rect(x,y,window.width, window.height)); histograms.push_back(histogram(cell, numPatterns)); } } hist.create(1, histograms.size()*numPatterns, CV_32SC1); // i know this is a bit lame now... feel free to make this a bit more efficient... for(int histIdx=0; histIdx < histograms.size(); histIdx++) { for(int valIdx = 0; valIdx < numPatterns; valIdx++) { int y = histIdx*numPatterns+valIdx; hist.at<int>(0,y) = histograms[histIdx].at<int>(valIdx); } } } void lbp::spatial_histogram(const Mat& src, Mat& dst, int numPatterns, int gridx, int gridy, int overlap) { int width = static_cast<int>(floor(src.cols/gridx)); int height = static_cast<int>(floor(src.rows / gridy)); spatial_histogram(src, dst, numPatterns, Size_<int>(width, height), overlap); }
The Nearest Neighbor matching is not yet included in the github repository. The final post will explain a flexible matching approach using Local Binary Patterns and machine learning algorithms. For now feel free to use the code and have fun experimenting with it. Oh, final words on the keys in the application:
| Key | Function |
|---|---|
| o | switch the lbp operator |
| p | decrease the number sampling points |
| P | increase number of sampling points |
| r | decrease radius |
| R | increase radius |
| q,Q,ESC | quits the program |
To run the experiment (with framework version version number 0.1) simply start small_sample.py from the project directory with:
philipp@mango:~/some/folder/facerec/py/facerec$ python -m scripts.small_sample
Note: in the script you have to set the path to the image database.
I had another 10 minutes for my small archiving project this morning. Turns out I should have listened to my 7th grade2) math teacher: Trigonometry is actually important. And now that I have worked myself through all “Trigonometry for kids” pages google shows up, it turns out I don't need all this for OpenCV.
You basically only need getRotationMatrix2D to get the rotation matrix for the affine transformation and getRectSubPix to crop the rotated image. Thanks a lot to: http://felix.abecassis.me/2011/10/opencv-rotation-deskewing/, it would have taken me a lot more minutes if I didn't see getRectSubPix.
In OpenCV this translates to (rest of code doesn't change):
// [...] for (int i = 0; i < contours.size(); i++) { // first calculate the area to remove small areas double area = contourArea(contours[i]); // you probably have to adjust the size of the area // as it depends on the size of the rectangle if (area > 10000) { // get the bounding box of this contour RotatedRect rect = minAreaRect(contours[i]); // matrices we'll use Mat M, rotated, cropped; // get angle and size from the bounding box float angle = rect.angle; Size rect_size = rect.size; // thanks to http://felix.abecassis.me/2011/10/opencv-rotation-deskewing/ if (rect.angle < -45.) { angle += 90.0; swap(rect_size.width, rect_size.height); } // http://opencv.willowgarage.com/documentation/cpp/geometric_image_transformations.html#cv-getrotationmatrix2d M = getRotationMatrix2D(rect.center, angle, 1.0); // http://opencv.willowgarage.com/documentation/cpp/geometric_image_transformations.html#cv-warpaffine warpAffine(src, rotated, M, src.size(), INTER_CUBIC); // yes yes, just a little scroll above getRotationMatrix2D: // http://opencv.willowgarage.com/documentation/cpp/geometric_image_transformations.html#cv-getrectsubpix getRectSubPix(rotated, rect_size, rect.center, cropped); imshow("cropped", cropped); waitKey(0); // write to file //std::stringstream ssfilename; //ssfilename << filename << i << ".jpg"; //imwrite(ssfilename.str(), cropped); } } [...]
and you end up with your extracted pictures:
| |
| |
So next thing is to put all this into a GUI! Forgot to mention I hacked a small Python Tkinter application to manually crop and rotate images: files/python/cropper.py.
I am currently in the process of archiving hundreds of family pictures, before they get lost and forgotten. It's easy with digital images in the post 2000 era, but photos from the 80s and 90s need to be scanned. While there's no cheap way to automate this (someone wants to buy me an ASIMO?), a lot of other tasks can be automated. And more importantly: it's a great chance to play around with computer vision and machine learning algorithms.
I'll share all my scripts and ideas with you, maybe they're useful for you. I am not halfway finished with the application and entire post, but sharing pieces of code may be helpful for somebody. I'll put everything under a BSD license, so feel free to edit.
Imagine I got this scan (click here for the full image):
My algorithm works like this:
With OpenCV C++ this translates to:
#include <cv.h> #include <highgui.h> #include <iostream> #include <vector> using namespace std; using namespace cv; int main(int argc, char *argv[]) { Mat srcg, dst, src = imread("scan.jpg"); // convert to grayscale cvtColor(src, srcg, CV_BGR2GRAY); // find edges Canny(srcg, dst, 50, 200, 3); // close gaps dilate(dst, dst, Mat(), Point(-1,-1)); erode(dst,dst,Mat(),Point(-1,-1)); // find the contours vector< vector<Point> > contours; findContours(dst, contours, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_NONE); for(int i = 0; i < contours.size(); i++) { // first calculate the area to remove small areas double area = contourArea(contours[i]); // you probably have to adjust the size of the rectangle // as it depends on the size of the rectangle if(area > 10000) { RotatedRect r = minAreaRect(contours[i]); cout << area << " (angle=" << r.angle << ")" << endl; Point2f box[4]; r.points(box); for(int i = 0; i < 4; i++) { circle(src, box[i] , 5, cvScalar(0, 0, 255)); } } } imshow("src", src); imwrite("result.jpg",src); waitKey(0); }
this outputs:
64146 (angle=-0.686144) 64401 (angle=-89.132) 64507.5 (angle=-0.68206) 52614.5 (angle=-87.4886)
and the resulting image has the corner points as:
Yesterday I was asked how to extract a contour from a given image in OpenCV. Here is an example. Imagine we got this tasty apple and we want to put it in another image (with a green background):
One solution is to first detect the edges of the apple with a Canny filter, then find the contours with OpenCV's findContours and create a mask with drawContours:
Finally copy the masked original image to the new image, which means only the areas of the contours will be copied… and you are done. You end up with a tasty apple on a green background:
In code this translates to (also on github):
#include "cv.h" #include "highgui.h" using namespace cv; using namespace std; int main() { // read in the apple (change path to the file) Mat img0 = imread("/home/philipp/images/apple.jpg", 1); Mat img1; cvtColor(img0, img1, CV_RGB2GRAY); // apply your filter Canny(img1, img1, 100, 200); // find the contours vector< vector<Point> > contours; findContours(img1, contours, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_NONE); // you could also reuse img1 here Mat mask = Mat::zeros(img1.rows, img1.cols, CV_8UC1); // CV_FILLED fills the connected components found drawContours(mask, contours, -1, Scalar(255), CV_FILLED); /* Before drawing all contours you could also decide to only draw the contour of the largest connected component found. Here's some commented out code how to do that: */ // vector<double> areas(contours.size()); // for(int i = 0; i < contours.size(); i++) // areas[i] = contourArea(Mat(contour[i])); // double max; // Point maxPosition; // minMaxLoc(Mat(areas),0,&max,0,&maxPosition); // drawContours(mask, contours, maxPosition.y, Scalar(1), CV_FILLED); // let's create a new image now Mat crop(img0.rows, img0.cols, CV_8UC3); // set background to green crop.setTo(Scalar(0,255,0)); // and copy the magic apple img0.copyTo(crop, mask); // show the images imshow("original", img0); imshow("canny", img1); imshow("cropped", crop); // imwrite("/home/philipp/apple_canny.jpg", img1); // imwrite("/home/philipp/apple_mask.jpg", mask); // imwrite("/home/philipp/apple_cropped.jpg", crop); waitKey(); return 0; }
My last post was very long and I promise to keep this one short. In this post I want to do gender classification on a set of face images and find out which are the specific features faces differ in.
I have used the celebrity dataset from my last post and downloaded images for Emma Watson, Naomi Watts and Jennifer Lopez. So the database has 8 male and 5 female subjects, each with 10 images. All images were chosen to have a frontal face perspective and have been cropped to a size of 140×140 pixels, just like this set of images. You can easily find some images yourself using Google Images, then locate the eye positions and use the crop function to obtain the final image (see batch_crop.m for an example).
Get the code from github by either cloning the repository:
$ git clone git@github.com:bytefish/facerec.git
or downloading the latest version as tar or zip.
Then startup Octave and make the functions available:
$ cd facerec/m
$ octave
octave> addpath(genpath("."));
octave> fun_fisherface = @(X,y) fisherfaces(X,y); % no parameters needed
octave> fun_predict = @(model,Xtest) fisherfaces_predict(model,Xtest,1); % 1-NN
A Leave-One-Out Cross validation shows that the Fisherfaces method can very accurately predict, wether a given face is male or female:
octave> [X y width height names] = read_images("/home/philipp/facerec/data/gender/");
octave> cv0 = LeaveOneOutCV(X,y,fun_fisherface,fun_predict,1)
129 1 0 0
There are 129 correct predictions and only 1 false prediction, which is a recognition rate of 99%.
The results we got were very promising, but they are probably misleading. Have a look at the way a leave-one-out and k-fold cross-validation splits a Dataset D with 3 classes each with 3 observations:
o0 o1 o2 o0 o1 o2 o0 o1 o2
c0 | A B B | c0 | B A B | c0 | B B A |
c1 | A B B | c1 | B A B | c1 | B B A |
c2 | A B B | c2 | B A B | c2 | B B A |
Allthough the folds are not overlapping (training data is never used for testing) the training set contains images of persons we want to know the gender from. So the prediction may depend on the subject and the method finds the closest match to a persons image, but not the gender. What we aim for is a split by class:
o0 o1 o2 o0 o1 o2 o0 o1 o2
c0 | A A A | c0 | B B B | c0 | B B B |
c1 | B B B | c1 | A A A | c1 | B B B |
c2 | B B B | c2 | B B B | c2 | A A A |
With this strategy the cross-validation becomes subject-independent, because images of a subject are never used for learning the model. We restructure our folder hierarchy, so that it now reads: «group»/«subject»/«image»:
philipp@mango:~/facerec/data/gender$ tree .
.
|-- female
| |-- crop_angelina_jolie
| | |-- crop_angelina_jolie_01.jpg
| | |-- crop_angelina_jolie_02.jpg
| | |-- crop_angelina_jolie_03.jpg
[...]
`-- male
|-- crop_arnold_schwarzenegger
| |-- crop_arnold_schwarzenegger_01.jpg
| |-- crop_arnold_schwarzenegger_02.jpg
| |-- crop_arnold_schwarzenegger_03.jpg
[...]
15 directories, 130 files
The function for reading the dataset needs to return both, the group and the subject of each observation. The cross-validation then predicts on the groups and leaves one subject out in each iteration. read_images.m can be easily extended to read_groups.m and a simple subject-independent cross-validation is quickly written.
On a subject-independent cross-validation the Fisherfaces method is able to predict 128 out of 130 gender correctly, which is a recognition rate of 98%:
octave> [X y g group_names subject_names width height] = read_groups("/home/philipp/facerec/data/gender/");
octave> cv0 = LeaveOneClassOutCV(X,y,g,fun_fisherface,fun_predict,1)
cv0 =
128 2 0 0
Let's have a look at the features identified by the Discriminant Analysis. Since it's a binary classification, there's only one Fisherface. We'll plot it with:
fisherface = fisherfaces(X,g);
fig = figure('visible','off');
imagesc(cvtGray(fisherface.W,width,height));
% remove offsets, set to pixels and scale accordingly
set(gca,'units','normalized','Position',[0 0 1 1]);
set(gcf,'units','pixels','position',[0 0 width height]);
print("-dpng",sprintf("-S%d,%d",width,height),"fisherface.png");
and the Fisherface reveals:
Great surprise: it's the eyes and eyebrows the genders differ in… probably there's also a bit of mouth. We can conclude, that given a preprocessed face the Fisherfaces method can very accurately predict the gender of a given face. The recognition rate for a subject-independent cross-validation was 98.5%, which is only slightly below the 99.2% for a subject-dependent validation.
Of course, the experimental results should be validated on larger datasets.
Some time ago I have written a post on Linear Discriminant Analysis, a statistical method often used for dimensionality reduction and classification. It was invented by the great statistician Sir R. A. Fisher, who successfully used it for classifying flowers in his 1936 paper “The use of multiple measurements in taxonomic problems”.3) But why do we need another dimensionality reduction method, if the Principal Component Analysis (PCA) did such a good job? Well, the PCA finds a linear combination of features that maximizes the total variance in data. While this is clearly a powerful way to represent data, it doesn't consider any classes and so a lot of discriminative information may be lost when throwing some components away. This can yield bad results, especially when it comes to classification. In order to find a combination of features that separates best between classes the Linear Discriminant Analysis instead maximizes the ration of between-classes to within-classes scatter. The idea is, that same classes should cluster tightly together.
This was also recognized by Belhumeur, Hespanha and Kriegman and so they applied a Discriminant Analysis to face recognition in their paper “Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection” (1997). The Eigenfaces approach by Pentland and Turk as described in “Eigenfaces for Recognition” (1991) was a revolutionary one, but the original paper already discusses the negative effects of images with changes in background, light and perspective. So on datasets with differences in the setup, the Principal Component Analysis is likely to find the wrong components for classification and can perform poorly.
You seldomly see something explained with code and examples, so I thought I'll change that.4) I have prepared a Python and GNU Octave version, so you should be able to easily port the code to MATLAB (probably runs out of the box). The code is available in my github repository at: https://www.github.com/bytefish/facerec and is under a BSD License. This post is based on version number 0.1.
A single page version of this post without navigational elements is available here.
The Fisherfaces algorithm we are going to implement basically goes like this:
For a detailed description of the algorithms used in this post, please refer to the paper in further_reading.
Enough talk, show me the code! I will focus on the Python version, because it's a lot cleaner. Don't get me wrong, most of the functions are also implemented in GNU Octave and it's also easy to use. Aside from architectural aspects, translating the core algorithms from Octave to Python was almost trivial. Thanks again to NumPy and matplotlib, which make Python feel like GNU Octave. This is great due to the fact, that OpenCV uses NumPy arrays since OpenCV2.2. That makes it possible to integrate NumPy projects into an OpenCV project without any hassle – rapid prototyping comes for free!
Since the Fisherfaces method performs a Principal Component Analysis and Linear Discriminant Analysis on the data, all we need are two classes to perform a PCA and LDA. The Fisherfaces method then essentially boils down to (comments stripped off):
class Fisherfaces(Model): def __init__(self, X=None, y=None, num_components=None, k=1): Model.__init__(self, name="Fisherfaces") self.k = k self.num_components = num_components try: self.compute(X,y) except: pass def compute(self, X, y): """ Compute the Fisherfaces as described in [BHK1997]: Wpcafld = Wpca * Wfld. Args: X [dim x num_data] input data y [1 x num_data] classes """ n = len(y) c = len(np.unique(y)) pca = PCA(X, n-c) lda = LDA(pca.project(X), y, self.num_components) self._eigenvectors = pca.W*lda.W self.num_components = lda.num_components self.P = self.project(X) self.y = y def predict(self, X): Q = self.project(X) return NearestNeighbor.predict(self.P, Q, self.y, k=self.k) def project(self, X): return np.dot(self._eigenvectors.T, X) def reconstruct(self, X): return np.dot(self._eigenvectors, X)
All models in the code derive from a generic Model5) and implement a compute and predict method, which are used to estimate the performance of a classifier with a cross validation. Classes for performing a leave-one-out and k-fold cross validation are implemented. I only measure the recognition rate of a classifier, so something for your use case might be missing. Feel free to add your measures in the validation module.
The filereader module only has a FilesystemReader right now. It loads images in a given path, assuming a folder structure of «path»/«subject»/«image» for reading in the dataset. There are some limitations, which will be fixed in later versions. For now you must ensure, that all images have equal size and the image format is supported by PIL (same applies for ImageMagick in GNU Octave).
Last but not least, there's a visualization module. I don't feel afraid to say: it feels very, very hardcoded. Plotting and subplotting eigenvectors is supported, so is comparing accuracy (+ standard deviation) vs. parameters for a bunch of validators (errorbar plot). Absolutely everything regarding the style of a plot is missing, so it will take a lot more iterations to create smoother plotting facilities. If you want to quickly create some plots, go for the Octave version for now.
Everything is implemented? So let the experiments begin! The last face recognition experiment I have posted was performed on the AT&T Face database. Now if you spider through some literature you will notice, that it's a rather easy dataset. Images are taken in a very controlled environment with only slightly changes in perspective and light. You won't see any great improvements with a Discriminant Analysis, because it's hard to beat the 96% accuracy we got with Eigenfaces.
That's why I will validate the algorithms on the Yale Facedatabase A. Bad thing is, that it is not available for download anymore, because the original server seems to be down. You can find some sites mirroring it (like the MIT), but I can't make guarantees about the integrity. I might add an experiment for the Extended Yale Facedatabase B soon, but the database size is too large for our small experiments.
The second experiment is something I wanted to do for a long time: we will see how faces of celebrities cluster together and I can finally project my face to where it belongs (hopefully I belong to the Brad Pitt cluster)! You'll see dramatic improvements with a Linear Discriminant Analysis in this experiment.
My experimental validation scheme is very easy. I will do n runs for each experiment and each run is validated with a k-fold Cross-validation on a shuffled training set. From these n runs the average accuracy and standard deviation is reported. In literature you'll often find the results of a Leave-One-Out Cross Validation, so I'll also report it.
For a good writeup on cross validation based model selection, see Andrew Moore's slides on Cross-validation for preventing and detecting overfitting.
Cross-… what? Imagine I want to estimate the error of my face recognition algorithm. I have captured images of people over a long period of time and now I am building a training and test set from these images. Somehow I have managed to put all happy looking people in the training set, while all people in my test set are terribly sad with their head hanging down somewhere. Should I trust the results I get from this evaluation? Probably not, because it's a really, really unfortunate split of my available data. A cross-validation avoids such splits, because you build k (non-overlapping) training and test datasets from your (shuffled) original dataset. So what's a fold then? It's best explained with an example.
For a dataset D with 3 classes {c0,c1,c2} each having 4 observations {o0,o1,o2,o3}, a 4-fold cross validation produces the following four folds F:
(F1)
o0 o1 o2 o3
c0 | A B B B |
c1 | A B B B |
c2 | A B B B |
A = {D[c0][o0], D[c1][o0], D[c2][o0]}
B = D\A
(F2)
o0 o1 o2 o3
c0 | B A B B |
c1 | B A B B |
c2 | B A B B |
A = {D[c0][o1], D[c1][o1], D[c2][o1]}
B = D\A
(F3)
o0 o1 o2 o3
c0 | B B A B |
c1 | B B A B |
c2 | B B A B |
A = {D[c0][o2], D[c1][o2], D[c2][o2]}
B = D\A
(F4)
o0 o1 o2 o3
c0 | B B B A |
c1 | B B B A |
c2 | B B B A |
A = {D[c0][o3], D[c1][o3], D[c2][o3]}
B = D\A
From these folds the accuracy and the standard deviation can be calculated. Please have a look at the Python or GNU Octave implementations if you have troubles implementing it. It's pretty self-explaining.
A Leave-one-out cross-validation (LOOCV) is the degenerate case of a k-fold cross-validation, where k is chosen to be the number of observations in the dataset. Simply put, you use one observation for testing your model and the remaining data for training the model. This results in as many experiments as observations available, which may be too expensive for large datasets.
For the same example as above you would end up with the following 12 folds F in a LOOCV:
(F1)
o0 o1 o2 o3
c0 | A B B B |
c1 | B B B B |
c2 | B B B B |
A = {D[c0][o0]}
B = D\A
(F2)
o0 o1 o2 o3
c0 | B A B B |
c1 | B B B B |
c2 | B B B B |
A = {D[c0][o1]}
B = D\A
(...)
(F12)
o0 o1 o2 o3
c0 | B B B B |
c1 | B B B B |
c2 | B B B A |
A = {D[c2][o3]}
B = D\A
The first database I will evaluate is the Yale Facedatabase A. It consists of 15 people (14 male, 1 female) each with 11 grayscale images sized 320×243px. There are changes in the light conditions (center light, left light, right light), background and in facial expressions (happy, normal, sad, sleepy, surprised, wink) and glasses (glasses, no-glasses).
Unfortunately the images in this database are not aligned, so the faces may be at different positions in the image. For the Eigenfaces and Fisherfaces method the images need to be preprocessed, since both methods are sensible to rotation and scale. You don't want to do this manually, so I have written a function crop for this (you can translate this to Python if you want), which has the following description:
function crop(filename, eye0, eye1, top, left, dsize)
%% Rotates an image around the eyes, resizes to destination size.
%%
%% Assumes a filename is structured as ".../subject/image.extension", so that
%% images are saved in a the folder (relative to where you call the script):
%% "crop_<subject>/crop_<filename>"
%%
%% Args:
%% filename: Image to preprocess.
%% eye0: [x,y] position of the left eye
%% eye1: [x,y] position of the right eye
%% top: percentage of offset above the eyes
%% left: percentage of horizontal offset
%% dsize: [width, height] of destination filesize
%%
%% Example:
%% crop("/path/to/file", [300, 100], [380, 110], 0.3, 0.2, [70,70])
%%
%% Leaves 21px offset above the eyes (0.3*70px) and 28px horizontal
%% offset. 20% of the image to the left eye, 20% to the right eye
%% (2*0.2*70=28px).
I specify the dimensions of the cropped image to be (100,130) for this database. top is the vertical offset, which specifies how much of the image is above the eyes, 40% seems to be a good value for these images. left specifies the horizontal offset, how much space is left from an eye to the border, 30% is a good value. The image is then rotated by the eye positions. If it's not desired to align all images at the eyes, scale or rotate them, please adjust the script to your needs. Here is a script to crop images with filename and coordinates given in a textfile.
After having preprocessed the images (coordinates for this database are given here) you end up with subsets for each person like this (function used to create the gallery of images):
The preprocessed dataset then has to be stored in a hierarchy similiar to this:
philipp@mango:~/python/facerec$ tree /home/philipp/facerec/data/yalefaces /home/philipp/facerec/data/yalefaces |-- s01 | |-- crop_subject01.centerlight | |-- crop_subject01.glasses | |-- crop_subject01.happy | [...] |-- s02 | |-- crop_subject02.centerlight | |-- crop_subject02.glasses | |-- crop_subject02.happy | [...] 15 directories, 165 files
I want to do face recognition on this dataset. The Eigenfaces method did such a good job on the AT&T Facedatabase, so how does it work for the Yalefaces? Let's find out and import the models, validation and filereader:
>>> from facerec.models import * >>> from facerec.validation import * >>> from facerec.filereader import *
Read in the Yale Facedatabase A:
>>> dataset = FilesystemReader("/home/philipp/facerec/data/yalefaces")
Create the Eigenfaces model, we'll take the 50 principal components in this example:
>>> eigenface = Eigenfaces(num_components=50) # take 50 principal components
We can measure the performance of this classifier with a Leave-One Out Cross validation:
>>> cv0 = LeaveOneOutCrossValidation(eigenface) >>> cv0.validate(dataset.data,dataset.classes,print_debug=True) >>> cv0 Leave-One-Out Cross Validation (model=Eigenfaces, runs=1, accuracy=81.82%, tp=135, fp=30, tn=0, fn=0)
So the model has a recognition rate of 81.82%. Note that a Leave-One-Out Cross validation can be computationally expensive for large datasets. In this example we are computing 165 models, with a computation time of 3 seconds per model this validation will take circa 8 minutes to finish. It's often more useful to perform a k-fold cross validation instead, because a Leave One Out Cross validation can also act weird at times (see the slides at Introduction to Pattern Analysis: Cross Validation). A 5-fold cross validation is good for this database:
>>> cv1= KFoldCrossValidation(eigenfaces, k=5) # perform a 5-fold cv >>> for i in xrange(10): ... cv1.validate(dataset.data,dataset.classes, print_debug=True)
The model will be validated 50 times in total, so the loop will take 2.5 minutes to finish. If you want to log the validation results per fold (so you can get the standard deviation of a single k-fold cross-validation run for example) add results_per_fold=True when initializing the class (be careful, the figures may be misleading for very small folds).
Some coffees later the validation has finished. Before reporting the final figure I'd like to show you some methods common to all validation objects:
>>> cv1.results array([[123, 27, 0, 0], [123, 27, 0, 0], [121, 29, 0, 0], [119, 31, 0, 0], [120, 30, 0, 0], [121, 29, 0, 0], [122, 28, 0, 0], [124, 26, 0, 0], [120, 30, 0, 0], [121, 29, 0, 0]]) >>> cv1.tp array([123, 123, 121, 119, 120, 121, 122, 124, 120, 121]) >>> cv1.accuracy 0.80933333333333335 >>> cv1.std_accuracy 0.0099777436361914683
And to get an overview you can simply get the representation of the Validation object:
>>> cv1 # or simply get the representation k-Fold Cross Validation (model=Eigenfaces, k=5, runs=10, accuracy=80.93%, std(accuracy)=1.00%, tp=1214, fp=286, tn=0, fn=0)
With 80.93%+-1.00% accuracy the Eigenfaces algorithms performs OK on this dataset, I wouldn't say it performs bad. But can we do better, by taking less or more principal components? We can write a small script to evaluate that. I will perform a 5-fold cross-validation on the Eigenfaces model with [10, 25, 40, 55, 70, 85, 100] components:
parameter = range(10,101,15) results = [] for i in parameter: cv = KFoldCrossValidation(Eigenfaces(num_components=i),k=5) cv.validate(dataset.data,dataset.classes) results.append(cv)
And we can see, that it doesn't really better the recognition rate:
>>> for result in results: ... print "%d components: %.2f%%+-%.2f%%" % (result.model.num_components, result.accuracy*100, result.std_accuracy*100) ... 10 components: 73.33%+-0.00% 25 components: 80.00%+-0.00% 40 components: 80.67%+-0.00% 55 components: 81.33%+-0.00% 70 components: 82.67%+-0.00% 85 components: 81.33%+-0.00% 100 components: 82.00%+-0.00%
Maybe we can get a better understanding of the method if we look at the Eigenfaces. Computing the model is necessary, because in the validation the model is cleared.
>>> eigenfaces = Eigenfaces(dataset.data,dataset.classes,num_components=100) # model will now be computed
and then plot the Eigenfaces:
>>> from facerec.visual import * >>> PlotSubspace.plot_weight(model=eigenfaces, num_components=16, dataset=dataset, title="Eigenfaces", rows=4, cols=4, filename="16_eigenfaces.png")
Almost everywhere you see grayscaled Eigenfaces, which are really hard to explain. With a colored heatmap instead you can easily see the distribution of grayscale intensities, where 0 codes a dark blue and 255 is a dark red:
Just by looking at the first 16 Eigenfaces we can see, that they don't really describe facial features to distinguish between persons, but find components to reconstruct from. The fourth Eigenface for example describes the left-light, while the principal component 6 describes the right-light. This makes sense if you think of the Principal Component Analysis not as a classification method, but more as a compression method.
Let's see what I mean. By computing the Eigenfaces, we have projected the database onto the first hundred components as identified by a Principal Component Analysis. This effectively reduced the dimensionality of the images to 100. We can now try to reconstruct the real faces from these 100 components:
>>> proj1 = eigenfaces.P[:,17].copy() >>> I = eigenfaces.reconstruct(proj1)
… and plot it:
>>> PlotBasic.plot_grayscale(I, "eigenface_reconstructed_face.png", (dataset.width, dataset.height))
… and we should see a face again:
Time for the Fisherfaces method! If you haven't done already, import the modules and load the data:
>>> from facerec.models import * >>> from facerec.validation import * >>> from facerec.filereader import *
Read the dataset:
>>> dataset = FilesystemReader("/home/philipp/facerec/data/yalefaces")
create the model (default parameters are ok):
>>> fisherfaces = Fisherfaces()
And then perform a Leave-One-Out Cross-Validation:
>>> cv2 = LeaveOneOutCrossValidation(fisherfaces) >>> cv2.validate(dataset.data,dataset.classes, print_debug=True) >>> cv2 Leave-One-Out Cross Validation (runs=1, accuracy=96.36%, tp=159, fp=6, tn=0, fn=0)
… which yields an accuracy of 96.36%. Performing 10 runs of a 5-fold cross validation supports this:
>>> cv3 = KFoldCrossValidation(fisherfaces, k=5) # perform a 5-fold cv >>> for i in xrange(10): ... cv3.validate(X,y,print_debug=True) >>> cv3 k-Fold Cross Validation (k=5, runs=10, accuracy=96.80%, std(accuracy)=1.63%, tp=1452, fp=48, tn=0, fn=0, model=Fisherfaces (num_components=14))
Allthough the standard deviation is slightly higher for the Fisherfaces, with 96.80%+-1.63% it performs much better than the Eigenfaces method. You can also see that the faces were reduced to only 14 components (equals number of subjects - 1). Let's have a look at the components identified by the Fisherfaces method:
>>> fisherface = Fisherfaces(dataset.data,dataset.classes) # initialize and compute, or call compute explicitly
Import the visual module and plot the faces:
>>> from facerec.visual import * >>> PlotSubspace.plot_weight(model=fisherface, num_components=14, dataset=dataset, filename="16_fisherfaces.png", rows=4, cols=4)
The Fisherfaces are a bit harder to explain, because they identify regions of a face that separate faces best from each other. None of them seems to encode particular light settings, at least it's not as obvious as in the Eigenfaces method:
I could only guess which component describes which features. So I leave the interpretation up to the reader. What we lose with the Fisherfaces method for sure, is the ability to reconstruct faces. If I want to reconstruct face number 17, just like in the Eigenfaces section:
>>> proj2 = fisherface.P[:,17].copy() >>> I2 = fisherface.reconstruct(proj2)
… and plot it:
>>> PlotBasic.plot_grayscale(I2, "fisherface_reconstructed_face.png", (dataset.width, dataset.height))
On this dataset the Fisherface method performed much better, than the Eigenfaces method did. While it yields a much better recognition performance, the ability to reconstruct faces from the Eigenspace is lost. So in conclusion, I think both methods complement each other. It would be a good idea to keep the PCA object around (already computed in the Fisherfaces method) in order to use its reconstruction capabilities.
Some years ago a lot of people told me, that I am looking just like Brad Pitt. Or did I tell it everyone after seeing Troy for the fifth time? Whatever – sadly times have changed. I am losing my hair and now I am exponentially progressing towards a Sir Patrick Stewart or a bald Britney Spears. Now armed with the algorithms we've discussed, I can finally clear the confusion and find out which celebrity I resemble the most! This last experiment turned into a weekend project itself, but I had a lot of fun writing it and I hope you have fun reading it.
My celebrity dataset consists of the following faces:
I have chosen the images to be frontal face with a similiar perspective and light settings. Since the algorithms are sensible towards rotation and scale, the images have been preprocessed to equal scale of 70×70 pixels and centered eyes. Here is my Clooney-Set for example:
I hope you understand that I can't share the dataset with you. Some of the photos have a public license, but most of the photos have an unclear license. From my experience all I can tell you is: finding images of George Clooney took less than a minute, but it was really, really hard finding good images of Johnny Depp or Arnold Schwarzenegger.
Let's see how both methods perform on the celebrities dataset.
Import the modules:
>>> from facerec.models import * >>> from facerec.validation import * >>> from facerec.filereader import * >>> from facerec.visual import *
read in the database:
>>> dataset = FilesystemReader("/home/philipp/facerec/data/c1")
The Eigenfaces method with a Leave-One-Out cross validation:
>>> eigenface = Eigenfaces(num_components=100) >>> cv0 = LeaveOneOutCrossValidation(eigenface) >>> cv0.validate(dataset.data,dataset.classes,print_debug=True) >>> cv0 Leave-One-Out Cross Validation (model=Eigenfaces, runs=1, accuracy=44.00%, tp=44, fp=56, tn=0, fn=0)
Only achieves 44% recognition rate. A 5-fold cross validation:
>>> cv1 = KFoldCrossValidation(eigenface, k=5) >>> for i in xrange(10): ... cv1.validate(dataset.data,dataset.classes,print_debug=True) >>> cv1 k-Fold Cross Validation (model=Eigenfaces, k=5, runs=10, accuracy=43.30%, std(accuracy)=2.33%, tp=433, fp=567, tn=0, fn=0)
Shows that the recognition rate of 43.30%+-2.33% is a bit better than guessing, still it's too bad to decide between celebrities.
The Fisherface method instead:
>>> fisherface = Fisherfaces()
… achieves a 93% recognition rate with a leave one out strategy:
>>> cv2 = LeaveOneOutCrossValidation(fisherface) >>> cv2.validate(dataset.data,dataset.classes, print_debug=True) Leave-One-Out Cross Validation (model=Fisherfaces, runs=1, accuracy=93.00%, tp=93, fp=7, tn=0, fn=0)
And with 5-fold cross validation:
>>> cv3 = KFoldCrossValidation(fisherface,k=5) >>> for i in xrange(10): ... cv3.validate(dataset.data,dataset.classes,print_debug=True) >>> cv3 k-Fold Cross Validation (model=Fisherfaces (num_components=9), k=5, runs=10, accuracy=90.50%, std(accuracy)=1.36%, tp=905, fp=95, tn=0, fn=0)
… it has a recognition rate of 90.50%+-1.36%. It's a good classificator to decide between our celebrities. Now let's have a look at the components found by both methods!
Compute both models (necessary because the validation empties the model):
>>> eigenface = Eigenfaces(dataset.data,dataset.classes,num_components=100) >>> fisherface = Fisherfaces(dataset.data,dataset.classes)
The Eigenfaces…
>>> PlotSubspace.plot_weight(model=eigenface, num_components=16, dataset=dataset, title="Celebrity Eigenface", rows=4, cols=4, filename="16_celebrity_eigenfaces.png")
… again encode light in the images, see the second Eigenface for example:
While the Fisherfaces identify regions:
>>> PlotSubspace.plot_weight(model=fisherface, num_components=9, dataset=dataset, title="Celebrity Fisherface", rows=3, cols=3, filename="9_celebrity_fisherfaces.png")
… where you can decide between the persons. The first component for example describes the eyes, while other components describe mouth-ness or eye-browness of a face:
By using two and three components, we can see how the faces are distributed (using GNU Octave, see example.m). I'll use just one example per class for a better overview:
octave:1> addpath (genpath ("."));
octave:2> [X y width height names] = read_images("/home/philipp/facerec/data/c1");
octave:3> fisherface = fisherfaces(X,y);
octave:4> figure; hold on
octave:5> for i = [1:10]
> C = findclasses(fisherface.y, i)
> text(fisherface.P(1,C(1)), fisherface.P(2,C(1)), num2str(i));
> endfor
octave:6> print -dpng "-S640,480" celebrity_facespace.png
octave:7> names
names =
{
[1,1] = crop_angelina_jolie
[1,2] = crop_arnold_schwarzenegger
[1,3] = crop_brad_pitt
[1,4] = crop_george_clooney
[1,5] = crop_johnny_depp
[1,6] = crop_justin_timberlake
[1,7] = crop_katy_perry
[1,8] = crop_keanu_reeves
[1,9] = crop_patrick_stewart
[1,10] = crop_tom_cruise
}
Brad Pitt (3), Johnny Depp (5) and Tom Cruise (10) seem to have a unique face, as they are far away from all other faces. Surprisingly George Clooney (4) and Justin Timberlake (6) are near to each other. Arnold Schwarzenegger (2), Katy Perry (7), Keanu Reeves (8) and Patrick Stewart (9) are also close to each other. Angelina Jolie (1) is right between those two clusters; with George Clooney (4) and Keanu Reeves (8) as closest match:
Now let's finally answer the question why I wrote this post at all. Who do I resemble the most? I'll use a frontal photo of myself and crop it, just like the other faces:
Then I compute the model:
>>> fisherface = Fisherfaces(dataset.data,dataset.classes)
… read in my face:
>>> Q = FilesystemReader.read_image("/home/philipp/crop_me.png")
… and find the closest match:
>>> prediction = fisherface.predict(Q) >>> print dataset.className(prediction) ... crop_patrick_stewart
I knew it. The closest match to my face is Patrick Stewart. Taking 5 nearest neighbors, doesn't change a thing… it's still Patrick. Damn, I knew including him would change everything – and I am not even close to Brad Pitt!
If I project my face on the first 2 components I would actually resemble Arnie the most:
but including the third dimension (projection on second and third component) brings me closer to Patrick Stewart:
Just because the Eigenfaces algorithm can't classify the persons doesn't mean we can't use it here aswell. It did a great job at reconstructing a projected vector, so if I project my face into the facespace… and reconstruct it… my face should be the mixture of those faces.
Let's see what I look like:
>>> eigenface = Eigenfaces(num_components=100) >>> eigenface.compute(dataset.data,dataset.classes) >>> proj = eigenface.project(Q) >>> rec = eigenface.reconstruct(proj) >>> PlotBasic.greyscale(rec, "philipp_celebrity.png")
Sorry it's a bit small, because my training images are only 70×70 pixels:
Definitely some Schwarzenegger in there!
The GNU Octave version is just as simple to use as the Python version. I think there's no need to explain every function argument in detail, the code is short and it's all commented in the definition files already. Let's go and fire up Octave. But before you start to do anything, you have to add the path of definition files to the GNU Octave search path. Since GNU Octave 3 you can use addpath(genpath(”/path/to/folder/”)) to add all definition files in a folder and subfolder:
addpath (genpath ("."));
This makes all function definition in the subfolders available to GNU Octave (see example.m) and now you can read in the dataset:
% load data
[X y width height names] = read_images("/home/philipp/facerec/data/yalefaces_recognition");
from this we can learn the Eigenfaces with 100 principal components:
eigenface = eigenfaces(X,y,100);
If we want to see the classes in 2D:
%% 2D plot of projection (add the classes you want)
figure; hold on;
for i = findclasses(eigenface.y, [1,2,3])
text(eigenface.P(1,i), eigenface.P(2,i), num2str(eigenface.y(i)));
endfor
I adopted the KISS principle for the validation. The key idea is the following: the validation gets two function handles fun_train, which learns a model and fun_predict, which tests the learned model. Inside the validation a data array and corresponding classes are passed to the training function; and fun_predict gets a model and test instance. What to do if you need to add parameters to one of the calls? Bind them! I want to learn eigenfaces with 100 components and the prediction should use a 1 nearest neighbor for the classification:
fun_eigenface = @(X,y) eigenfaces(X,y,100); fun_predict = @(model, Xtest) eigenfaces_predict(model, Xtest, 1)
Easy huh? Performing validations can now be done with:
% perform a Leave-One-Out Cross Validation cv0 = LeaveOneOutCV(X,y,fun_eigenface, fun_predict, 1) % perform a 10-fold cross validation cv1 = KFoldCV(X,y,10,fun_eigenface, fun_predict,1) % perform a 3-fold cross validation cv2 = KFoldCV(X,y,3,fun_eigenface, fun_predict,1)
From these results you can gather the same statistics I have used in the Python version. I hope you acknowledge that this version is just as easy to use as the Python version. There are some additional parameters for the functions, so please read the function definition files or the example.m to understand those. For further questions, simply comment below.
In this post I have presented you a simple framework to evaluate your face recognition algorithms with Python and GNU Octave. The Python version will see some updates in the future and its usage might change a little, because some things are too complicated right now. I don't know if maintaining the GNU Octave version pays off, because I'd like to use it for prototyping algorithms. And after all, you have been answered one of most important questions in universe: my face resembles Patrick Stewart.
There are a few lessons I've learnt in this project and I'd like to share. If you are working with large datasets like with hundreds of face images, be sure to always choose the appropriate type for your data: a grayscale image for example only takes values in a range of [0,255] and fits into 8 unsigned bit (2^8-1 = 255). It would be a horrible waste of memory to take floats, which is the default NumPy datatype for arrays and matrices. And this applies to every other dataset as well. If you know which values your inputs can take: choose the appropriate type.
One thing you have to get used to when starting with NumPy is working with its arrays and matrices. In NumPy it's a rule of thumb to take NumPy arrays for everything, so if you use one of NumPys built-in functions you can be pretty sure that an array is returned. And there are certain differences to a NumPy matrix you should know about. I've run into the following problem: I wanted to subtract the column mean from each column vector of an array A. My first version was like this:
>>> A = np.array([[2,2],[2,7]]) >>> A - A.mean(axis=1) array([[ 0. , -2.5], [ 0. , 2.5]]) >>> A.mean(axis=1) array([ 2. , 4.5])
Which is obvliously wrong and it is because A.mean(axis=1) returns a row instead of a column, as you can see in the listing. You can avoid that with:
>>> A - A.mean(axis=1).reshape(-1,1) array([[ 0. , 0. ], [-2.5, 2.5]]) >>> # or you set the metadata information to a matrix >>> A = np.asmatrix(A) >>> A - A.mean(axis=1) matrix([[ 0. , 0. ], [-2.5, 2.5]])
So if you want to treat an array like a matrix, then simply call numpy.asmatrix at some point in your code. Next thing you'll need to understand is related to array slicing and indexing, which is a great thing in NumPy. If you want to just slice the first column off an array, be sure to always make a copy of the sliced data instead of just assigning a variable to the slice. This applies if you don't go for speed, the situation might change if you are looping through data a million times. See this little example. Let's start the interactive python shell and import numpy:
philipp@mango:~$ python >>> import numpy as np
With pmap we can now see how much memory the python process currently consumes:
philipp@mango:~$ ps ax | grep python 27241 pts/7 S+ 0:00 python philipp@mango:~$ pmap -x 27241 [...] b783c000 0 524 524 rw--- [ anon ] b78d7000 0 8 8 rw--- [ anon ] bf979000 0 172 172 rw--- [ stack ] -------- ------- ------- ------- ------- total kB 26436 - - -
26 MB is a pretty good value (rounded up). Now let's create a huge NumPy matrix with 9000*9000*64 bit:
>>> huge = np.zeros((9000,9000),dtype=np.float64)
The process should now occupy 26 MB + 618 MB, making it 644 MB in total. pmap verifies:
philipp@mango:~$ pmap -x 27241 [...] b783c000 0 524 524 rw--- [ anon ] b78d7000 0 8 8 rw--- [ anon ] bf979000 0 172 172 rw--- [ stack ] -------- ------- ------- ------- ------- total kB 659252 - - -
The process now has 644 MB. What happens if you attempt to slice the array? Take off the first column and assign it to huge again:
>>> huge = huge[:,0] >>> huge.shape (9000,)
Wonderful, and the memory?
philipp@mango:~$ pmap -x 27241 [...] b783c000 0 524 524 rw--- [ anon ] b78d7000 0 8 8 rw--- [ anon ] bf979000 0 172 172 rw--- [ stack ] -------- ------- ------- ------- ------- total kB 659252 - - -
Still 644 MB! Why? Because NumPy on one hand stores your raw array data (called a data buffer in the NumPy internals document) and on the other hand stores information about the raw data. If you slice a column and assign it to a variable, you basically just create a new metadata object (with the specific information about shape etc.), but it's still a view on the same data buffer. So all your data still resides in memory. This makes slicing and indexing of NumPy arrays and matrices superfast, but it's something you should definitely know about.
If speed is not crucial to you and you care about your bytes, be sure to make a copy:
>>> huge = huge[0:,0].copy() # version (1) >>> huge = np.array(huge[0:,0], copy=True) # or version (2)
… and the memory magically shrinks to 26 MB again.:
philipp@mango:~$ pmap -x 27241 [...] b783c000 0 524 524 rw--- [ anon ] b78d7000 0 8 8 rw--- [ anon ] bf979000 0 172 172 rw--- [ stack ] -------- ------- ------- ------- ------- total kB 26436 - - -
I didn't go for OOP in GNU Octave, because even a simple example like myclass.m:
function b = myclass(a) b.a = a; b = class (b, "myclass"); endfunction
fails with GNU Octave, version 3.2.4:
$ octave octave:0> x = myclass(1) error: class: invalid call from outside class constructor error: called from: error: myclass at line 3, column 5
From what I have read the error seems to be a regression bug and it's probably fixed in the latest stable release. However, the implementation of OOP in GNU Octave still seems to be experimental and I don't think everybody is on latest releases. And to be honest, I've seldomly (read almost never) seen object oriented code with either MATLAB or Octave, so I don't really feel guilty about not using it.