Aug 25

A simple object classifier with Bag-of-Words using OpenCV 2.3 [w/ code]


Just wanted to share some code I've been writing.
So I wanted to create a food classifier, for a cool project down in the Media Lab called FoodCam. It's basically a camera that people put free food under, and they can send an email alert to the entire building to come eat (by pushing a huge button marked "Dinner Bell"). Really a cool thing.

OK let's get down to business.

I followed a very simple technique described in this paper. I know, you say, "A Paper? Really? I'm not gonna read that technical boring stuff, give the bottom line! man.. geez." Well, you are right, except that this paper IS the bottom line, it's dead simple. It's almost a tutorial. It is also referenced by the OpenCV documentation.

Edit (6/5/2014): Another great read for selecting the best color-space and invariant features is this paper by van de Sande et al.

The method is simple:
- Extract features of choice from training set that contains all classes.
- Create a vocabulary of features by clustering the features (k-means, etc). Let's say 1000 features long.
- Train your classifiers (SVMs, Naive-Bayes, boosting, etc) on the training set again (preferably a different one), this time checking the features in each image for their closest clusters in the vocabulary. Create a histogram of responses for each image to the words in the vocabulary; it will be a 1000-entry vector. Create a sample-label dataset for the training.
- When you get an image you haven't seen, run the classifier and it should, god willing, give you the right class.
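
To make the "histogram of responses" step concrete, here is a minimal sketch in plain C++ with no OpenCV. The names `Descriptor`, `nearestWord` and `bowHistogram` are made up for illustration, and dividing by the descriptor count is my own choice of normalization for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

typedef std::vector<float> Descriptor; // hypothetical stand-in for one feature descriptor

// Index of the vocabulary word closest (squared L2 distance) to one descriptor.
std::size_t nearestWord(const Descriptor& d, const std::vector<Descriptor>& vocabulary) {
    assert(!vocabulary.empty());
    std::size_t best = 0;
    float bestDist = std::numeric_limits<float>::max();
    for (std::size_t w = 0; w < vocabulary.size(); ++w) {
        assert(vocabulary[w].size() == d.size());
        float dist = 0.f;
        for (std::size_t k = 0; k < d.size(); ++k) {
            const float diff = d[k] - vocabulary[w][k];
            dist += diff * diff;
        }
        if (dist < bestDist) { bestDist = dist; best = w; }
    }
    return best;
}

// One bin per vocabulary word; each descriptor votes for its nearest word.
// The result is divided by the number of descriptors (an assumption of this sketch).
std::vector<float> bowHistogram(const std::vector<Descriptor>& imageDescriptors,
                                const std::vector<Descriptor>& vocabulary) {
    std::vector<float> hist(vocabulary.size(), 0.f);
    for (std::size_t i = 0; i < imageDescriptors.size(); ++i)
        hist[nearestWord(imageDescriptors[i], vocabulary)] += 1.f;
    if (!imageDescriptors.empty())
        for (std::size_t w = 0; w < hist.size(); ++w)
            hist[w] /= static_cast<float>(imageDescriptors.size());
    return hist;
}
```

With a 1000-word vocabulary, `bowHistogram` would return the 1000-entry vector mentioned in the list, one per image.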

Turns out, those crafty guys at WillowGarage have done pretty much all the heavy lifting, so it's up to us to pick the fruit of their hard work. OpenCV 2.3 comes packed with a set of classes, whose names start with BOW for Bag Of Words, that help a lot with implementing this method.

Starting with the first step:

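SurfFeatureDetector detector(400); // Hessian threshold
vector<KeyPoint> keypoints;

// computing descriptors
Ptr<DescriptorExtractor> extractor(
   new OpponentColorDescriptorExtractor(
      Ptr<DescriptorExtractor>(new SurfDescriptorExtractor())
   )
);

// declared after the extractor, since it needs the descriptor size and type;
// starts with 0 rows so only real descriptors end up in it
Mat training_descriptors(0,extractor->descriptorSize(),extractor->descriptorType());
Mat descriptors;

while(..loop a directory? a file?..) {
   Mat img = imread(filepath);
   detector.detect(img, keypoints);
   extractor->compute(img, keypoints, descriptors);
   training_descriptors.push_back(descriptors);
}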

Simple!
Let's go create a vocabulary then. Luckily, OpenCV has taken care of that, and provides a simple API:

BOWKMeansTrainer bowtrainer(1000); //num clusters
bowtrainer.add(training_descriptors);
Mat vocabulary = bowtrainer.cluster();

Boom. Vocabulary.
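
If you're curious what the clustering is doing, here is a rough sketch of plain Lloyd-style k-means in standard C++. This is not OpenCV's implementation: the names `Point`, `sqDist` and `kmeans` are hypothetical, and seeding the centers with the first k points is an assumption I made to keep the sketch deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<float> Point; // hypothetical stand-in for one descriptor

static float sqDist(const Point& a, const Point& b) {
    float s = 0.f;
    for (std::size_t k = 0; k < a.size(); ++k) { const float d = a[k] - b[k]; s += d * d; }
    return s;
}

// Plain Lloyd iterations: assign every point to its nearest center,
// then move each center to the mean of the points assigned to it.
// Centers are seeded with the first k points (an assumption of this sketch).
std::vector<Point> kmeans(const std::vector<Point>& pts, std::size_t k, int iterations) {
    assert(k > 0 && pts.size() >= k);
    std::vector<Point> centers(pts.begin(), pts.begin() + k);
    const std::size_t dim = pts[0].size();
    for (int it = 0; it < iterations; ++it) {
        std::vector<Point> sums(k, Point(dim, 0.f));
        std::vector<std::size_t> counts(k, 0);
        for (std::size_t i = 0; i < pts.size(); ++i) {
            std::size_t best = 0;
            for (std::size_t c = 1; c < k; ++c)
                if (sqDist(pts[i], centers[c]) < sqDist(pts[i], centers[best])) best = c;
            for (std::size_t d = 0; d < dim; ++d) sums[best][d] += pts[i][d];
            ++counts[best];
        }
        for (std::size_t c = 0; c < k; ++c) {
            if (counts[c] == 0) continue; // an empty cluster keeps its old center
            for (std::size_t d = 0; d < dim; ++d)
                centers[c][d] = sums[c][d] / static_cast<float>(counts[c]);
        }
    }
    return centers;
}
```

The returned centers play the role of the vocabulary: each one is a "word".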
Now, let's train us some SVM classifiers!
We're gonna train a 2-class SVM, in a 1-vs-all kind of way. Meaning we train an SVM that can say "yes" or "no" when choosing between one class and the rest of the classes, hence 1-vs-all.
But first, we need to scour the training set for our histograms (the responses to the vocabulary, remember?):

map<string,Mat> classes_training_data; // one matrix of histograms per class
vector<string> classes_names;
int total_samples = 0;

Ptr<FeatureDetector > detector(new SurfFeatureDetector());
Ptr<DescriptorMatcher > matcher(new BruteForceMatcher<L2<float> >());
Ptr<DescriptorExtractor > extractor(new OpponentColorDescriptorExtractor(Ptr<DescriptorExtractor>(new SurfDescriptorExtractor())));
Ptr<BOWImgDescriptorExtractor> bowide(new BOWImgDescriptorExtractor(extractor,matcher));
bowide->setVocabulary(vocabulary);

#pragma omp parallel for schedule(dynamic,3)
for(..loop a directory?..) {
   // per-image variables live inside the loop so each thread gets its own copy;
   // filepath and class_ (the image's label) come from whatever you loop over
   Mat img = imread(filepath);
   vector<KeyPoint> keypoints;
   Mat response_hist;
   detector->detect(img,keypoints);
   bowide->compute(img, keypoints, response_hist);

   #pragma omp critical
   {
      if(classes_training_data.count(class_) == 0) { //not yet created...
         classes_training_data[class_].create(0,response_hist.cols,response_hist.type());
         classes_names.push_back(class_);
      }
      classes_training_data[class_].push_back(response_hist);
      total_samples++; //shared counter, so it is updated inside the critical section
   }
}

Now, two things:
First, notice I'm keeping the training data for each class separately. We will need this later for creating the 1-vs-all samples-labels matrices.
Second, I use OpenMP multi-threading to run the calculation in parallel, and hence faster, on multi-core machines (like the one I used). It cuts the running time considerably. OpenMP is a gem, use it more. Just a couple of #pragma directives and you're multi-threading.
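
Here is the same OpenMP pattern boiled down to a tiny standalone sketch: each iteration does its own work on local variables, and anything shared is only touched inside a critical section. `countPerClass` is a made-up function for illustration. It should compile with or without -fopenmp; without the flag the pragmas are ignored and the loop simply runs serially.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Count items per class, possibly in parallel.
std::map<std::string, int> countPerClass(const std::vector<std::string>& labels) {
    std::map<std::string, int> counts; // shared between threads
    int total = 0;                     // shared between threads
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < static_cast<int>(labels.size()); ++i) {
        // Declared inside the loop body, so every thread has its own copy.
        const std::string label = labels[i];
        #pragma omp critical
        {
            // Only one thread at a time may modify the shared map and counter.
            counts[label] += 1;
            ++total;
        }
    }
    assert(total == static_cast<int>(labels.size()));
    return counts;
}
```

In the real loop the expensive part (detecting keypoints and computing the histogram) happens outside the critical section, which is where the speedup comes from.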

Alright, data gotten, let's get training:

#pragma omp parallel for schedule(dynamic)
for (int i=0;i<classes_names.size();i++) {
   string class_ = classes_names[i];
   cout << omp_get_thread_num() << " training class: " << class_ << ".." << endl;
		
   Mat samples(0,classes_training_data[class_].cols,classes_training_data[class_].type()); //same width and type as the response histograms
   Mat labels(0,1,CV_32FC1);
		
   //copy class samples and label
   cout << "adding " << classes_training_data[class_].rows << " positive" << endl;
   samples.push_back(classes_training_data[class_]);
   Mat class_label = Mat::ones(classes_training_data[class_].rows, 1, CV_32FC1);
   labels.push_back(class_label);
		
   //copy rest samples and label
   for (map<string,Mat>::iterator it1 = classes_training_data.begin(); it1 != classes_training_data.end(); ++it1) {
      string not_class_ = (*it1).first;
      if(not_class_.compare(class_)==0) continue; //skip class itself
      samples.push_back(classes_training_data[not_class_]);
      class_label = Mat::zeros(classes_training_data[not_class_].rows, 1, CV_32FC1);
      labels.push_back(class_label);
   }
   
   cout << "Train.." << endl;
   if(samples.rows == 0) continue; //phantom class?!
   Mat samples_32f; samples.convertTo(samples_32f, CV_32F);
   CvSVM classifier; 
   classifier.train(samples_32f,labels);

   //do something with the classifier, like saving it to file
}

Again, I parallelize, although the process is not too slow.
Note how I build the samples and the labels, where each time I put in the positive samples and mark the labels '1', and then I put the rest of the samples and label them '0'.
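
Stripped of OpenCV, the 1-vs-all bookkeeping looks roughly like the sketch below. The types `Histogram` and `ClassData` and the function `oneVsAll` are hypothetical names for illustration; the labeling (1 for the class itself, 0 for the rest) follows the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::vector<float> Histogram;                             // one image's response histogram
typedef std::map<std::string, std::vector<Histogram> > ClassData; // histograms grouped by class

// Build (samples, labels) for one binary "positive vs. the rest" problem.
std::pair<std::vector<Histogram>, std::vector<float> >
oneVsAll(const ClassData& data, const std::string& positive) {
    assert(data.count(positive) == 1);
    std::vector<Histogram> samples;
    std::vector<float> labels;
    // Positive samples go in first, labeled 1.
    const std::vector<Histogram>& pos = data.find(positive)->second;
    samples.insert(samples.end(), pos.begin(), pos.end());
    labels.insert(labels.end(), pos.size(), 1.f);
    // Samples of every other class follow, labeled 0.
    for (ClassData::const_iterator it = data.begin(); it != data.end(); ++it) {
        if (it->first == positive) continue; // skip the class itself
        samples.insert(samples.end(), it->second.begin(), it->second.end());
        labels.insert(labels.end(), it->second.size(), 0.f);
    }
    return std::make_pair(samples, labels);
}
```

Calling `oneVsAll` once per class name gives one samples-labels pair per classifier to train.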

Moving on to .... testing the classifiers!
Nothing seems to me like more fun than creating a confusion matrix! Not really, but let's see how it's done:

map<string,map<string,int> > confusion_matrix; // confusion_matrix[classA][classB] = number_of_times_A_voted_for_B;
map<string,CvSVM> classes_classifiers; //This we created earlier

vector<string> files; //load up with images
vector<string> classes; //load up with the respective classes

for(size_t i = 0; i < files.size(); i++) {
   Mat img = imread(files[i]), response_hist;
   
   vector<KeyPoint> keypoints;
   detector->detect(img,keypoints);
   bowide->compute(img, keypoints, response_hist);

   float minf = FLT_MAX; string minclass;
   for (map<string,CvSVM>::iterator it = classes_classifiers.begin(); it != classes_classifiers.end(); ++it) {
      float res = (*it).second.predict(response_hist,true);
      if (res < minf) {
         minf = res;
         minclass = (*it).first;
      }
   }
   confusion_matrix[minclass][classes[i]]++;  
}

When you take a look in my files, you will find a much more complicated way of doing this. But this is the core idea: look in the image for the response histogram to the vocabulary of features (rather, feature-cluster centers), run it by all the classifiers and take the one with the best score. Simple.
Consider making this parallel as well. No reason for it to be serial.
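
For reference, here is a small standalone sketch of the bookkeeping around the confusion matrix. The function names `pickClass`, `record` and `accuracy` are made up. `pickClass` copies the "smallest score wins" rule from the loop above, and `accuracy` is just one summary number you could pull out of the matrix.

```cpp
#include <cassert>
#include <limits>
#include <map>
#include <string>

typedef std::map<std::string, std::map<std::string, int> > ConfusionMatrix;

// Pick the class whose classifier returned the smallest score,
// mirroring the "res < minf" rule in the loop above.
std::string pickClass(const std::map<std::string, float>& scores) {
    assert(!scores.empty());
    std::string best;
    float minScore = std::numeric_limits<float>::max();
    for (std::map<std::string, float>::const_iterator it = scores.begin(); it != scores.end(); ++it) {
        if (it->second < minScore) { minScore = it->second; best = it->first; }
    }
    return best;
}

// Same indexing as in the post: first key is the predicted class, second the true class.
void record(ConfusionMatrix& cm, const std::string& predicted, const std::string& actual) {
    cm[predicted][actual]++;
}

// Fraction of samples on the diagonal, i.e. predicted == actual.
double accuracy(const ConfusionMatrix& cm) {
    int correct = 0, total = 0;
    for (ConfusionMatrix::const_iterator p = cm.begin(); p != cm.end(); ++p) {
        for (std::map<std::string, int>::const_iterator a = p->second.begin(); a != p->second.end(); ++a) {
            total += a->second;
            if (p->first == a->first) correct += a->second;
        }
    }
    return total == 0 ? 0.0 : static_cast<double>(correct) / total;
}
```

A matrix with most of its counts on the diagonal means the classifiers mostly agree with the ground truth.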

That about covers it.

Code

Lately I'm pushing stuff to GitHub using git rather than SVN on Google Code. Dunno why, it's just like that.
Get the whole thing at:
https://github.com/royshil/FoodcamClassifier

Follow the build instructions, they're a breeze, and then follow the running instructions. It's basically a series of command-line programs you run to get through each step, and in the end you have a sort of "predictor" service that takes an image and produces a prediction.

Edit (6/5/2014): The dataset can be downloaded from: http://www.media.mit.edu/~roys/shared/foodcamimages.zip

OK guys, have fun classifying stuff!
Roy.

  • ismail

    Hey Roy,

    I wanted to ask what if we only have two classes? We only train one One Vs All model.

    Also, and more importantly, when you compare to get the least response, what if one of them is negative? What does that mean?

    Thank you

  • http://www.toprightpixel.com Chelsea

    Hi,

    Would this work on OpenCV 2.4.9? I tried running it but it failed to compile. I was wondering if I needed to have OpenCV 2.3 instead since files may have been changed around?

    The error I got was:
    make[2]: *** [CMakeFiles/foodcam-predict.dir/predict_common.cpp.o] Error 1
    make[1]: *** [CMakeFiles/foodcam-predict.dir/all] Error 2
    make: *** [all] Error 2

    Thanks.

  • http://www.morethantechnical.com Roy

    @Chelsea
    It probably needs changes to match the newer API, which must have changed since.
    You could give it a try yourself and submit your code, I'll review and merge if it works...

  • lunamystry

    Hi Roy,

    Wonderful post, thank you so much!
    I am wondering though. I am looking at the code on github: https://github.com/royshil/FoodcamClassifier/blob/master/main.cpp

    Just the main file, the other files scare me (I have about two weeks' worth of knowledge on CV and all this.)

    On line 105 where you are doing the assigning of the histogram to classes and 1-vs-many training, you compute descriptors for each of the train images. You don't seem to calculate new keypoints for the image you are computing descriptors for though. Is this how it is? Won't the keypoints variable contain data from the last image you computed descriptors for when you were making the bag of keypoints?

  • Jumabek

    Hello, great job bro. Thank you for sharing and for the comprehensive explanation.
    But the problem is I am not at the level to understand this project.
    Nevertheless I really have to make it work for my assignment.
    When I tried the following commands in the command line, I got an error.
    Can you help me make it run? Thanks again!

    I run this in the command line

    D:\labworks\object recognition\FoodcamClassifier>cmake -D CMAKE_CXX_FLAGS=-fopenmp . ; make -j4
    CMake Error: The source directory "D:/labworks/object recognition/FoodcamClassifier/-j4" does not exist.
    Specify --help for usage, or press the help button on the CMake GUI.

  • Jumabek

    Actually it was simple I just changed "j4" part to directory folder ;)

  • Rajind

    @Mara did you find any solution to this ? I am facing the same problem.