Aug 25 2011

A simple object classifier with Bag-of-Words using OpenCV 2.3 [w/ code]

Published by at 5:34 am under code,opencv,programming,Recommended,Software,video,vision

Just wanted to share some code I've been writing.
So I wanted to create a food classifier, for a cool project down in the Media Lab called FoodCam. It's basically a camera that people put free food under, and they can send an email alert to the entire building to come eat (by pushing a huge button marked "Dinner Bell"). Really a cool thing.

OK let's get down to business.

I followed a very simple technique described in this paper. I know, you say, "A Paper? Really? I'm not gonna read that technical boring stuff, give the bottom line! man.. geez." Well, you are right, except that this paper IS the bottom line, it's dead simple. It's almost a tutorial. It is also referenced by the OpenCV documentation.

The method is simple:
- Extract features of choice from a training set that contains all the classes.
- Create a vocabulary of features by clustering the features (k-means, etc.). Let's say the vocabulary is 1000 words long.
- Train your classifiers (SVMs, Naive Bayes, boosting, etc.) on a training set again (preferably a different one). This time, look up the closest cluster in the vocabulary for each feature in the image and count the responses per vocabulary word. The resulting histogram is a 1000-entry vector per image. Pair these vectors with their labels to create a sample-label dataset for the training.
- When you get an image you haven't seen, run the classifier and it should, god willing, give you the right class.
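To make the histogram step concrete, here is a minimal, dependency-free C++ sketch of how one image's descriptors become a bag-of-words vector. Each descriptor is assigned to its nearest vocabulary word using plain L2 distance, similar to the brute-force matcher used later, and the counts are divided by the number of descriptors. The names `bowHistogram` and `nearestWord` are made up for illustration. The normalization step is my assumption about what OpenCV's BOWImgDescriptorExtractor does, so check it against your version.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Descriptor = std::vector<float>;  // one feature descriptor (SURF is 64 floats by default)

// Squared Euclidean (L2) distance between two descriptors of equal length.
float squaredL2(const Descriptor& a, const Descriptor& b) {
    assert(a.size() == b.size());
    float d = 0.f;
    for (std::size_t k = 0; k < a.size(); ++k) {
        const float diff = a[k] - b[k];
        d += diff * diff;
    }
    return d;
}

// Index of the vocabulary word (cluster center) closest to descriptor d.
std::size_t nearestWord(const Descriptor& d, const std::vector<Descriptor>& vocabulary) {
    assert(!vocabulary.empty());
    std::size_t best = 0;
    float bestDist = std::numeric_limits<float>::max();
    for (std::size_t w = 0; w < vocabulary.size(); ++w) {
        const float dist = squaredL2(d, vocabulary[w]);
        if (dist < bestDist) { bestDist = dist; best = w; }
    }
    return best;
}

// Bag-of-words histogram: count how many descriptors fall on each word, then
// divide by the number of descriptors so that images with many keypoints stay
// comparable to images with few.
std::vector<float> bowHistogram(const std::vector<Descriptor>& descriptors,
                                const std::vector<Descriptor>& vocabulary) {
    std::vector<float> hist(vocabulary.size(), 0.f);
    if (descriptors.empty()) return hist;  // no keypoints: all-zero histogram
    for (const Descriptor& d : descriptors) hist[nearestWord(d, vocabulary)] += 1.f;
    for (float& h : hist) h /= static_cast<float>(descriptors.size());
    return hist;
}
```

For example, take a 3-word vocabulary {(0,0), (10,0), (0,10)} and the descriptors (1,1), (9,1), (11,-1), (0.5,0). Two descriptors land on word 0 and two on word 1, so the histogram is 0.5, 0.5, 0.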

Turns out, those crafty guys at WillowGarage have done pretty much all the heavy lifting, so it's up to us to enjoy the fruits of their hard work. OpenCV 2.3 comes packed with a set of classes, whose names start with BOW for Bag Of Words, that help a lot with implementing this method.

Starting with the first step:

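#include <opencv2/opencv.hpp> // OpenCV 2.3 umbrella header
using namespace cv;
using namespace std;

SurfFeatureDetector detector(400); // Hessian threshold
vector<KeyPoint> keypoints;

// computing descriptors
Ptr<DescriptorExtractor> extractor(
   new OpponentColorDescriptorExtractor(
      Ptr<DescriptorExtractor>(new SurfDescriptorExtractor())
   )
);

// start with 0 rows: a 1-row Mat would feed one uninitialized row into the clustering
Mat training_descriptors(0, extractor->descriptorSize(), extractor->descriptorType());
Mat descriptors;

while(..loop a directory? a file?..) {
   Mat img = imread(filepath);
   detector.detect(img, keypoints);
   extractor->compute(img, keypoints, descriptors);
   training_descriptors.push_back(descriptors); // pool descriptors from all images
}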

Let's go create a vocabulary then. Luckily, OpenCV has taken care of that, and provides a simple API:

BOWKMeansTrainer bowtrainer(1000); //num clusters
bowtrainer.add(training_descriptors); // hand over the pooled descriptors
Mat vocabulary = bowtrainer.cluster();

Boom. Vocabulary.
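BOWKMeansTrainer does the clustering with k-means (cv::kmeans under the hood). For intuition only, here is a bare-bones Lloyd's k-means in plain C++. It is a sketch, not OpenCV's implementation. It seeds the centers with the first k points, whereas BOWKMeansTrainer, as far as I know, defaults to k-means++ seeding (KMEANS_PP_CENTERS) and several attempts. The function name `kmeans` here is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<float>;

// Squared Euclidean distance between two points of equal dimension.
float dist2(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    float d = 0.f;
    for (std::size_t k = 0; k < a.size(); ++k) { const float t = a[k] - b[k]; d += t * t; }
    return d;
}

// Plain Lloyd's k-means: assign each point to its nearest center, move each
// center to the mean of its points, repeat. Seeding with the first k points is
// a simplification (k-means++ seeding is usually better).
std::vector<Point> kmeans(const std::vector<Point>& points, std::size_t k, int iterations) {
    assert(k > 0 && points.size() >= k);
    std::vector<Point> centers(points.begin(), points.begin() + static_cast<std::ptrdiff_t>(k));
    std::vector<std::size_t> label(points.size(), 0);
    for (int it = 0; it < iterations; ++it) {
        // Assignment step: nearest center for every point.
        for (std::size_t i = 0; i < points.size(); ++i) {
            float best = std::numeric_limits<float>::max();
            for (std::size_t c = 0; c < k; ++c) {
                const float d = dist2(points[i], centers[c]);
                if (d < best) { best = d; label[i] = c; }
            }
        }
        // Update step: each center becomes the mean of its assigned points.
        std::vector<Point> sums(k, Point(points[0].size(), 0.f));
        std::vector<std::size_t> counts(k, 0);
        for (std::size_t i = 0; i < points.size(); ++i) {
            for (std::size_t d = 0; d < points[i].size(); ++d) sums[label[i]][d] += points[i][d];
            ++counts[label[i]];
        }
        for (std::size_t c = 0; c < k; ++c)
            if (counts[c] > 0)  // an empty cluster keeps its previous center
                for (std::size_t d = 0; d < sums[c].size(); ++d)
                    centers[c][d] = sums[c][d] / static_cast<float>(counts[c]);
    }
    return centers;
}
```

The returned centers play the role of the rows of `vocabulary`. With the 1000-cluster setting above, that is 1000 "visual words".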
Now, let's train us some SVM classifiers!
We're gonna train a 2-class SVM for each class, in a 1-vs-all kind of way. Meaning each SVM can say "yes" or "no" when choosing between its class and the rest of the classes, hence 1-vs-all.
But first, we need to scour the training set for our histograms (the responses to the vocabulary, remember?):

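map<string,Mat> classes_training_data;
vector<string> classes_names;

Ptr<FeatureDetector > detector(new SurfFeatureDetector());
Ptr<DescriptorMatcher > matcher(new BruteForceMatcher<L2<float> >());
Ptr<DescriptorExtractor > extractor(new OpponentColorDescriptorExtractor(Ptr<DescriptorExtractor>(new SurfDescriptorExtractor())));
Ptr<BOWImgDescriptorExtractor> bowide(new BOWImgDescriptorExtractor(extractor,matcher));
bowide->setVocabulary(vocabulary); // the vocabulary from the previous step

#pragma omp parallel for schedule(dynamic,3)
for(..loop a directory?..) {
   // per-iteration locals, so the threads don't overwrite each other's data
   string filepath = ..., class_ = ...; // this image's path and its class label
   vector<KeyPoint> keypoints;
   Mat response_hist;
   Mat img = imread(filepath);
   detector->detect(img, keypoints); // compute() needs the keypoints filled in
   bowide->compute(img, keypoints, response_hist);

   #pragma omp critical
   {
      if(classes_training_data.count(class_) == 0) { //not yet created...
         classes_training_data[class_].create(0,response_hist.cols,response_hist.type());
         classes_names.push_back(class_);
      }
      classes_training_data[class_].push_back(response_hist);
   }
}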

Now, two things:
First, notice I'm keeping the training data for each class separately, because we'll need it later to build the 1-vs-all sample-label matrices.
Second, I use OpenMP multi-threading to make the calculation parallel, and hence faster, on multi-core machines (like the one I used). It cuts the running time by a whole lot. OpenMP is a gem, use it more: just a couple of #pragma directives and you're multi-threading. Just make sure anything the threads share, like the classes_training_data map, is only touched inside a #pragma omp critical section.
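Here is that pattern in isolation, as a C++ sketch that uses only the standard library. The function name `groupByClass` is hypothetical, and copying a ready-made histogram stands in for the real per-image work. Each iteration works on its own locals and only touches the shared per-class map inside `#pragma omp critical`. Built with `-fopenmp`, the loop runs in parallel. Without it, the pragmas are ignored (most compilers just warn about them) and the loop runs serially with the same per-class counts. With threads, the order of histograms inside a class may vary from run to run.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Histogram = std::vector<float>;

// Group per-image histograms by class label. Only the shared map is modified
// inside the critical section; everything else is per-iteration.
std::map<std::string, std::vector<Histogram>>
groupByClass(const std::vector<Histogram>& hists, const std::vector<std::string>& labels) {
    assert(hists.size() == labels.size());
    std::map<std::string, std::vector<Histogram>> byClass;
    const int n = static_cast<int>(hists.size());
    #pragma omp parallel for schedule(dynamic, 3)
    for (int i = 0; i < n; ++i) {
        // Stand-in for the expensive part (imread, detect, compute) done per image.
        Histogram h = hists[i];
        #pragma omp critical
        byClass[labels[i]].push_back(h);
    }
    return byClass;
}
```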

Alright, data gotten, let's get training:

#pragma omp parallel for schedule(dynamic)
for (int i=0;i<classes_names.size();i++) {
   string class_ = classes_names[i];
   cout << omp_get_thread_num() << " training class: " << class_ << ".." << endl;
   Mat samples(0,response_cols,response_type);
   Mat labels(0,1,CV_32FC1);

   //copy class samples and label (find() is safe for concurrent reads, map::operator[] is not)
   const Mat& positives = classes_training_data.find(class_)->second;
   cout << "adding " << positives.rows << " positive" << endl;
   samples.push_back(positives);
   Mat class_label = Mat::ones(positives.rows, 1, CV_32FC1);
   labels.push_back(class_label);

   //copy rest samples and label
   for (map<string,Mat>::iterator it1 = classes_training_data.begin(); it1 != classes_training_data.end(); ++it1) {
      string not_class_ = (*it1).first;
      if(not_class_ == class_) continue; //skip class itself
      samples.push_back(it1->second);
      class_label = Mat::zeros(it1->second.rows, 1, CV_32FC1);
      labels.push_back(class_label);
   }

   cout << "Train.." << endl;
   if(samples.rows == 0) continue; //phantom class?!
   Mat samples_32f; samples.convertTo(samples_32f, CV_32F);
   CvSVM classifier;
   classifier.train(samples_32f, labels);

   //do something with the classifier, like saving it to file
}

Again, I parallelize, although the process is not too slow.
Note how I build the samples and the labels: each time I put in the positive samples and label them '1', and then I put in the rest of the samples and label them '0'.
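The same sample/label construction, stripped of OpenCV types, could look like the sketch below. The `TrainingSet` struct and the `oneVsAll` function are hypothetical names. One simplification: it walks the map in key order instead of putting the positives first. Sample order shouldn't matter to the SVM.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Histogram = std::vector<float>;

struct TrainingSet {
    std::vector<Histogram> samples;
    std::vector<float> labels;  // 1 = the target class, 0 = any other class
};

// Build the 1-vs-all training set for `target`: its own histograms get label 1,
// every other class's histograms get label 0.
TrainingSet oneVsAll(const std::map<std::string, std::vector<Histogram>>& byClass,
                     const std::string& target) {
    TrainingSet ts;
    for (const auto& entry : byClass) {
        const float label = (entry.first == target) ? 1.f : 0.f;
        for (const Histogram& h : entry.second) {
            ts.samples.push_back(h);
            ts.labels.push_back(label);
        }
    }
    assert(ts.samples.size() == ts.labels.size());
    return ts;
}
```

Calling `oneVsAll` once per class name gives one training set per SVM, which mirrors the outer loop in the training code above.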

Moving on to .... testing the classifiers!
Nothing seems to me like more fun than creating a confusion matrix! Not really, but let's see how it's done:

map<string,map<string,int> > confusion_matrix; // confusion_matrix[classA][classB] = number_of_times_A_voted_for_B;
map<string,CvSVM> classes_classifiers; //This we created earlier

vector<string> files; //load up with images
vector<string> classes; //load up with the respective classes

for (size_t i = 0; i < files.size(); i++) {
   Mat img = imread(files[i]), response_hist;
   vector<KeyPoint> keypoints;
   detector->detect(img, keypoints);
   bowide->compute(img, keypoints, response_hist);

   float minf = FLT_MAX; string minclass;
   for (map<string,CvSVM>::iterator it = classes_classifiers.begin(); it != classes_classifiers.end(); ++it) {
      float res = (*it).second.predict(response_hist,true); // raw decision value, not the label
      if (res < minf) {
         minf = res;
         minclass = (*it).first;
      }
   }
   confusion_matrix[classes[i]][minclass]++; // true class voted for minclass
}

When you take a look in my files, you will find a much more complicated way of doing this. But this is the core idea: compute the image's response histogram to the vocabulary of features (rather, feature-cluster centers), run it through all the classifiers, and take the one with the best score (the lowest decision value, in the loop above). Simple.
Consider making this parallel as well. No reason for it to be serial.
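The voting and bookkeeping part doesn't need OpenCV at all. Below is a small standard-library sketch that picks the lowest-scoring classifier, as the loop above does, and fills a confusion matrix indexed as [actual][predicted]. The helper names `bestClass`, `record` and `accuracy` are hypothetical, not from the repository. Why the lowest score? As far as I can tell, OpenCV 2.x's CvSVM returns a decision value that is negative on the side of the larger label, which is '1' (the positive class) with the 0/1 labels used in training. If your setup flips that sign, flip the comparison too.

```cpp
#include <cassert>
#include <map>
#include <string>

// Pick the class whose 1-vs-all classifier returned the lowest score,
// mirroring the "res < minf" test in the loop above.
std::string bestClass(const std::map<std::string, float>& scores) {
    assert(!scores.empty());
    auto best = scores.begin();
    for (auto it = scores.begin(); it != scores.end(); ++it)
        if (it->second < best->second) best = it;
    return best->first;
}

// confusion[actual][predicted] = how many test images of class `actual`
// were classified as `predicted`.
using Confusion = std::map<std::string, std::map<std::string, int>>;

void record(Confusion& confusion, const std::string& actual, const std::string& predicted) {
    ++confusion[actual][predicted];
}

// Overall accuracy: the share of test images on the diagonal (actual == predicted).
double accuracy(const Confusion& confusion) {
    int correct = 0, total = 0;
    for (const auto& row : confusion)
        for (const auto& cell : row.second) {
            total += cell.second;
            if (cell.first == row.first) correct += cell.second;
        }
    return total > 0 ? static_cast<double>(correct) / total : 0.0;
}
```

For instance, with scores pizza = -1.2, salad = 0.4 and sushi = -0.3, `bestClass` returns "pizza". Three correct predictions out of four recorded ones give an accuracy of 0.75.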

That about covers it.


Lately I've been pushing stuff to git rather than SVN on Google Code. Dunno why, it's just like that.
Get the whole thing at:

Follow the build instructions, they're a breeze, and then follow the running instructions. It's basically a series of command-line programs you run to get through each step, and in the end you have a "predictor" service that takes an image and produces a prediction.

OK guys, have fun classifying stuff!



47 Responses to “A simple object classifier with Bag-of-Words using OpenCV 2.3 [w/ code]”

  1. strikerman on 07 Sep 2011 at 11:35 am

    hi i have error opencv2/opencv.hpp not found. how can i resolve. please help me.

  2. Roy on 07 Sep 2011 at 5:50 pm

    Make sure OpenCV is installed, and that CMake can find it
    It may take some time to set things up
    follow the explanation on the official website:

  3. usama yaseen on 12 Sep 2011 at 5:43 pm

    Thanks for the nice tutorials + source code (tutorials without source code are often not affected :P), they really helped a lot :)
    i have few questions,

    i. how u generate training.txt?

    ii. training.txt contains the name of the image,rectangle(containing the region of interest i.e the food sample),and what is the last thing for example below is an excerpt from the training.txt:
    foodcamimages/TRAIN/20080311002618.jpg 182,182,405,171 50
    20080311002618.jpg: is the image,
    182,182,405,171: is the rectangle containing some food
    50: what is 50 and how to obtain it?

    thanks in advance


  4. strikerman on 13 Sep 2011 at 11:59 am

    thanks u for reply. if u can, can u send me visual studio .sln file.
    please help me. then did u have code intel oasis project? please post. u r great.

  5. Roy on 13 Sep 2011 at 2:06 pm

    training.txt can be generated using the "manual-classifier" tool. you just give it a place where it will find images...

    re the mysterious '50', this is the class-name for that food in the rectangle. it can be anything you want.

    use CMake to get the .sln file
    run CMake with the source directory, and ask for a Visual Studio 20XX project

  6. Luck on 14 Sep 2011 at 4:49 pm

    Thanks for a great tutorial.
    I have another question. After calculate the res, We can say there is an object,say pizza, in the image. But next thing is how to locate pizza in the image by a frame?
    thank you

  7. Roy on 14 Sep 2011 at 7:42 pm

    Yes, the idea is to go for image segmentation & categorization in the same run.
    Basically how I've done this (and you should go through the code in predict_common.cpp, in the FoodcamPredictor::evaluateOneImage function) I have used a "sliding window" approach to scan the parts in the image and label each one.
    From a dense scanning of the image like that, you can create a high-level segmentation, maybe using Graph-Cuts?

  8. Luck on 14 Sep 2011 at 11:45 pm

    Oh. So I guess, the wiser approach will be using some segmentation algorithms to divide the tested image into several areas. And for each area in the image, we apply the BOW algorithm to detect the object and draw a box around the sub-area in which key points are most crowded.
    Further, I just wonder beside BOW framework, are there any other distinct algorithms which have better performance for Object recognition?

  9. Mike on 12 Nov 2011 at 6:04 pm

    very cool example!

  10. zwfsgu on 25 Nov 2011 at 5:41 am

    in build_vocabolary.cpp
    Rect clipping_rect = Rect(0,120,640,480-120);
    img = img(clipping_rect);

    the img size often less then clipping_rect

  11. Roy on 01 Dec 2011 at 12:50 am

    In that case what I usually do is "clip the clip": (a nice feature in OpenCV 2.3+ for Rect structs)

    Rect clipping_rect = Rect(...);
    Rect img_rect = Rect(0,0,img.cols,img.rows);
    clipping_rect = clipping_rect & img_rect;
    img = img(clipping_rect); // <---- now this will never fail. fingers crossed.

  12. Girl Rock on 05 Dec 2011 at 1:03 am

    What's the point of doing ./make-test-background and produce background.png. Isn't to get more accurate Keypoints? I don't think it is much affected even not utilizing background.png

  13. Girl Rock on 06 Dec 2011 at 8:20 am

    Hi, why I don't get the proper confusion matrix as you do? weird.. this is the output.. I'm using ur files train.txt and test.txt..
    49 -> 49:0
    50 -> 49:0
    51 -> 49:0
    52 -> 49:0
    53 -> 49:0
    54 -> 49:0
    55 -> 49:0
    56 -> 49:0
    57 -> 49:0
    97 -> 49:0

  14. Lizards on 21 Feb 2012 at 10:14 pm

    The dataset in foodcamimages/TRAIN etc. are missing from the

    I am not sure if an unknown password for to recover the image from the IMAP gmail account is really research friendly.

    Perhaps it would be nice if somebody shared the image dataset in another structured repository similar to LabelMe.

  15. jasmine on 13 Mar 2012 at 4:08 am

    Cmake didn't work ! would you please upload your project with the sln file :$ , I'll be so thankful I wanna see the execution of this great application , we're working in the same zone right now .
    my regards

  16. Roy on 13 Mar 2012 at 4:49 am

    CMake, in the long run, will be much much better than using the sln!
    I suggest you give it another try
    What are the errors you encounter?

  17. jasmine on 13 Mar 2012 at 4:00 pm

    it's not woking at all!
    CMake Error: Unable to open cache file for save. C:/Program Files/CMake 2.8/CMakeCache.txt
    CMake Error: The source directory "C:/Users/Safaa/Documents/Downloads/Compressed/royshil-FoodcamClassifier-4ba20bb" does not appear to contain CMakeLists.txt.
    Specify --help for usage, or press the help button on the CMake GUI.
    CMake Error: : System Error: No such file or directory

  18. jasmine on 13 Mar 2012 at 4:56 pm

    I tried to make a new project but I'm getting 1454 error !
    can you give me a hand ?

  19. Roy on 13 Mar 2012 at 8:47 pm

    I can't understand the problem from this error...
    If you have CMake installed, go to the directory via command line and run "cmake ." and see where that takes you.

  20. jasmine on 14 Mar 2012 at 1:55 am

    Cmake is not working , that's all

  21. jasmine on 14 Mar 2012 at 2:31 am

    well thank you so much , I just wanted to see your program , seems you can't upload it for me .

  22. Roy on 14 Mar 2012 at 6:35 am

    I am happy to help you get it to compile, but you must be ready to do the work yourself.
    I assume you are running windows. Try to run the CMake GUI program and direct the "Source directory" to the project directory.
    Try to "Configure" and you will get specific errors about problems CMake encountered. Or "Generate" will be available right away and you can build the project using MSVS.

  23. Tony on 16 Mar 2012 at 3:14 pm

    really it's a nice work
    i have some question :$$
    1- your files : every .cpp & .h have a main method ... so every couple give subproject ???
    2- you said that i have to run Cmake to have .sln ... but it's not working :((
    it gives alot of errors :(((
    it ask for the file that contains the source ... i give it the directory of the food file....! that right ??

  24. jasmine on 16 Mar 2012 at 5:47 pm

    I swear to god I did what you said but seems the cmake has problems itself . so I made a new project with those classes and headers, and I'm getting 100 unbelievable errors like
    " 89 IntelliSense: variable "CV_INLINE" is not a type name c:\opencv2\core\core_c.h 182 1
    in spit of I gave the VS 2010 all the paths of the include folders and the libs of the opencv !!
    it's driving me crazy :(

  25. Roy on 16 Mar 2012 at 9:22 pm

    The project is built using CMake, which will create a subproject for every executable.
    Please use CMake.

    To compile and use the project you must have a working version of OpenCV 2.3+.
    Here are instructions for using OpenCV in applications:

    The problems you encounter are not problems in my code but in your environment. Sort that out and you are guaranteed to be able to run the programs.

  26. jasmine on 27 Mar 2012 at 3:39 pm

    back again
    I've been trying to build the sln since the time I had commented here !
    I asked many people and no one helped me and i tried to run the cmake on Linux but I'm not familiar with it that much
    would you please uploaded for me after running the cmake on the files?
    I really need to see the execution of your project and until now I can't even run it ! :(
    thank u so much

  27. LOL on 07 Jun 2012 at 10:51 pm

    Note, the BoW class is finally working as of svn version r8551 or OpenCV 2.4.1 .

    There are countless problems prior to this release, and I was surprised this site owner managed to get their code to even work.


  28. zernike on 18 Jul 2012 at 2:14 pm

    Thank you for this code, it's very interesting,
    I ask you, if that I can use just main.cpp to test this code, I can extract features and create vocabulary features, training and testing, it's all in main.cpp??
    - what i need training.txt?I not need this file, what you think?
    - in your database, I not find directory of TRAIN and TEST.

    best and thank you

  29. Amr on 04 Sep 2012 at 6:45 pm

    Hello thanks a lot for the valuable help :)
    i have one question though, i am using C# to make a program like yours and to use opencv i am using the emgu wrapper to wrap opencv in c#. the problem is i cant find the bag of words functions in emgu?? what should i do?

  30. Yosafat on 28 Oct 2012 at 8:10 pm

    Thanks a lot for your help. It helps me a lot to finish my project. ^_^

  31. Giseli on 09 Nov 2012 at 3:48 pm

    Hi Roy, great post! I was trying to adapt this to detect only if a logo exist on the image. In the train dir is sufficient to have 10 images and in the test dir also some 15 images? Because I can't get the SVM classifier. What is the minimum number of images it needs to work nice?

  32. Roy on 12 Nov 2012 at 5:34 pm

    I believe there is no minimum number of images, rather a minimum number of extracted features.
    The code works to get a 1000-feature vocabulary (with k-means to obtain it), so obviously it needs more than 1000 extracted features.
    But you are able to tweak the number of features in the vocabulary. For a small dataset with few categories even a 20-word vocabulary can be fine... you really must experiment with the size of the vocabulary to see how it affects the recognition rates.

  33. Giseli on 16 Nov 2012 at 12:03 pm

    Thanks for the reply, Roy. I'm tweaking to see the best number of images. Another question... I would like to draw the ROIs of one or more detected objects in the image. In predict_common.cpp I see that you draw circles with the points, so I have a "region". But is possible to get ROIs?

  34. paul on 19 Nov 2012 at 3:44 pm

    Hi, nice tutorial!
    I have one problem, when I execute "cmake -D CMAKE_CXX_FLAGS=-fopenmp . " I get the following error:

    Could not find a configuration file for package "OpenCV" that is compatible with requested version "2.3".

    The following configuration files were considered but not accepted:

    /opencv/cmake/OpenCVConfig.cmake, version: unknown

    Do you know why it doesn't detect the version of opencv?

  35. John on 18 Dec 2012 at 10:39 am


    Is there a way to convert it into java android application? I want to recognize leafs instead of pizzas etc.


  36. Pavel on 11 Feb 2013 at 9:15 pm

    Hi, thanks for the great tutorial.

    I'd really appreciate if you could suggest me a way to solve this problem

    Many thanks,

  37. Rish on 07 Mar 2013 at 6:45 pm

    I am working on low-resolution images and I find it difficult to obtain the best matched features using SURF, and there are a lot of possibilities for mismatches. Do you think I can use a technique of using a high-quality data set and a low-quality one as well? Please help me out!!! Thank you

  38. Timo on 19 Apr 2013 at 4:03 pm

    Following instructions from main.cpp ends up with empty histograms. Could it be that it is necessary to add the line "detector.detect(img, keypoints)" before each "bowide.compute(img, keypoints, response_hist)" in the loop so that the corresponding keypoints of the current query image are passed to the compute function? Doing so results in a more realistic histogram representation.

  39. Bhaskar on 23 Apr 2013 at 6:22 am


    Thanks for the great tutorial. I didnt use OpenMP but I think my results (for something completely diff) are good. I knew the theory and everything but its implementation was a bit out there. Thanks again for all the help.

  40. Polk on 04 May 2013 at 10:07 pm


    It's a great works and tutorial.
    I use Xcode 4.6.6 in OS X mountain lion, OpenCV 2.4.3
    opencv works fine in xcode with compiler LLVM GCC 4.2 for OpenMP

    I have a problem that all the test images have zero matches, and if the training set is limited (for example 10 images) all the test images go to the first image.

    I use this (line a):
    Mat training_descriptors;
    Your example use (line b):
    Mat training_descriptors(1,extractor->descriptorSize(),extractor->descriptorType());

    if I use the second (line b) the process stop and has a problem when reach the line
    Mat vocabulary = bowtrainer.cluster();
    but if I use the first (line a) it works but the results are strange.

    Thanks for the help.

  41. [...] This is the algorithm I followed for BoW. I got a lot of help from Roy’s blog here. [...]

  42. Adam N on 20 Aug 2013 at 4:37 pm

    Hey Roy, awesome article + code! Was just wondering what size training sets you used for building the dictionary and training the SVMs? Many thanks Adam :)

  43. Rajni Kant on 23 Nov 2013 at 8:59 pm

    Is there any java implementation of this code.

  44. Kastor on 08 Jan 2014 at 3:20 pm

    Hi Roy,

    1st of all thanks alot for this great article !
    Some questions remain.

    I see you gain your positive examples by cutting the desired part out of the image,
    that means you have to use the sliding window, correct ?

    But ist that a good Idea ? I mean u need a good choice for the size of that window.
    What is your strategy on that ?


  45. Krystal on 10 Jan 2014 at 4:14 am

    Hey Roy, thanks for you tutorial and code,

    I have got the vocabulary and am working on the SVM classifier now, and I'm not sure about the input argv[2] postfix in train_bovw.cpp. Is it the address of the output file?



  46. Kastor on 19 Jan 2014 at 2:40 pm


    If u look at the SVM only sample,
    u can see that its just a mark. I think its made to distinguish
    different runs.

    Here is the setting in svm only , main

    string file_postfix = "with_colors"

  47. Amartinez on 18 Feb 2014 at 2:40 pm

    Hi guys,

    I am trying to run the code of the example and I noticed that in the repository there is not any dataset. Reading the documentation, I got the link where there is the dataset: I tried several times but the link seems to be broken. Could anyone here provide me the dataset?.

    Thanks in advance.
