Bust out your own graphcut based image segmentation with OpenCV [w/ code]

This is a tutorial on using Graph-Cuts and Gaussian-Mixture-Models for image segmentation with OpenCV in C++ environment.

Update 10/30/2017: See a new implementation of this method using OpenCV-Python, PyMaxflow, SLIC superpixels, Delaunay and other tricks.

Been wokring on my masters thesis for a while now, and the path of my work came across image segmentation. Naturally I became interested in Max-Flow Graph Cuts algorithms, being the "hottest fish in the fish-market" right now if the fish market was the image segmentation scene.

So I went looking for a CPP implementation of graphcut, only to find out that OpenCV already implemented it in v2.0 as part of their GrabCut impl. But I wanted to explore a bit, so I found this implementation by Olga Vexler, which is build upon Kolmogorov's framework for max-flow algorithms. I was also inspired by Shai Bagon's usage example of this implementation for Matlab.

Let's jump in...

Bit of Theory

Before we move on, let's dig in a little in the theory. We look at the picture as a set of nodes, where each pixel is node and is connected to its neighbors by edges and has a label - this can be called a Markov Random Field. MRFs can be solved, i.e. give an optimal labeling for each node and thus an optimal labeling, in a number of ways, one of which being graph cuts based on maximal flow. After we label the graph, we expect to get a meaningful segmentation of the image. This paper, by some of the big names in the field (Vexler, Kolmogorov, Agarwala), explains it pretty throughly. There a number of well known segmentation methods that use graph cuts, such as: Lazy Snapping [04], GrabCut [04] and more.

The math in the articles is, as usual, pretty horrific. I like to keep things simple, so I'll try to explain the method of GC-segmentation in a simple way. We all remember min cut - max flow algorithms from 2nd year CS, right? well segmentation using GC is not very different. The magic happens when we weight the nodes and edges in a meaningful way, thus creating meaningful cuts. The weights are usually spit to two terms: Data term (or cost) and Smoothness term. The data term says in simple words: "How happy this pixels is with that label", and the smoothness term pretty much says "How easy can a label expand from this pixel to that neighbor". So when you think about it, the easiest thing would be to put as the data term the likelyhood of a pixel to belong to some label, and for the smoothness term - just use the edges in the picture!

So anyway, back to the code, only thing left is to create a graph, give weights, and max the flow. Here's a bit of code:

Mat im_small = imread("a_pic.jpg");
int num_lables = 3;

// create a grid type graph
GCoptimizationGridGraph gc(im_small.cols,im_small.rows,num_lables);

This piece of code created a directed grid graph where every pixel will be a vertex, and each pixel can have one of 3 lables (3 parts in the image to segment).

GMM to the rescue!

Now for the weighting. One very "standard" way to give a data term to the pixels is by using Gaussian-Mixture-Models (GMM): A method that fits a few gaussian distributions over an unknown probability function to estimate how it looks. In the spirit of keeping things simple, I won't go into details. I'll just say that it's a tool to get the probablility of a pixel to belong to a cluster of other pixels, and it has built-in implementation in OpenCV, which is reason enough for me to use it. In OpenCV GMM models are named EM, which is kind of erroneous, since EM (Expectation-Maximization) is one of the best methods to estimate a GMM and not a GMM itself.

Using EM in OpenCV is really very easy:

CvEM model;
CvEMParams ps(3);

Mat lables;
Mat samples;
im.reshape(1,im.rows*im.cols).convertTo(samples,CV_32FC1,1.0/255.0);

model.train(samples,Mat(),ps,&lables);

Mat probs = model.get_probs();

Here's how it looks when training over the whole image as input data (you can see original image, labeling, minus log probability):

Clustering using EM in OpenCV

But, this is not exactly what we wanted... Since we are dealing with segmentation here, we would like to segment certain area. The purpose of the GMM is to learn how that area looks, based on a small set of samples, and then predict the label for all the pixels in the image.

In GrabCut they pretty much create a GMM for every logical "cluster" they want to segment: Positively Background, Probably Background, Probably Foreground and Positively Foreground. This is a good idea and I will follow it, but again, I'm aiming not for Object Extraction rather for k-way segmentation. In other words I'm looking for a way to divide the image to a few areas that are significantly similar, and also not similar to the other areas.

To do that I will create a K-gaussians GMM for N areas (see the code, it's a long one). I tried 2 versions, where I create a 1-gaussian GMM for each channel (RGB, etc.) of each area it's called doEM1D(), and another one with K-gaussian and N-clusters GMM for each area. The results have varied:

Three 1 channel 1 Gaussian GMMs

Three 3-channels 3-Gaussians per channel GMMs

This will provide us the data-term for our segmentation - each pixel can now say how comfortable it is with the label it got (we simply use the probability from the GMM).

Play it smooth

Right, moving on to the smoothness term. I mentioned before it would be easiest to just use the edges in the image. I use the Sobel filter, which gives a nice strong edge. Again we must look at each pixel's value as the likelyhood to have an edge in it, so we should use -log to get it in nice big integers where the probability drops.

GaussianBlur(gray32f,gray32f,Size(11,11),0.75);

Sobel(gray32f,_tmp,-1,1,0,3);	//sobel for dx
_tmp1 = abs(_tmp);  //use abs value to get also the opposite direction edges
_tmp1.copyTo(res,(_tmp1 > 0.2));  //threshold the small edges...

double maxVal,minVal;
minMaxLoc(_tmp,&minVal,&maxVal);
cv::log(res,_tmp);
_tmp = -_tmp * 0.17;
_tmp.convertTo(grayInt1,CV_32SC1);

Sobel(gray32f,_tmp,-1,0,1,3);	//sobel for dy
_tmp1 = abs(_tmp);
res.setTo(Scalar(0));
_tmp1.copyTo(res,(_tmp1 > 0.2));

imshow("tmp1",res); waitKey();

minMaxLoc(_tmp,&minVal,&maxVal);
cv::log(res,_tmp);
_tmp = -_tmp * 0.17;
_tmp.convertTo(grayInt,CV_32SC1);

Now put everything into a bowl and mix!

And the last part of the process will be to put the labels probabilities per pixel and edges into the grid graph we created earlier:

GCoptimizationGridGraph gc(im.cols,im.rows,num_lables);

//Set the pixel-label probability
int N = im.cols*im.rows;
double log_1_3 = log(1.3);
for(int i=0;i<N;i++) {
   double* ppt = probs.ptr<double>(i);
   for(int l=0;l<num_lables;l++) {
      int icost = MAX(0,(int)floor(-log(ppt[l])/log2));
      gc.setDataCost(i,l,icost);
   }
}

//Set the smoothness cost
Mat Sc = 5 * (Mat::ones(num_lables,num_lables,CV_32SC1) - Mat::eye(num_lables,num_lables,CV_32SC1));
gc.setSmoothCostVH((int*)(Sc.data),(int*)dx.data,(int*)dy.data);

lables.create(N,1,CV_8UC1);

printf("\nBefore optimization energy is %d\n",gc.compute_energy());
gc.expansion(1);
printf("\nAfter optimization energy is %d\n",gc.compute_energy());

//Get the labeling back from the graph
for ( int  i = 0; i < N; i++ )
   ((uchar*)(lables.data + lables.step * i))[0] = gc.whatLabel(i);

Easy. Now the labeling should give us a nice segmentation:

3 x 3-ch-3-gs GMMs labeling

3 x 1-ch-1-gs GMMs labeling

But, there's a lot of noise in the labeling... A good heuristic to apply will be to take only the largest connected-component of each label, and also try to the the component that is closest to the original marking.

Lables extraction without larget component keeping

Lables extraction with largest component keeping

vector<vector<Point>> contours;
for(int itr=0;itr<2;itr++) {

Mat mask = (_tmpLabels == itr); //Get a mask of this label

contours.clear();
//find the contours in that mask
cv::findContours(mask,contours,CV_RETR_EXTERNAL,CV_CHAIN_APPROX_NONE);

//compute areas
vector<double> areas(contours.size());
for(unsigned int ai=0;ai<contours.size();ai++) {
	Mat _pts(contours[ai]);
	Scalar mp = mean(_pts);

	areas[ai] = contourArea(Mat(contours[ai]))/* add some bias here to get components that are closer to initial marking*/;
}

//find largest connected component
double max; Point maxLoc;
minMaxLoc(Mat(areas),0,&max,0,&maxLoc);

//draw back on mask
_tmpLabels.setTo(Scalar(3),mask);	//all unassigned pixels will have value of 3, later we'll turn them to "background" pixels

mask.setTo(Scalar(0)); //clear...
drawContours(mask,contours,maxLoc.y,Scalar(255),CV_FILLED);

//now that the mask has only the wanted component...
_tmpLabels.setTo(Scalar(itr),mask);

}

Code and salutations

As usual the code is available from the blog's SVN:

svn checkout http://morethantechnical.googlecode.com/svn/trunk/GMMGraphCutSegmentation GMMGraphCutSegmentation 

Hey! We're pretty much done! Glad you (and I) made it to the end, it wasn't easy after all... I hope you learned something about GMMs and Graph-Cuts in OpenCV and in general.

BTW: The pictures are from Flickr, under creative commons license.
http://farm1.static.flickr.com/33/40406598_fd4e74d51c_d.jpg
http://www.flickr.com/photos/willemvelthoven/56589010/sizes/m/in/pool-99557785@N00/

See ya!
Roy.

Share