<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>More Than Technical &#187; Recommended</title>
	<atom:link href="http://www.morethantechnical.com/category/recommended/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.morethantechnical.com</link>
	<description>On software, code, the internet and more.</description>
	<lastBuildDate>Sun, 05 Feb 2012 07:04:13 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=</generator>
<atom:link rel="hub" href="http://pubsubhubbub.appspot.com"/><atom:link rel="hub" href="http://superfeedr.com/hubbub"/>		<item>
		<title>A simple object classifier with Bag-of-Words using OpenCV 2.3 [w/ code]</title>
		<link>http://www.morethantechnical.com/2011/08/25/a-simple-object-classifier-with-bag-of-words-using-opencv-2-3-w-code/</link>
		<comments>http://www.morethantechnical.com/2011/08/25/a-simple-object-classifier-with-bag-of-words-using-opencv-2-3-w-code/#comments</comments>
		<pubDate>Thu, 25 Aug 2011 03:34:27 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[Recommended]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[vision]]></category>
		<category><![CDATA[classification]]></category>
		<category><![CDATA[object]]></category>
		<category><![CDATA[svm]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=917</guid>
		<description><![CDATA[ A simple object classifier with Bag-of-Words using OpenCV 2.3]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.morethantechnical.com/wp-content/uploads/2011/08/20101201191626.jpg" rel="lightbox[917]"><img src="http://www.morethantechnical.com/wp-content/uploads/2011/08/20101201191626-300x178.jpg" alt="" title="20101201191626" width="300" height="178" class="alignleft size-medium wp-image-928" /></a><br />
Just wanted to share some code I&#8217;ve been writing.<br />
So I wanted to create a food classifier, for a cool project down in the Media Lab called FoodCam. It&#8217;s basically a camera that people put free food under, and they can send an email alert to the entire building to come eat (by pushing a huge button marked &#8220;Dinner Bell&#8221;). Really a cool thing.</p>
<p>OK let&#8217;s get down to business.<br />
<span id="more-917"></span><br />
I followed a very simple technique described in <a href="http://scholar.google.com/scholar?cluster=2469382617192238945&amp;hl=en&amp;as_sdt=0,22" target="_blank">this paper</a>. I know, you say, &#8220;A Paper? Really? I&#8217;m not gonna read that technical boring stuff, give the bottom line! man.. geez.&#8221; Well, you are right, except that this paper IS the bottom line, it&#8217;s dead simple. It&#8217;s almost a tutorial. It is also referenced by the OpenCV documentation.</p>
<p>The method is simple:<br />
- Extract features of choice from training set that contains all classes.<br />
- Create a vocabulary of features by clustering the features (k-means, etc). Let&#8217;s say 1000 words long.<br />
- Train your classifiers (SVMs, Naive-Bayes, boosting, etc) on the training set again (preferably a different one). This time, match each feature in the image to its closest cluster in the vocabulary, and create a histogram of responses for each image to the words in the vocabulary; it will be a 1000-entry vector. Create a sample-label dataset for the training.<br />
- When you get an image you haven&#8217;t seen &#8211; run the classifier and it should, god willing, give you the right class.</p>
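<p>To make the &#8220;histogram of responses&#8221; step concrete, here is a minimal sketch in plain C++, with no OpenCV involved. The names (<code>Descriptor</code>, <code>nearestWord</code>, <code>bowHistogram</code>) are made up for illustration, and dividing by the number of descriptors is an assumption about how the histogram is normalized, not something taken from the paper.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

typedef std::vector<float> Descriptor;  // hypothetical stand-in for one feature

// Squared Euclidean distance between two descriptors of equal length.
float squaredDistance(const Descriptor& a, const Descriptor& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Index of the vocabulary word (cluster center) closest to the descriptor.
std::size_t nearestWord(const Descriptor& d,
                        const std::vector<Descriptor>& vocabulary) {
    assert(!vocabulary.empty());
    std::size_t best = 0;
    float bestDist = std::numeric_limits<float>::max();
    for (std::size_t w = 0; w < vocabulary.size(); ++w) {
        const float dist = squaredDistance(d, vocabulary[w]);
        if (dist < bestDist) { bestDist = dist; best = w; }
    }
    return best;
}

// One bin per vocabulary word; each bin holds the fraction of the image's
// descriptors that landed on that word (normalization is an assumption).
std::vector<float> bowHistogram(const std::vector<Descriptor>& descriptors,
                                const std::vector<Descriptor>& vocabulary) {
    std::vector<float> hist(vocabulary.size(), 0.0f);
    if (descriptors.empty()) return hist;
    for (std::size_t i = 0; i < descriptors.size(); ++i)
        hist[nearestWord(descriptors[i], vocabulary)] += 1.0f;
    for (std::size_t w = 0; w < hist.size(); ++w)
        hist[w] /= static_cast<float>(descriptors.size());
    return hist;
}
```

<p>With a 1000-word vocabulary this returns the 1000-entry vector mentioned in the list, one per image, and that vector is what the classifiers get trained on.</p>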
<p>Turns out, those crafty guys in Willow Garage have done pretty much all the heavy lifting, so it&#8217;s up to us to pick the fruit of their hard work. OpenCV 2.3 comes packed with a <a href="http://opencv.itseez.com/modules/features2d/doc/object_categorization.html" target="_blank">set of classes</a>, whose names start with BOW for Bag Of Words, that help a lot with implementing this method.</p>
<p>Starting with the first step:</p>
<pre class="brush: plain; title: ; notranslate">
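SurfFeatureDetector detector(400);
vector&lt;KeyPoint&gt; keypoints;

// computing descriptors
Ptr&lt;DescriptorExtractor&gt; extractor(
   new OpponentColorDescriptorExtractor(
      Ptr&lt;DescriptorExtractor&gt;(new SurfDescriptorExtractor())
   )
);

// declared after the extractor, since it needs the descriptor size and type;
// starts with zero rows so no uninitialized row ends up in the training data
Mat descriptors;
Mat training_descriptors(0,extractor-&gt;descriptorSize(),extractor-&gt;descriptorType());

while(..loop a directory? a file?..) {
   Mat img = imread(filepath);
   detector.detect(img, keypoints);
   extractor-&gt;compute(img, keypoints, descriptors);
   training_descriptors.push_back(descriptors); // append this image's descriptors as rows
}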
</pre>
<p>Simple!<br />
Let&#8217;s go create a vocabulary then. Luckily, OpenCV has taken care of that, and provides a simple API:</p>
<pre class="brush: plain; title: ; notranslate">
BOWKMeansTrainer bowtrainer(1000); //num clusters
bowtrainer.add(training_descriptors);
Mat vocabulary = bowtrainer.cluster();
</pre>
<p>Boom. Vocabulary.<br />
Now, let&#8217;s train us some SVM classifiers!<br />
We&#8217;re gonna train a 2-class SVM, in a 1-vs-all kind of way. Meaning we train an SVM that can say &#8220;yes&#8221; or &#8220;no&#8221; when choosing between one class and the rest of the classes, hence 1-vs-all.<br />
But first, we need to scour the training set for our histograms (the responses to the vocabulary, remember?):</p>
<pre class="brush: plain; title: ; notranslate">
map&lt;string,Mat&gt; classes_training_data;
vector&lt;string&gt; classes_names;
int total_samples = 0;

Ptr&lt;FeatureDetector &gt; detector(new SurfFeatureDetector());
Ptr&lt;DescriptorMatcher &gt; matcher(new BruteForceMatcher&lt;L2&lt;float&gt; &gt;());
Ptr&lt;DescriptorExtractor &gt; extractor(new OpponentColorDescriptorExtractor(Ptr&lt;DescriptorExtractor&gt;(new SurfDescriptorExtractor())));
Ptr&lt;BOWImgDescriptorExtractor&gt; bowide(new BOWImgDescriptorExtractor(extractor,matcher));
bowide-&gt;setVocabulary(vocabulary);

#pragma omp parallel for schedule(dynamic,3)
for(..loop a directory?..) {
   // per-iteration locals, so the threads don't share buffers;
   // filepath and class_ come from whatever you are looping over
   vector&lt;KeyPoint&gt; keypoints;
   Mat response_hist;
   Mat img = imread(filepath);
   detector-&gt;detect(img,keypoints);
   bowide-&gt;compute(img, keypoints, response_hist);

   #pragma omp critical
   {
      if(classes_training_data.count(class_) == 0) { //not yet created...
         classes_training_data[class_].create(0,response_hist.cols,response_hist.type());
         classes_names.push_back(class_);
      }
      classes_training_data[class_].push_back(response_hist);
      total_samples++; // counted inside the critical section to avoid a data race
   }
}
</pre>
<p>Now, two things:<br />
First notice I&#8217;m keeping the training data for each class separately, this is because we will need this for later creating the 1-vs-all samples-labels matrices.<br />
Second, I use OpenMP to run the loop in parallel, and hence faster, on multi-core machines (like the one I used). It cuts the running time by a whole lot. OpenMP is a gem, use it more. Just a couple of #pragma directives and you&#8217;re multi-threading.</p>
<p>Alright, data gotten, let&#8217;s get training:</p>
<pre class="brush: plain; title: ; notranslate">
#pragma omp parallel for schedule(dynamic)
for (int i=0;i&lt;classes_names.size();i++) {
   string class_ = classes_names[i];
   cout &lt;&lt; omp_get_thread_num() &lt;&lt; &quot; training class: &quot; &lt;&lt; class_ &lt;&lt; &quot;..&quot; &lt;&lt; endl;

   Mat samples(0,classes_training_data[class_].cols,classes_training_data[class_].type());
   Mat labels(0,1,CV_32FC1);

   //copy class samples and label
   cout &lt;&lt; &quot;adding &quot; &lt;&lt; classes_training_data[class_].rows &lt;&lt; &quot; positive&quot; &lt;&lt; endl;
   samples.push_back(classes_training_data[class_]);
   Mat class_label = Mat::ones(classes_training_data[class_].rows, 1, CV_32FC1);
   labels.push_back(class_label);

   //copy rest samples and label
   for (map&lt;string,Mat&gt;::iterator it1 = classes_training_data.begin(); it1 != classes_training_data.end(); ++it1) {
      string not_class_ = (*it1).first;
      if(not_class_.compare(class_)==0) continue; //skip class itself
      samples.push_back(classes_training_data[not_class_]);
      class_label = Mat::zeros(classes_training_data[not_class_].rows, 1, CV_32FC1);
      labels.push_back(class_label);
   }

   cout &lt;&lt; &quot;Train..&quot; &lt;&lt; endl;
   Mat samples_32f; samples.convertTo(samples_32f, CV_32F);
   if(samples.rows == 0) continue; //phantom class?!
   CvSVM classifier;
   classifier.train(samples_32f,labels);

   //do something with the classifier, like saving it to file
}
</pre>
<p>Again, I parallelize, although the process is not too slow.<br />
Note how I build the samples and the labels: each time I put in the positive samples first and label them &#8216;1&#8217;, and then I put in the rest of the samples and label them &#8216;0&#8217;.</p>
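<p>Stripped of OpenCV types, the 1-vs-all bookkeeping looks roughly like the sketch below. It is plain C++ over <code>std::vector</code>, and the type and function names (<code>ClassData</code>, <code>TrainingSet</code>, <code>oneVsAll</code>) are hypothetical, picked just for this illustration.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

typedef std::vector<float> Histogram;
typedef std::map<std::string, std::vector<Histogram> > ClassData;

struct TrainingSet {
    std::vector<Histogram> samples;
    std::vector<float> labels;  // 1 for the positive class, 0 for all others
};

// Build the samples/labels pair for the "positive vs. the rest" classifier.
TrainingSet oneVsAll(const ClassData& data, const std::string& positive) {
    TrainingSet result;
    ClassData::const_iterator pos = data.find(positive);
    assert(pos != data.end());

    // Positive samples go in first, labeled 1.
    result.samples.insert(result.samples.end(),
                          pos->second.begin(), pos->second.end());
    result.labels.insert(result.labels.end(), pos->second.size(), 1.0f);

    // Then the samples of every other class, labeled 0.
    for (ClassData::const_iterator it = data.begin(); it != data.end(); ++it) {
        if (it->first == positive) continue;  // skip the class itself
        result.samples.insert(result.samples.end(),
                              it->second.begin(), it->second.end());
        result.labels.insert(result.labels.end(), it->second.size(), 0.0f);
    }
    return result;
}
```

<p>Calling <code>oneVsAll</code> once per class name gives one samples/labels pair per SVM, which is the same shape of data the training loop feeds to <code>classifier.train</code>.</p>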
<p>Moving on to &#8230;. testing the classifiers!<br />
Nothing seems to me like more fun than creating a confusion matrix! Not really, but let&#8217;s see how it&#8217;s done:</p>
<pre class="brush: plain; title: ; notranslate">
map&lt;string,map&lt;string,int&gt; &gt; confusion_matrix; // confusionMatrix[classA][classB] = number_of_times_A_voted_for_B;
map&lt;string,CvSVM&gt; classes_classifiers; //This we created earlier

vector&lt;string&gt; files; //load up with images
vector&lt;string&gt; classes; //load up with the respective classes

for(int i=0;i&lt;(int)files.size();i++) {
   Mat img = imread(files[i]),response_hist;

   vector&lt;KeyPoint&gt; keypoints;
   detector-&gt;detect(img,keypoints);
   bowide-&gt;compute(img, keypoints, response_hist);

   float minf = FLT_MAX; string minclass;
   for (map&lt;string,CvSVM&gt;::iterator it = classes_classifiers.begin(); it != classes_classifiers.end(); ++it) {
      float res = (*it).second.predict(response_hist,true);
      if (res &lt; minf) {
         minf = res;
         minclass = (*it).first;
      }
   }
   confusion_matrix[minclass][classes[i]]++;
}
</pre>
<p>When you take a look in my files, you will find a much more complicated way of doing this. But this is the core idea &#8211; look in the image for the response histogram to the vocabulary of features (rather, feature-cluster centers), run it by all the classifiers and take the one with the best score. Simple.<br />
Consider making this parallel as well. No reason for it to be serial.</p>
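<p>Here is the same decision rule without the OpenCV parts, as a small plain-C++ sketch. The helper names (<code>bestClass</code>, <code>record</code>, <code>accuracy</code>) are hypothetical. It keeps the &#8220;lowest score wins&#8221; comparison from the loop above, and the accuracy helper is an extra that the post itself does not compute.</p>

```cpp
#include <cassert>
#include <limits>
#include <map>
#include <string>

typedef std::map<std::string, float> Scores;
typedef std::map<std::string, std::map<std::string, int> > ConfusionMatrix;

// Return the class whose classifier produced the lowest score, mirroring the
// "res < minf" comparison in the loop above.
std::string bestClass(const Scores& scores) {
    assert(!scores.empty());
    float minScore = std::numeric_limits<float>::max();
    std::string winner;
    for (Scores::const_iterator it = scores.begin(); it != scores.end(); ++it) {
        if (it->second < minScore) { minScore = it->second; winner = it->first; }
    }
    return winner;
}

// Count one test image: first key is the predicted class, second the true one.
void record(ConfusionMatrix& matrix, const std::string& predicted,
            const std::string& actual) {
    ++matrix[predicted][actual];
}

// Fraction of recorded images whose predicted class equals the true class.
float accuracy(const ConfusionMatrix& matrix) {
    int correct = 0, total = 0;
    for (ConfusionMatrix::const_iterator row = matrix.begin();
         row != matrix.end(); ++row) {
        for (std::map<std::string, int>::const_iterator col = row->second.begin();
             col != row->second.end(); ++col) {
            total += col->second;
            if (row->first == col->first) correct += col->second;
        }
    }
    return total == 0 ? 0.0f
                      : static_cast<float>(correct) / static_cast<float>(total);
}
```

<p>The diagonal entries (predicted equals actual) are the hits, and everything off the diagonal tells you which classes get mistaken for which.</p>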
<p>That about covers it.</p>
<h2>Code</h2>
<p>Lately I&#8217;m pushing stuff to GitHub using git rather than SVN on Google Code. Dunno why, it&#8217;s just like that.<br />
Get the whole thing at:<br />
<code><a href="https://github.com/royshil/FoodcamClassifier" target="_blank">https://github.com/royshil/FoodcamClassifier</a></code></p>
<p>Follow the build instructions, they&#8217;re a breeze, and then follow the running instructions. It&#8217;s basically a series of command-line programs you run to get through each step, and in the end you have a kind of &#8220;predictor&#8221; service that takes an image and produces a prediction.</p>
<p>OK guys, have fun classifying stuff!<br />
Roy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2011/08/25/a-simple-object-classifier-with-bag-of-words-using-opencv-2-3-w-code/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>Simple Kalman filter for tracking using OpenCV 2.2 [w/ code]</title>
		<link>http://www.morethantechnical.com/2011/06/17/simple-kalman-filter-for-tracking-using-opencv-2-2-w-code/</link>
		<comments>http://www.morethantechnical.com/2011/06/17/simple-kalman-filter-for-tracking-using-opencv-2-2-w-code/#comments</comments>
		<pubDate>Thu, 16 Jun 2011 22:49:30 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[vision]]></category>
		<category><![CDATA[Website]]></category>
		<category><![CDATA[c#]]></category>
		<category><![CDATA[filter]]></category>
		<category><![CDATA[kalman]]></category>
		<category><![CDATA[tracking]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=902</guid>
		<description><![CDATA[Hi, I wanted to put up a quick note on how to use Kalman Filters in OpenCV 2.2 with the C++ API, because all I could find online was using the old C API. Plus the kalman.cpp example that ships with OpenCV is kind of crappy and really doesn&#8217;t explain how to use the Kalman [...]]]></description>
			<content:encoded><![CDATA[<p>Hi,<br />
I wanted to put up a quick note on how to use Kalman Filters in OpenCV 2.2 with the C++ API, because all I could find online was using the old C API. Plus the kalman.cpp example that ships with OpenCV is kind of crappy and really doesn&#8217;t explain how to use the Kalman Filter.<br />
I&#8217;m no expert on Kalman filters though, this is just a quick hack I got going as a test for a project. It worked, so I&#8217;m posting the results.<br />
<span id="more-902"></span></p>
<h2>The Filter</h2>
<p>So I wanted to do a 2D tracker that is more immune to noise. For that I set up a Kalman filter with 4 dynamic parameters and 2 measurement parameters (no control), where the measurement is the 2D location of the object, and the dynamic state is the 2D location plus the 2D velocity. Pretty simple, and it makes the transition matrix simple too.</p>
<pre class="brush: plain; title: ; notranslate">
KalmanFilter KF(4, 2, 0);
KF.transitionMatrix = *(Mat_&lt;float&gt;(4, 4) &lt;&lt; 1,0,1,0,   0,1,0,1,  0,0,1,0,  0,0,0,1);
Mat_&lt;float&gt; measurement(2,1); measurement.setTo(Scalar(0));

// init...
KF.statePre.at&lt;float&gt;(0) = mouse_info.x;
KF.statePre.at&lt;float&gt;(1) = mouse_info.y;
KF.statePre.at&lt;float&gt;(2) = 0;
KF.statePre.at&lt;float&gt;(3) = 0;
setIdentity(KF.measurementMatrix);
setIdentity(KF.processNoiseCov, Scalar::all(1e-4));
setIdentity(KF.measurementNoiseCov, Scalar::all(1e-1));
setIdentity(KF.errorCovPost, Scalar::all(.1));
</pre>
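<p>To see what that transition matrix does, here is a tiny plain-C++ sketch (no OpenCV) that applies it once to a state. The <code>[x, y, vx, vy]</code> ordering is an assumption read off the matrix and the init code, and <code>State</code> and <code>predictState</code> are hypothetical names used only here.</p>

```cpp
#include <cassert>
#include <cstddef>

// State layout assumed here: [x, y, vx, vy], i.e. the 4 dynamic parameters.
struct State { float v[4]; };

// The same 4x4 transition matrix as in the snippet above (time step = 1 frame).
static const float F[4][4] = {
    {1, 0, 1, 0},
    {0, 1, 0, 1},
    {0, 0, 1, 0},
    {0, 0, 0, 1}
};

// One noise-free prediction: next = F * current.
State predictState(const State& current) {
    State next;
    for (std::size_t r = 0; r < 4; ++r) {
        next.v[r] = 0.0f;
        for (std::size_t c = 0; c < 4; ++c)
            next.v[r] += F[r][c] * current.v[c];
    }
    return next;
}
```

<p>A state of (1, 2) moving at (3, 4) per frame comes out as (4, 6) with the velocity unchanged: position gets the velocity added, velocity stays put. That is all the &#8220;constant velocity&#8221; model says.</p>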
<p>Cool, moving on to the dynamic part.<br />
So I set up a mouse callback to get the mouse position every &#8220;frame&#8221; (a 100ms wait), and feed that into the filter:</p>
<pre class="brush: plain; title: ; notranslate">
// First predict, to update the internal statePre variable
Mat prediction = KF.predict();
Point predictPt(prediction.at&lt;float&gt;(0),prediction.at&lt;float&gt;(1));

// Get mouse point
measurement(0) = mouse_info.x;
measurement(1) = mouse_info.y;

Point measPt(measurement(0),measurement(1));

// The &quot;correct&quot; phase that is going to use the predicted value and our measurement
Mat estimated = KF.correct(measurement);
Point statePt(estimated.at&lt;float&gt;(0),estimated.at&lt;float&gt;(1));
</pre>
<p>All the rest is garnish (see the code).</p>
<p>The important bit is to see that Predict() happens before Correct(). This is according to the excellent <a href="http://www.cs.unc.edu/~welch/media/pdf/kalman_intro.pdf">Kalman filter tutorial</a> I found. Look carefully at Figure 1-2!! It will sort you out. Also take a look at <a href="https://code.ros.org/svn/opencv/trunk/opencv/modules/video/src/kalman.cpp">OpenCV&#8217;s internal impl of Kalman</a>, see that it follows these steps closely. Especially <code> Mat&#038; KalmanFilter::predict(const Mat&#038; control)</code> and <code>Mat&#038; KalmanFilter::correct(const Mat&#038; measurement)</code>.<br />
Another good resource that helped me formulate the parameters for the filter is <a href="http://www.marcad.com/cs584/Tracking.html">this page</a>. Again, take everything with a grain of salt: Kalman filters are very versatile, you just need to know how to formulate them right.</p>
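<p>If the predict-then-correct order feels abstract, a one-dimensional filter shows it in a few lines. This is a textbook-style scalar sketch in plain C++, not OpenCV&#8217;s implementation. The struct name and the constant-value model are assumptions made to keep it short.</p>

```cpp
#include <cassert>

// A one-dimensional Kalman filter with a "the value stays put" model.
struct ScalarKalman {
    float x;  // state estimate
    float p;  // variance (uncertainty) of the estimate
    float q;  // process noise variance
    float r;  // measurement noise variance

    // Predict: the state is carried over, the uncertainty grows by q.
    float predict() {
        p += q;
        return x;
    }

    // Correct: blend the prediction with measurement z using the Kalman gain.
    float correct(float z) {
        assert(p + r > 0.0f);
        const float k = p / (p + r);  // gain between 0 and 1
        x += k * (z - x);             // move toward the measurement
        p *= (1.0f - k);              // uncertainty shrinks after a measurement
        return x;
    }
};
```

<p>Each frame you call <code>predict()</code> first and <code>correct(z)</code> second, the same order as in the mouse tracker above. A small <code>r</code> makes the filter trust the measurements, a large <code>r</code> makes it lean on its own prediction.</p>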
<h2>Result</h2>
<p>Using velocity:<br />
<a href="http://www.morethantechnical.com/wp-content/uploads/2011/06/Screen-shot-2011-06-16-at-6.39.24-PM.png" rel="lightbox[902]"><img src="http://www.morethantechnical.com/wp-content/uploads/2011/06/Screen-shot-2011-06-16-at-6.39.24-PM.png" alt="" title="kalman using velocity" width="580" height="602" class="alignnone size-full wp-image-907" /></a></p>
<p>Not using velocity:<br />
<a href="http://www.morethantechnical.com/wp-content/uploads/2011/06/Screen-shot-2011-06-16-at-6.41.24-PM.png" rel="lightbox[902]"><img src="http://www.morethantechnical.com/wp-content/uploads/2011/06/Screen-shot-2011-06-16-at-6.41.24-PM.png" alt="" title="kalman not using velocity" width="580" height="602" class="alignnone size-full wp-image-908" /></a></p>
<p>Some Video<br />
<iframe width="425" height="349" src="http://www.youtube.com/embed/SxtY1jQJ2fc" frameborder="0" allowfullscreen></iframe></p>
<h2>Code</h2>
<p>As usual, grab the code off the SVN:</p>
<pre class="brush: plain; title: ; notranslate">
svn co http://morethantechnical.googlecode.com/svn/trunk/mouse_kalman/main.cpp
</pre>
<p>Enjoy,<br />
Roy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2011/06/17/simple-kalman-filter-for-tracking-using-opencv-2-2-w-code/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>UnderGet – Download blocked content</title>
		<link>http://www.morethantechnical.com/2011/06/08/underget-%e2%80%93-download-blocked-content/</link>
		<comments>http://www.morethantechnical.com/2011/06/08/underget-%e2%80%93-download-blocked-content/#comments</comments>
		<pubDate>Wed, 08 Jun 2011 14:39:33 +0000</pubDate>
		<dc:creator>Arnon</dc:creator>
				<category><![CDATA[Recommended]]></category>
		<category><![CDATA[Solutions]]></category>
		<category><![CDATA[tips]]></category>
		<category><![CDATA[Website]]></category>
		<category><![CDATA[work]]></category>
		<category><![CDATA[blocked content]]></category>
		<category><![CDATA[corporate]]></category>
		<category><![CDATA[file extension]]></category>
		<category><![CDATA[firewall]]></category>
		<category><![CDATA[mp3]]></category>
		<category><![CDATA[proxy]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=876</guid>
		<description><![CDATA[Ever wanted to try and download an mp3 file at your workplace, but couldn&#8217;t because corporate firewall policy was to block every URL ending with the .mp3 extension? I have. Until recently I accepted this as a decree from above, but that was until I discovered a website called UnderGet. The trick is pretty simple. [...]]]></description>
			<content:encoded><![CDATA[<p>Ever wanted to try and download an mp3 file at your workplace, but couldn&#8217;t because corporate firewall policy was to block every URL ending with the .mp3 extension?<br />
<span id="more-876"></span></p>
<p>I have. Until recently I accepted this as a decree from above, but that was until I discovered a website called <a href="http://www.underget.com">UnderGet</a>.</p>
<p>The trick is pretty simple. The engine behind this site works by renaming the file or encoding its content so the blocking software cannot detect it.</p>
<p>I am now able to download my favorite podcast mp3 files when I&#8217;m at my workplace.</p>
<p style="text-align: center;"><img src="http://www.morethantechnical.com/wp-content/uploads/2011/06/060811_1439_UnderGetDow1.png" alt="" /></p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2011/06/08/underget-%e2%80%93-download-blocked-content/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Download all your Last.fm loved tracks in two simple steps</title>
		<link>http://www.morethantechnical.com/2011/03/14/download-all-you-last-fm-loved-tracks-in-a-single-command/</link>
		<comments>http://www.morethantechnical.com/2011/03/14/download-all-you-last-fm-loved-tracks-in-a-single-command/#comments</comments>
		<pubDate>Mon, 14 Mar 2011 04:27:48 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[Recommended]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Solutions]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[Website]]></category>
		<category><![CDATA[download]]></category>
		<category><![CDATA[encoding]]></category>
		<category><![CDATA[lame]]></category>
		<category><![CDATA[mp3]]></category>
		<category><![CDATA[mp4]]></category>
		<category><![CDATA[mplayer]]></category>
		<category><![CDATA[shell]]></category>
		<category><![CDATA[youtube]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=844</guid>
		<description><![CDATA[I&#8217;m a fan of Last.fm online radio, and I have a habit of marking every good song that I hear as a &#8220;loved track&#8221;. Over the years I got quite a list, and so I decided to turn it into my jogging playlist. But for that, I need all the songs downloaded to my computer [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m a fan of <a href="http://www.last.fm/home">Last.fm</a> online radio, and I have a habit of marking every good song that I hear as a &#8220;loved track&#8221;. Over the years I got quite a list, and so I decided to turn it into my jogging playlist. But for that, I need all the songs downloaded to my computer so I can put them on my mobile. While Last.fm does link to Amazon for downloading all the loved songs for pay, I&#8217;m going to walk the fine moral line here and suggest how you can download every song from existing free YouTube videos.<br />
If it really bothers you, think of it as if I created a YouTube playlist and I&#8217;m now using my data plan to stream the songs off YT itself.<br />
Moral issues resolved, we can move on to the scripting.<br />
<span id="more-844"></span><br />
What you need to have:<br />
A Linux-like system, <a href="http://www.mplayerhq.hu/design7/news.html">MPlayer</a>, the <a href="http://lame.sourceforge.net/">Lame MP3 encoder</a>, and some command-line experience or at least a sense of adventure.</p>
<p>So first you&#8217;ll need to export your loved tracks from Last.fm in tab separated format &#8211; a mere button press.<br />
<a href="http://www.morethantechnical.com/wp-content/uploads/2011/03/Screen-shot-2011-03-14-at-12.03.26-AM.png" rel="lightbox[844]"><img src="http://www.morethantechnical.com/wp-content/uploads/2011/03/Screen-shot-2011-03-14-at-12.03.26-AM-300x111.png" alt="" title="Screen shot 2011-03-14 at 12.03.26 AM" width="300" height="111" class="aligncenter size-medium wp-image-849" /></a></p>
<p>The &#8220;tsv&#8221; (tab separated values) file has a simple format: <code>&lt;song name&gt; &lt;artist&gt; &lt;Last.fm url&gt;</code></p>
<p>And now for the script. The loved tracks file is tab separated, so we use AWK to get the first two fields, which are the song name and the artist.<br />
Then we use a neat command-line tool to download YT movies: <a href="http://rg3.github.com/youtube-dl/documentation.html">http://rg3.github.com/youtube-dl/documentation.html</a>.</p>
<pre class="brush: plain; title: ; notranslate">
mkdir mylovedtracks
cd mylovedtracks
awk -F'\t' '{print &quot;../youtube-dl.py -f 18 -t \&quot;ytsearch:&quot; $1 &quot; &quot; $2 &quot;\&quot;&quot;}' ../my_lovedtracks.tsv | csh
</pre>
<p>The one-liner will download all the loved tracks from the tsv file into the current directory, given that <code>youtube-dl.py</code> and <code>my_lovedtracks.tsv</code> exist in the parent directory. <code>-f 18</code> tells it to download only MP4s, and <code>ytsearch</code> tells it to search YT for the term &#8220;song-name song-artist&#8221; and download the first result. The <code>| csh</code> part pipes each command line that AWK printed into a new shell process, which runs it.</p>
<p>The saved MP4 will be named after the name of the video, with the addition of the YT hash string.</p>
<p>All the mp4s have been downloaded, so let&#8217;s batch convert them to mp3s:</p>
<pre class="brush: plain; title: ; notranslate">
mkdir sound
for f in *.mp4 ; do n=&quot;${f%.mp4}&quot;; if [ ! -e &quot;sound/$n.mp3&quot; ]; then mplayer &quot;$f&quot; -vc dummy -vo null -ao pcm:file=sound/temp.wav; lame -V2 sound/temp.wav &quot;sound/$n.mp3&quot;; rm sound/temp.wav; fi ; done
</pre>
<p>This one-liner extracts the audio from each mp4 into a temporary PCM file, temp.wav, using MPlayer, and then converts it to a VBR MP3 using Lame.<br />
You can run this command many times, as it skips files that have already been converted. So if you&#8217;re impatient (like me) and want to convert some of the MP4s before everything has downloaded, just run it, and run it again later.</p>
<p>Congrats, all your loved tracks were downloaded.</p>
<p>A few limitations to this method:<br />
* Sometimes the downloaded songs are not exactly what you wanted, especially when you were after a specific version. The search is arbitrary, and can&#8217;t be controlled too much.<br />
* ID3 tags are non-existent, although something can probably be done about that in the Lame encoding phase.<br />
* There is a lot of unexploited potential for parallelization, mostly in the YT download phase, where YT pushes the first ~15% of the video very fast (I even saw 1200Kb/s), and then maintains a steady d/l rate that gets the video downloaded in about a minute (may be as low as 50Kb/s). Downloading many videos at once could help.<br />
* Still not a true one-liner, it is a two-step thing. But that can be fixed by modifying the 2nd step a bit and putting it into the AWK print of the 1st step.<br />
* MP3 volume normalization is missing &#8211; very important! Otherwise every song plays at a different volume and you must do vol-up vol-down all the time&#8230;</p>
<p>Still, did a nice quick job for me&#8230;</p>
<p>Roy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2011/03/14/download-all-you-last-fm-loved-tracks-in-a-single-command/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>How to rotate a video using MEncoder and FFmpeg and live to tell the tale</title>
		<link>http://www.morethantechnical.com/2011/02/08/how-to-rotate-a-video-using-mencoder-and-ffmpeg-and-live-to-tell-the-tale/</link>
		<comments>http://www.morethantechnical.com/2011/02/08/how-to-rotate-a-video-using-mencoder-and-ffmpeg-and-live-to-tell-the-tale/#comments</comments>
		<pubDate>Tue, 08 Feb 2011 16:43:07 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[ffmpeg]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[Recommended]]></category>
		<category><![CDATA[tips]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[command line]]></category>
		<category><![CDATA[encoding]]></category>
		<category><![CDATA[mencoder]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=811</guid>
		<description><![CDATA[Hi I&#8217;d like to share a quick tip on rotating video files. I&#8217;m always frustrated with taking videos with my phone. Single handedly it&#8217;s easiest to do it when the phone is upright and not in landscape mode. But the files are always saved in landscape mode, which makes them rotated when you watch. Although [...]]]></description>
			<content:encoded><![CDATA[<p>Hi</p>
<p>I&#8217;d like to share a quick tip on rotating video files.</p>
<p>I&#8217;m always frustrated with taking videos with my phone. Single-handedly, it&#8217;s easiest to shoot with the phone upright and not in landscape mode. But the files are always saved in landscape mode, which makes them rotated when you watch.<br />
Although there is plenty of GUI software to do it, using the command line is faster and can also be batched!</p>
<p><span id="more-811"></span></p>
<h2>Using FFmpeg</h2>
<p>This is the basic syntax<br />
<code>./ffmpeg -vf transpose=0 -i input.mp4 output.avi</code></p>
<p>Just using <code>-vf transpose=0</code> to rotate 90 deg clockwise. If you get the &#8220;Unrecognized option &#8216;vf&#8217;&#8221; error, you need to configure &#038; build ffmpeg with <code>--enable-filters</code> (or at least without <code>--disable-filters</code>), and check with <code>ffmpeg -filters</code> that you get them to show up.</p>
<p>Also, I always get very lousy video quality when using the plain vanilla settings. It turns out the problem is with the frame rate. If you leave it as-is the (default) mpeg compression makes a lot of I and P frames, and too few B frames. So set it up explicitly.</p>
<p>I ended up with<br />
<code>ffmpeg -vf transpose=0 -i input.mp4 -r 30 output.avi</code></p>
<p>Easy.</p>
<p>Update [3/1/11]: Actually, <code>-vf transpose=0</code> on its own will flip the video horizontally as well, which is undesirable in some cases. To counter that I use <code>-vf "transpose=0,hflip=0"</code>, and it resolves the problem. FFmpeg&#8217;s filter documentation also lists <code>transpose=1</code> as a plain 90-degree clockwise rotation.</p>
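<p>Why a transpose needs an extra horizontal flip to become a clean clockwise rotation is easy to check on a tiny grid. The sketch below is plain C++ on a 2D array of ints, not FFmpeg code. <code>Image</code>, <code>transpose</code> and <code>hflip</code> are hypothetical helpers that only borrow the filter names, and treating <code>transpose=0</code> as a pure row/column swap is an assumption based on the behavior described in the update above.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<int> > Image;  // rows of pixels, rectangular

// Swap rows and columns: the pixel at (r, c) moves to (c, r).
Image transpose(const Image& in) {
    assert(!in.empty());
    Image out(in[0].size(), std::vector<int>(in.size()));
    for (std::size_t r = 0; r < in.size(); ++r)
        for (std::size_t c = 0; c < in[r].size(); ++c)
            out[c][r] = in[r][c];
    return out;
}

// Mirror every row left to right.
Image hflip(Image img) {
    for (std::size_t r = 0; r < img.size(); ++r)
        std::reverse(img[r].begin(), img[r].end());
    return img;
}
```

<p>Take the 2x3 grid with rows 1 2 3 and 4 5 6. Transposing gives rows 1 4, 2 5 and 3 6, which is a rotated but mirrored picture. Running <code>hflip</code> on that gives 4 1, 5 2 and 6 3, exactly the original turned 90 degrees clockwise.</p>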
<h2>Using MEncoder</h2>
<p>Again, pretty simple thing to do:<br />
<code>mencoder input.mp4 -nosound -o characters-resize-turn.avi -vf rotate=0 -ovc lavc -lavcopts vcodec=mpeg4 -ofps 30</code></p>
<p>Note that I kill the audio with <code>-nosound</code>, and again set the frame rate with <code>-ofps 30</code>, or else I get the &#8220;duplicate frames&#8221; problem.</p>
<p>Enjoy<br />
Roy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2011/02/08/how-to-rotate-a-video-using-mencoder-and-ffmpeg-and-live-to-tell-the-tale/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>10 lines-of-code OCR HTTP service with Python, Tesseract and Tornado</title>
		<link>http://www.morethantechnical.com/2011/01/25/10-lines-of-code-ocr-http-service-with-python-tesseract-and-tornado/</link>
		<comments>http://www.morethantechnical.com/2011/01/25/10-lines-of-code-ocr-http-service-with-python-tesseract-and-tornado/#comments</comments>
		<pubDate>Tue, 25 Jan 2011 17:44:50 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[Recommended]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Website]]></category>
		<category><![CDATA[http]]></category>
		<category><![CDATA[ocr]]></category>
		<category><![CDATA[service]]></category>
		<category><![CDATA[tesseract]]></category>
		<category><![CDATA[tornado]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=803</guid>
		<description><![CDATA[Hi I believe that every builder-hacker should have their own little Swiss-army-knife server that just does everything they need, but as a webservice. You can basically do anything as a service nowadays: image/audio/video manipulation, mock-cloud data storage, offload heavy computation, and so on. Tornado, the lightweight Python webserver is perfect for this, and since so [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.morethantechnical.com/wp-content/uploads/2011/01/Screen-shot-2011-01-25-at-12.32.27-PM.png" rel="lightbox[803]"><img src="http://www.morethantechnical.com/wp-content/uploads/2011/01/Screen-shot-2011-01-25-at-12.32.27-PM-300x114.png" alt="" title="Screen shot 2011-01-25 at 12.32.27 PM" width="300" height="114" class="aligncenter size-medium wp-image-806" /></a>Hi</p>
<p>I believe that every builder-hacker should have their own little Swiss-army-knife server that just does everything they need, but as a webservice. You can basically do anything as a service nowadays: image/audio/video manipulation, mock-cloud data storage, offload heavy computation, and so on.<br />
<a href="http://www.tornadoweb.org/">Tornado</a>, the lightweight Python webserver, is perfect for this, and since so many projects these days have Python bindings (see <a href="https://github.com/hoffstaetter/python-tesseract">python-tesseract</a>), it should be a breeze to integrate them with minimal work.<br />
Let&#8217;s see how it&#8217;s done</p>
<p><span id="more-803"></span></p>
<h2>Putting it together</h2>
<p>I owe the simplicity of this work to the simplicity of <a href="http://www.tornadoweb.org/documentation#overview">Tornado&#8217;s API</a>. Really clean, just a couple of entry points to write code.<br />
Since the code is extremely short, I&#8217;ll just pour it in and go over it:</p>
<pre class="brush: plain; title: ; notranslate">

import tornado.httpserver
import tornado.ioloop
import tornado.web
import pprint
import Image
from tesseract import image_to_string
import StringIO

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write('&lt;html&gt;&lt;body&gt;Send us a file!&lt;br/&gt;&lt;form enctype=&quot;multipart/form-data&quot; action=&quot;/&quot; method=&quot;post&quot;&gt;'
                   '&lt;input type=&quot;file&quot; name=&quot;the_file&quot;&gt;'
                   '&lt;input type=&quot;submit&quot; value=&quot;Submit&quot;&gt;'
                   '&lt;/form&gt;&lt;/body&gt;&lt;/html&gt;')

    def post(self):
        self.set_header(&quot;Content-Type&quot;, &quot;text/plain&quot;)
        self.write(&quot;You sent a file with name &quot; + self.request.files.items()[0][1][0]['filename'] )
        # make a &quot;memory file&quot; using StringIO, open with PIL and send to tesseract for OCR
        self.write(image_to_string(Image.open(StringIO.StringIO(self.request.files.items()[0][1][0]['body']))))

application = tornado.web.Application([
    (r&quot;/&quot;, MainHandler),
])

if __name__ == &quot;__main__&quot;:
    http_server = tornado.httpserver.HTTPServer(application)
    http_server.listen(8888)
    tornado.ioloop.IOLoop.instance().start()
</pre>
<p>That&#8217;s it, and most of it is just garnish. The final version also shows the image on the screen.</p>
<p>In the main block, Tornado is set up to listen on port 8888, and the application configuration tells it to answer requests on the root (&#8220;/&#8221;) with our special handler: MainHandler. Then I must define MainHandler to take care of incoming GET and POST requests. All this was taken off the &#8220;Hello World&#8221; of Tornado&#8217;s API.</p>
<p>I will have the service answer POST requests that send an image file, and route the file to be processed. All attached files are in <code>self.request.files</code>, so I just pick up the first one.</p>
<p>Now <a href="http://code.google.com/p/tesseract-ocr/">Tesseract</a>, as you probably already know, is an open-source OCR engine that was originally built by HP and has since been picked up by Google. It is good, it is free, and it comes with a set of languages already trained.<br />
But I needed a Python binding to it, and did not feel like writing one of my own. So I googled and found this small humble project: <a href="https://github.com/hoffstaetter/python-tesseract">python-tesseract</a>. It has a very narrow API: just one function that basically calls the tesseract command line. But it works like a charm.</p>
<p>So all I needed to do is take the file off the POST request, wrap a <a href="http://docs.python.org/library/stringio.html#StringIO.StringIO">StringIO</a> around it to look like a file, use <a href="http://www.pythonware.com/library/pil/handbook/image.htm#image-open-function">PIL&#8217;s Image.open</a>, and send it to python-tesseract to return a string. Then I just write the string back to the HTTP response.</p>
<pre class="brush: plain; title: ; notranslate">
	self.write(image_to_string(Image.open(StringIO.StringIO(self.request.files.items()[0][1][0]['body']))))
</pre>
<p>To get it to actually run you must </p>
<ul>
<li>set up the $PYTHONPATH variable to find both python-tesseract and Tornado,</li>
<li>change $TESSDATA_PREFIX to where you put your training data for Tesseract,</li>
<li>change the path to the <code>tesseract</code> executable in the first code line of python-tesseract&#8217;s <code>tesseract.py</code>.</li>
</ul>
<p>Now all you need is to start your server, send image requests to it, and you&#8217;ll get back the text in the images.</p>
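<p>For the &#8220;send image requests&#8221; part, here is a minimal client sketch using only the Python 3 standard library (the server above is Python 2, but the client can be anything). The function names <code>build_multipart</code> and <code>ocr_request</code> are my own for illustration; the field name <code>the_file</code> and port 8888 come from the handler above, and the rest is an assumption about how you would call it.</p>

```python
import uuid
import urllib.request


def build_multipart(field_name, filename, data):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    head = (
        "--{b}\r\n"
        'Content-Disposition: form-data; name="{n}"; filename="{f}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).format(b=boundary, n=field_name, f=filename).encode("utf-8")
    tail = "\r\n--{b}--\r\n".format(b=boundary).encode("utf-8")
    return head + data + tail, "multipart/form-data; boundary=" + boundary


def ocr_request(image_path, url="http://localhost:8888/"):
    """POST an image file to the OCR service and return the response text."""
    with open(image_path, "rb") as fh:
        body, content_type = build_multipart("the_file", image_path, fh.read())
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

<p>With the server running, something like <code>print(ocr_request("scan.png"))</code> should print the file name line followed by whatever text Tesseract recognized.</p>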
<h2>Code</h2>
<p>Grab it off the SVN:<br />
<code>svn checkout http://morethantechnical.googlecode.com/svn/trunk/tesserver/ tesserver</code></p>
<p>Enjoy,<br />
Roy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2011/01/25/10-lines-of-code-ocr-http-service-with-python-tesseract-and-tornado/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Hand gesture recognition via model fitting in energy minimization w/OpenCV</title>
		<link>http://www.morethantechnical.com/2010/12/28/hand-gesture-recognition-via-model-fitting-in-energy-minimization-wopencv/</link>
		<comments>http://www.morethantechnical.com/2010/12/28/hand-gesture-recognition-via-model-fitting-in-energy-minimization-wopencv/#comments</comments>
		<pubDate>Mon, 27 Dec 2010 22:11:12 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[Recommended]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[vision]]></category>
		<category><![CDATA[Website]]></category>
		<category><![CDATA[work]]></category>
		<category><![CDATA[computer vision]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=762</guid>
		<description><![CDATA[Hi Just wanted to share a thing I made &#8211; a simple 2D hand pose estimator, using a skeleton model fitting. Basically there has been a crap load of work on hand pose estimation, but I was inspired by this ancient work. The problem is setting out to find a good solution, and everything is [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.morethantechnical.com/wp-content/uploads/2010/12/hands.png" rel="lightbox[762]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/12/hands-300x248.png" alt="hands with model fitted" title="hands with model fitted" width="300" height="248" class="aligncenter size-medium wp-image-796" /></a>Hi</p>
<p>Just wanted to share a thing I made &#8211; a simple 2D hand pose estimator, using skeleton model fitting. Basically there has been a crap load of work on hand pose estimation, but I was inspired by <a href="http://scholar.google.com/scholar?cluster=136383770354228708&#038;hl=en&#038;as_sdt=40000000">this ancient work</a>. The problem is that when you set out to find a good solution, everything is very hard to understand and implement. In such cases I like to be inspired by a method, and just set out with my own implementation. This way, I understand what&#8217;s going on, simplify it, and share it with you!</p>
<p>Anyway, let&#8217;s get down to business.<br />
<span id="more-762"></span></p>
<h1>A bit about energy minimization problems</h1>
<p>A dear friend revealed to me the wonders of energy minimization problems a while back, and ever since I have been trying to find uses for that method. Basically, it is trying to find a minimum (ideally the global one) of a complicated energy function, usually with many parameters, by following the function&#8217;s gradient. Such methods are often called <a href="http://en.wikipedia.org/wiki/Gradient_descent">Gradient Descent</a>, and are used mostly for non-linear systems that can&#8217;t be solved easily using a least-squares variant. </p>
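<p>To make the idea concrete, here is a tiny gradient descent sketch in plain Python. It is not the hand-fitting code: the toy energy, the step size and the function names are all made up for illustration. It simply walks downhill on a bowl-shaped function whose minimum is known.</p>

```python
def gradient_descent(grad, x0, step=0.1, iters=200):
    """Repeatedly move against the gradient, starting from x0."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def energy(x):
    # Toy energy: a bowl with its minimum at (3, -1).
    return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2


def energy_grad(x):
    # Analytic gradient of the toy energy above.
    return [2.0 * (x[0] - 3.0), 2.0 * (x[1] + 1.0)]


minimum = gradient_descent(energy_grad, [0.0, 0.0])
```

<p>On a convex bowl like this, the walk ends at the single minimum. On a real energy function with many local minima, where you end up depends on where you start, which is why a good initialization matters.</p>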
<p>A lot of work in computer vision was done using energy functions (I believe the most seminal was <a href="http://scholar.google.com/scholar?cluster=10809837120977085662&#038;hl=en&#038;as_sdt=40000000">Snakes</a>, over 10,000 citations), usually having two terms: Internal energy and External energy. The equilibrium between the two terms should result in a low-energy system &#8211; our optimal result. So we would like to formulate the terms in our system such that when they are 0 &#8211; they describe the system as we want it.</p>
<p>Following the works with active contours, I believe the external energy function should have to do with how the hand model fits to the hand blob, and the internal energy will have to do with how &#8220;comfortable&#8221; the hand is with this configuration.</p>
<h1>The hand model</h1>
<p>Let&#8217;s see what a 2D model of a hand might look like<br />
<a href="http://www.morethantechnical.com/wp-content/uploads/2010/12/Screen-shot-2010-12-25-at-10.50.41-AM.png" rel="lightbox[762]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/12/Screen-shot-2010-12-25-at-10.50.41-AM.png" alt="" title="Screen shot 2010-12-25 at 10.50.41 AM" width="232" height="231" class="aligncenter size-full wp-image-790" /></a><br />
Kinda looks like a rake&#8230; huh?</p>
<p>There are some parts that practically can&#8217;t change much, i.e. the palm (orange), and some that might change drastically, i.e. the fingers (red). Each finger has joints (blue circle), and a tip (bigger blue circle).</p>
<pre class="brush: plain; title: ; notranslate">
typedef struct finger_data {
	Point2d origin_offset;		//base of finger relative to center of hand
	double a;					//angle
	vector&lt;double&gt; joints_a;	//angles of joints
	vector&lt;double&gt; joints_d;	//bone length
} FINGER_DATA;

typedef struct hand_data {
	FINGER_DATA fingers[5];		//fingers
	double a;					//angle of whole hand
	Point2d origin;				//center of palm
	Point2d origin_offset;		//offset from center for optimization
	double size;				//relative size of hand = length of a finger
} HAND_DATA;
</pre>
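<p>To show how such a structure turns into points on the screen, here is a small forward-kinematics sketch in Python. It assumes each joint angle is relative to the previous bone and accumulates along the finger; that is my assumption for illustration and may not match what <code>newTip</code> does in the real code. The function name <code>finger_points</code> is hypothetical.</p>

```python
import math


def finger_points(origin, base_angle, joint_angles, bone_lengths):
    """Return the joint positions of one finger; the last one is the tip.

    origin       -- (x, y) of the finger base
    base_angle   -- direction of the finger in radians
    joint_angles -- bend at each joint, relative to the previous bone
    bone_lengths -- length of the bone following each joint
    """
    x, y = origin
    angle = base_angle
    points = []
    for bend, length in zip(joint_angles, bone_lengths):
        angle += bend  # bends accumulate along the chain
        x += length * math.cos(angle)
        y += length * math.sin(angle)
        points.append((x, y))
    return points
```

<p>For example, a finger lying along the x axis with two unit-length bones and a 90 degree bend at the second joint ends up with its tip at roughly (1, 1).</p>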
<p>At first I thought, since I&#8217;m only interested in the tips of the fingers, to use Inverse Kinematics to guide the tips to a certain point and let the joints find their own minimal energy position, following <a href="http://freespace.virgin.net/hugo.elias/models/m_ik2.htm">this</a> article. But I abandoned this method because of complications. </p>
<p>I also had to simplify this model, for real-time estimation and also better results. So in the end I ended up with a very rigid model that allows only one joint per finger and no angular movement.</p>
<h1>Using tnc.c</h1>
<p>tnc.c is a &#8220;library&#8221;, essentially one C file, that implements a line search algorithm that is able to find the minimum point of a multi-variate function. I&#8217;m not certain of the algorithm details, and it&#8217;s not so important as it can be replaced with any other similar library. But tnc.c has a great advantage &#8211; it is dead simple. One function will start the gradient descent, calling back a function that calculates the energy and its gradients.</p>
<p>So basically I had to write just one very short function:</p>
<pre class="brush: plain; title: ; notranslate">
static int my_f(double x[], double *f, double g[], void *state) {
	DATA_FOR_TNC* d_ptr = (DATA_FOR_TNC*)state;
	DATA_FOR_TNC new_data = *d_ptr;

	mapVecToData(x,new_data.hand);

	*f = calc_Energy(new_data,*d_ptr);

	//calc gradients
	{
		double _x[SIZE_OF_HAND_DATA];

		for(int i=0;i&lt;SIZE_OF_HAND_DATA;i++) {
			memcpy(_x, x, sizeof(double)*SIZE_OF_HAND_DATA); //reset variables
			_x[i] = _x[i] + EPSILON; //change only one variable
			mapVecToData(_x, new_data.hand);
			double E_epsilon = calc_Energy(new_data,*d_ptr);
			g[i] = ((E_epsilon - *f) / EPSILON); //calc the gradient for this variable change
		}
	}

	return 0;
}
</pre>
<p>This function is called by tnc.c on every iteration of the search: <code>double x[]</code> is the state of variables the search is now examining, <code>double *f</code> is the energy for this state, <code>double g[]</code> are the gradients (same size as x[]), and <code>void *state</code> is a user-defined variable that can be carried along the process.</p>
<p>So what I did is simply change the value of each parameter in turn by a small step, to test how it affects the energy in the system. I get a measure of the energy, subtract from it the energy of the &#8220;natural&#8221; setup (without any changes to parameters), divide by the step size, and I get the gradient for this parameter.</p>
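<p>The same forward-difference trick, stripped of the hand model, looks roughly like this in Python. The function name and the default step are illustrative choices of mine, not taken from tnc.c or from the project.</p>

```python
def forward_difference_gradient(energy, x, eps=1e-6):
    """Estimate the gradient of `energy` at `x` by nudging one variable at a time.

    Returns (energy_at_x, gradient_list).
    """
    base = energy(x)
    grad = []
    for i in range(len(x)):
        probe = list(x)   # reset all variables
        probe[i] += eps   # change only one variable
        grad.append((energy(probe) - base) / eps)
    return base, grad
```

<p>Note that this costs one extra energy evaluation per parameter on every iteration, which is one reason a model with fewer parameters runs so much faster.</p>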
<p>The energy function came out a bit different in the end:</p>
<pre class="brush: plain; title: ; notranslate">

static double calc_Energy(DATA_FOR_TNC&amp; d, DATA_FOR_TNC&amp; orig_d) {
	double _sum = 0.0;

	//external energy: how close are the joints to the hand blob? (how well do they fit to it)
	vector&lt;Point2d&gt; joints;
	Mat tips(5,1,CV_64FC2);

	for (int j=0; j&lt;5; j++) {
		joints.clear();
		FINGER_DATA f = d.hand.fingers[j];
		Point2d _newTip = newTip(f,d.hand,joints); //get joints for this finger

		for (int i=0; i&lt;joints.size(); i++) { //for each joint find how far it is from the blob
			double ds = pointPolygonTest(d.contour, joints[i]+getHandOrigin(d.hand), true);
			ds += 5;
			ds = 1 * ((ds &lt; 0) ? -1 : 1) * (ds*ds) ;
			_sum -= (ds &gt; 0) ? 0 : 100*ds;
		}

		tips.at&lt;Point2d&gt;(j,0) = _newTip;
	}

	//laziness of fingers - joints should strive to be as they were in the natural pose
	vector&lt;double&gt; _angles;
//	for (int j=0; j&lt;5; j++) {
//		FINGER_DATA f = d.hand.fingers[j];
//		FINGER_DATA of = orig_d.hand.fingers[j];
////		_angles.push_back(f.a - of.a);
//		for (int i=0; i&lt;f.joints_d.size(); i++) {
////			_angles.push_back(f.joints_a[i] - of.joints_a[i]);
//			_angles.push_back(f.joints_d[i] - of.joints_d[i]);
//		}
//	}
	_angles.push_back(d.hand.a-orig_d.hand.a); //the angle of the hand should be as it was before
	_sum  += 10000*norm(Mat(_angles));

	if(_sum &lt; 0) return 0;
	return _sum;
}
</pre>
<p>You&#8217;ll notice the commented out section. The &#8220;laziness of fingers&#8221; turned out not to give good results&#8230; A different metric is needed! I have not found it yet, maybe you have a good idea?</p>
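<p>The external term is easier to read in isolation. Here is a Python sketch of the per-joint penalty as I read it from the code above: the margin of 5 and the weight of 100 are the constants from that code, and the function name <code>joint_penalty</code> is made up. The input is the signed distance that OpenCV&#8217;s <code>pointPolygonTest</code> returns, which is positive inside the contour and negative outside.</p>

```python
def joint_penalty(signed_distance, margin=5.0, weight=100.0):
    """Energy contributed by one joint.

    Joints inside the blob, or up to `margin` pixels outside it, cost nothing.
    Joints farther out are penalized by the squared overshoot.
    """
    ds = signed_distance + margin
    if ds >= 0:
        return 0.0
    return weight * ds * ds
```

<p>So a joint 15 pixels outside the blob overshoots the margin by 10 and adds 100 * 10 * 10 = 10000 to the energy, while a joint anywhere inside adds nothing.</p>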
<p>Starting tnc.c is very simple: Allocating the vectors for X and gradients, initializing the model from the blob, and calling the <code>simple_tnc</code> convenience method. <code>simple_tnc</code> starts <code>tnc</code> with some default parameters that don&#8217;t affect the outcome (at least in my tries).</p>
<pre class="brush: plain; title: ; notranslate">
void estimateHand(Mat&amp; mymask) {
	double _x[SIZE_OF_HAND_DATA] = {0};
	Mat X(1,SIZE_OF_HAND_DATA,CV_64FC1,_x);
	double f;
	Mat gradients(Size(SIZE_OF_HAND_DATA,1),CV_64FC1,Scalar(0));

	namedWindow(&quot;state&quot;);

	initialize_hand_data(d, mymask);

	mapDataToVec((double*)X.data, d.hand);

	simple_tnc(SIZE_OF_HAND_DATA, (double*)X.data, &amp;f, (double*)gradients.data, my_f, (void*)&amp;d, 1, 0);

	mapVecToData((double*)X.data, d.hand);
	showstate(d,1);

	d.hand.origin = getHandOrigin(d.hand); //move to new position
}
</pre>
<h1>Results and Discussion</h1>
<p>Here are my results so far:<br />
<object width="480" height="385"><param name="movie" value="http://www.youtube.com/v/uETHJQhK144?fs=1&amp;hl=en_US"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/uETHJQhK144?fs=1&amp;hl=en_US" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="480" height="385"></embed></object></p>
<p>It&#8217;s not perfect, but it&#8217;s a start. Tracking and estimating an open hand is pretty good, with some orientation change as well. But when the fingers are closed&#8230; that&#8217;s where problems start. </p>
<p>Sometimes the joints &#8220;hover&#8221; over the black area to &#8220;land&#8221; in a white area so they &#8220;fit&#8221;, but they should not do that. One easy thing to do to counter this is to measure the distance of the whole bone, and not just the joint.</p>
<p>The model right now doesn&#8217;t use all the joints possible, because it is too heavy computationally. Plus the energy does not depend on (or change) the angle of the fingers. So this is a very very simple model of a hand&#8230;</p>
<p>But, it is a good start! All the <a href="http://www.youtube.com/watch?v=mLT4CFLIi8A&#038;feature=related">other</a> <a href="http://www.youtube.com/watch?v=6Uw_8Y1RuQQ&#038;feature=related">stuff</a> I <a href="http://www.youtube.com/watch?v=B_UYmQJT-F0&#038;feature=related">have</a> <a href="http://www.youtube.com/watch?v=F8GVeV0dYLM&#038;feature=related">seen</a> <a href="http://www.youtube.com/watch?v=Rmh-mZFxWns&#038;feature=related">online</a> is just basic high-curvature points counting and color-based or feature-based segmentation and tracking&#8230; My model actually tries to fit an articulated and precise model of a hand to the image.</p>
<h1>How did you get such nice blobs?!</h1>
<p>You ask. They are beautiful aren&#8217;t they&#8230; nice and clean, easy for tracking and model fitting. It&#8217;s no magic though&#8230;<br />
Well, I took part in a <a href="http://depthjs.media.mit.edu/">project in the Media Lab, called DepthJS</a>, that uses the MS Kinect to control web pages. I wrote the computer-vision part. So all the <a href="https://github.com/doug/depthjs">code is there</a>, you can grab it; I just plugged it into this little project. It is based on <a href="http://openkinect.org/wiki/C%2B%2BOpenCvExample">this very simple example of using OpenCV2.X and libfreenect</a>.</p>
<p>Wow, this was a longie.. I hope you learned something and got inspired. I got to do a second overview of the project, and I&#8217;m inspired. Inspiration all around!</p>
<p>Code is obviously yours for the taking:<br />
<a href="https://github.com/royshil/OpenHPE">https://github.com/royshil/OpenHPE</a></p>
<p>Please contribute your own views, thoughts, code, rants in the comments and github page.</p>
<p>Enjoy<br />
Roy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2010/12/28/hand-gesture-recognition-via-model-fitting-in-energy-minimization-wopencv/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>An Android solution for listening to online radio while multitasking</title>
		<link>http://www.morethantechnical.com/2010/10/27/an-android-solution-for-listening-to-streamin-radio-while-multitasking-2/</link>
		<comments>http://www.morethantechnical.com/2010/10/27/an-android-solution-for-listening-to-streamin-radio-while-multitasking-2/#comments</comments>
		<pubDate>Wed, 27 Oct 2010 10:57:11 +0000</pubDate>
		<dc:creator>Arnon</dc:creator>
				<category><![CDATA[Android]]></category>
		<category><![CDATA[Recommended]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Solutions]]></category>
		<category><![CDATA[Stream]]></category>
		<category><![CDATA[tips]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=716</guid>
		<description><![CDATA[Android + Yourmuze.fm + Dolphin Browser HD + XiiaLive = WIN It&#8217;s been a while since I&#8217;ve posted anything in the blog… Sorry for that… very busy times. I had a lot of ideas of what my &#8220;comeback post&#8221; should be about, but I knew I had to share one of my relatively recent discoveries [...]]]></description>
			<content:encoded><![CDATA[<p><strong><em>Android + Yourmuze.fm + Dolphin Browser HD + XiiaLive = WIN<br />
</em></strong></p>
<p><img src="http://www.morethantechnical.com/wp-content/uploads/2010/10/102710_1056_AnAndroidso5.jpg" alt="" align="left" />It&#8217;s been a while since I&#8217;ve posted anything in the blog… Sorry for that… very busy times. I had a lot of ideas of what my &#8220;comeback post&#8221; should be about, but I knew I had to share one of my relatively recent discoveries that made my smartphone online-radio listening experience a whole lot better.</p>
<p>If you don&#8217;t know <a href="http://www.yourmuze.fm" target="_blank">yourmuze.fm</a>, this might be the time to get to know it. It&#8217;s a free service that has a LOT of worldwide radio stations available as online streams for use with most smartphones.</p>
<p>In order to start using it you need to register for free via your desktop computer, and add the stations you like. Later on, you can surf to the <a href="http://m.yourmuze.fm" target="_blank">mobile version</a> of the service from your phone&#8217;s browser and listen to the stations you selected.</p>
<p>So far so good… I like it. But how about multitasking?</p>
<p><img src="http://www.morethantechnical.com/wp-content/uploads/2010/10/102710_1056_AnAndroidso3.png" alt="" /><img src="http://www.morethantechnical.com/wp-content/uploads/2010/10/102710_1056_AnAndroidso4.png" alt="" /></p>
<p><span id="more-716"></span></p>
<p>The thing is, if you are an average Android user and try to use the service, then when you try to play a station it will use the stock <strong>movie player</strong>, which will not let you do anything else on your phone while listening to the music.</p>
<p>If you answer a call, or even go to the home screen, your playback will stop, since the player must run in the foreground.</p>
<p>I did a little research and discovered a very nice way to make yourmuze run in the background and let you do whatever you like while listening to music.</p>
<p>I use it with my GPS app, listening to online music while navigating.</p>
<p>This is relatively simple.</p>
<ol>
<li><span>Download <a href="market://search?q=pname:mobi.mgeek.TunnyBrowser">Dolphin Browser HD</a> and <a href="market://search?q=pname:com.android.DroidLiveLite">XiiaLive Lite</a> from the market (feel free to buy XiiaLive full if you like it)<br />
</span></li>
<li><span>Open up Dolphin Browser, and hit your phone&#8217;s <strong>Menu</strong> button, then select <strong>More</strong> and then <strong>Settings</strong><br />
</span></li>
<li><span>Scroll down to <strong>User Agent</strong>, and select <strong>iPhone</strong><br />
</span></li>
</ol>
<p>That&#8217;s it!</p>
<p>Go back to the browser, surf to m.yourmuze.fm and log in with your credentials. When you try to listen to your station, you will be able to choose whether to listen with XiiaLive. I set it as the default.</p>
<p>There you go. Once the station is playing you can navigate away from the app and do your multitasking.</p>
<p>Hope you find this post helpful.</p>
<p>Arnon</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2010/10/27/an-android-solution-for-listening-to-streamin-radio-while-multitasking-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Image Recoloring using Gaussian Mixture Model and Expectation Maximization [OpenCV, w/Code]</title>
		<link>http://www.morethantechnical.com/2010/06/24/image-recoloring-using-gaussian-mixture-model-and-expectation-maximization-opencv-wcode/</link>
		<comments>http://www.morethantechnical.com/2010/06/24/image-recoloring-using-gaussian-mixture-model-and-expectation-maximization-opencv-wcode/#comments</comments>
		<pubDate>Thu, 24 Jun 2010 15:34:59 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[gui]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[Recommended]]></category>
		<category><![CDATA[school]]></category>
		<category><![CDATA[vision]]></category>
		<category><![CDATA[Website]]></category>
		<category><![CDATA[expectation maximization]]></category>
		<category><![CDATA[gaussian]]></category>
		<category><![CDATA[mixture model]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=673</guid>
		<description><![CDATA[Hi, I&#8217;ll present a quick and simple implementation of image recoloring, in fact more like color transfer between images, using OpenCV in C++ environment. The basis of the algorithm is learning the source color distribution with a GMM using EM, and then applying changes to the target color distribution. It&#8217;s fairly easy to implement with [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.morethantechnical.com/wp-content/uploads/2010/06/eggplant_orange.png" rel="lightbox[673]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/06/eggplant_orange.png" alt="" title="eggplant_orange" width="654" height="187" class="alignleft size-full wp-image-684" /></a>Hi,<br />
I&#8217;ll present a quick and simple implementation of image recoloring, in fact more like color transfer between images, using OpenCV in a C++ environment. The basis of the algorithm is learning the source color distribution with a GMM using EM, and then applying changes to the target color distribution. It&#8217;s fairly easy to implement with OpenCV, as all the &#8220;tools&#8221; are built in.</p>
<p>I was inspired by <a href="http://www.cs.tau.ac.il/~liors/research/papers/image_appearance_exploration.pdf">Lior Shapira&#8217;s work</a> that was presented at Eurographics 09 about image appearance manipulation, and a work about recoloring for the colorblind by <a href="http://www.sciweavers.org/files/docs/2358/icassp_cvd_poster_pdf_4a383d1fb0.pdf">Huang et al</a> presented at ICASSP 09. Both works deal with color manipulation using Gaussian Mixture Models.</p>
<p>Let&#8217;s see how it&#8217;s done!<br />
<span id="more-673"></span></p>
<h2>A little theory</h2>
<p>I won&#8217;t bore you with the math, but just to get the hang of the idea, what we would like to do is learn how the colors in the source and target images are distributed. Naturally we can assume that, like many other things in nature, the colors in a picture have a <a href="http://en.wikipedia.org/wiki/Normal_distribution">normal (Gaussian) distribution</a>, but we can go further by saying the distribution might have <strong>a few Gaussians</strong> describing it. This is called a <a href="http://en.wikipedia.org/wiki/Mixture_distribution">mixture distribution</a>, and it&#8217;s a very handy statistical tool. Mixtures can be estimated using a powerful and ubiquitous tool called <a href="http://en.wikipedia.org/wiki/Expectation_maximization">Expectation Maximization</a>, which I have previously covered. EM essentially tries to recover the mean (mu) and covariance (sigma) of the Gaussians in the mixture, by iteratively checking the current hypothesis against the data until finally converging at an extremum.</p>
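<p>If you want to see the E-step and M-step without any library in the way, here is a toy EM for a mixture of one-dimensional Gaussians in plain Python. This is only a sketch of the textbook algorithm, not what OpenCV does internally; the function names, the variance floor and the sample data are all made up for illustration.</p>

```python
import math


def gaussian_pdf(x, mu, var):
    """Density of a 1D Gaussian with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_1d(data, mus, variances, weights, iters=50):
    """Fit a 1D Gaussian mixture; the initial guesses are not modified."""
    mus, variances, weights = list(mus), list(variances), list(weights)
    k = len(mus)
    for _ in range(iters):
        # E-step: how responsible is each Gaussian for each sample?
        resp = []
        for x in data:
            p = [weights[j] * gaussian_pdf(x, mus[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate each Gaussian from its weighted samples.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps the density finite
            weights[j] = nj / len(data)
    return mus, variances, weights


samples = [0.9, 1.0, 1.1, 1.0, 4.9, 5.0, 5.1, 5.0]
fit_mus, fit_vars, fit_weights = em_1d(samples, [0.0, 6.0], [1.0, 1.0], [0.5, 0.5])
```

<p>Starting from rough guesses of 0 and 6, the two means settle on the two clusters in the data. With colors it is the same thing, only each sample has three numbers and each Gaussian has a 3x3 covariance matrix.</p>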
<h2>Learning the color model</h2>
<p>For the learning process we must set up the sample data. So we create a binary mask saying where in the image the model can find the colors to learn.<br />
Then we scan the mask and for each foreground pixel we add its value (here, RGB, but basically can be anything) to the sample data. Finally we train the CvEM object, which contains the GMM parameters.</p>
<pre class="brush: plain; title: ; notranslate">
void TrainGMM(CvEM&amp; source_model, Mat&amp; source, Mat&amp; source_mask) {
		int src_samples_size = countNonZero(source_mask);
		Mat source_samples(src_samples_size,3,CV_32FC1);

		Mat source_32f;
		source_32f = source;	//assumes source is already a 3-channel float image (CV_32FC3)

		int sample_count = 0;
		for(int y=0;y&lt;source.rows;y++) {
			Vec3f* row = source_32f.ptr&lt;Vec3f&gt;(y);  //pointer to pixel data in the row
			uchar* mask_row = source_mask.ptr&lt;uchar&gt;(y); //pointer to binary mask
			for(int x=0;x&lt;source.cols;x++) {
				if(mask_row[x] &gt; 0) {
					source_samples.at&lt;Vec3f&gt;(sample_count++,0) = row[x];
				}
			}
		}

		source_model.clear();
		CvEMParams ps(3/* = number of gaussians*/);
		source_model.train(source_samples,Mat(),ps,NULL);
}
</pre>
<p>What we have are three 3-dimensional (R,G,B) Gaussians, describing the colors in the selected area.</p>
<h2>Matching Gaussians</h2>
<p>Now we have a couple of GMMs &#8211; one for the target and one for the source. The idea is to take a pixel in the target, see how the 3 target Gaussians describe it, and shift it to use the 3 source Gaussians. This will, hopefully, cause its color to change from target to source. But we need to know which Gaussian in the target corresponds to which Gaussian in the source. I made a quick selection algorithm that greedily chooses a target Gaussian for each source Gaussian. Since a greedy choice depends on the order in which the source Gaussians are visited, I permute the order of selection, and pick the best permutation to get closer to the optimal selection.</p>
<pre class="brush: plain; title: ; notranslate">
vector&lt;int&gt; Recoloring::MatchGaussians(CvEM&amp; source_model, CvEM&amp; target_model) {
		int num_g = source_model.get_nclusters();
		Mat sMu(source_model.get_means());
		Mat tMu(target_model.get_means());
		const CvMat** target_covs = target_model.get_covs();
		const CvMat** source_covs = source_model.get_covs();

		double best_dist = std::numeric_limits&lt;double&gt;::max();
		vector&lt;int&gt; best_res(num_g);
		vector&lt;int&gt; prmt(num_g); 

		for(int itr = 0; itr &lt; 10; itr++) {
			for(int i=0;i&lt;num_g;i++) prmt[i] = i;	//make a permutation
			randShuffle(Mat(prmt));

			//Greedy selection
			vector&lt;int&gt; res(num_g);
			vector&lt;bool&gt; taken(num_g);
			for(int sg = 0; sg &lt; num_g; sg++) {
				double min_dist = std::numeric_limits&lt;double&gt;::max();
				int minv = -1;
				for(int tg = 0; tg &lt; num_g; tg++) {
					if(taken[tg]) continue;

					//TODO: can save on re-calculation of pairs - calculate affinity matrix ahead
					//double d = norm(sMu(Range(prmt[sg],prmt[sg]+1),Range(0,3)),	tMu(Range(tg,tg+1),Range(0,3)));

					//symmetric Kullback-Leibler (up to a constant factor); the log-determinant terms cancel out
					Mat Ss(source_covs[prmt[sg]]), St(target_covs[tg]);
					Mat diff = Mat(sMu(Range(prmt[sg],prmt[sg]+1),Range(0,3)) - tMu(Range(tg,tg+1),Range(0,3)));
					Mat d = diff * Mat(Ss.inv() + St.inv()) * diff.t();
					//trace(Ss^-1 * St + St^-1 * Ss - 2*I)
					Scalar tr = trace(Mat(
						Mat(Ss.inv()*St) +
						Mat(St.inv()*Ss) -
						Mat(Mat::eye(3,3,CV_64FC1)*2)
						));
					double kl_dist = d.at&lt;double&gt;(0,0) + tr[0];
					if(kl_dist&lt;min_dist) {
						min_dist = kl_dist;
						minv = tg;
					}
				}
				res[prmt[sg]] = minv;
				taken[minv] = true;
			}

                       //total distance for the permutation
			double dist = 0;
			for(int i=0;i&lt;num_g;i++) {
				dist += norm(sMu(Range(prmt[i],prmt[i]+1),Range(0,3)),
							tMu(Range(res[prmt[i]],res[prmt[i]]+1),Range(0,3)));
			}
			if(dist &lt; best_dist) {
				best_dist = dist;
				best_res = res;
			}
		}

		return best_res;
	}
</pre>
<p>I used Symmetric Kullback-Leibler for the distance between Gaussians, as suggested by Huang et al.</p>
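<p>For reference, here is a minimal, self-contained sketch of that distance. It is not the code from the repository: it assumes diagonal covariance matrices so that no matrix inverse is needed, and the names DiagGaussian and symmetricKL are made up for this illustration. With full covariances the structure stays the same, with the divisions replaced by matrix inverses and a trace.</p>
<pre class="brush: plain; title: ; notranslate">
#include &lt;array&gt;
#include &lt;cstddef&gt;

// A 3-D Gaussian with a diagonal covariance (a simplifying assumption
// for this sketch; the post itself uses full covariance matrices).
struct DiagGaussian {
	std::array&lt;double, 3&gt; mean;
	std::array&lt;double, 3&gt; var;	// per-channel variances, all &gt; 0
};

// Symmetric KL divergence: KL(a||b) + KL(b||a).
// The log-determinant terms of the two directions cancel out.
double symmetricKL(const DiagGaussian&amp; a, const DiagGaussian&amp; b) {
	double sum = 0.0;
	for (std::size_t k = 0; k &lt; 3; ++k) {
		const double d = a.mean[k] - b.mean[k];
		sum += a.var[k] / b.var[k] + b.var[k] / a.var[k] - 2.0	// trace terms
			+ d * d * (1.0 / a.var[k] + 1.0 / b.var[k]);	// mean term
	}
	return 0.5 * sum;
}
</pre>
<p>Two identical Gaussians give a distance of 0, and the value grows as the means drift apart or the variances stop agreeing.</p>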
<h2>Applying the color</h2>
<p>Now all we have to do is use the method Shapira suggested in his work to transform a pixel&#8217;s color, based on the Gaussians describing it.<br />
I&#8217;m only showing the essence of the code here; the rest is in the source file.</p>
<pre class="brush: plain; title: ; notranslate">
		Mat pr; Mat samp(1,3,CV_32FC1);
		for(int y=0;y&lt;target.rows;y++) {
			Vec3f* row = target_32f.ptr&lt;Vec3f&gt;(y);
			uchar* mask_row = target_mask.ptr&lt;uchar&gt;(y);
			for(int x=0;x&lt;target.cols;x++) {
				if(mask_row[x] &gt; 0) {
                                        //take pixel data
					memcpy(samp.data,&amp;(row[x][0]),3*sizeof(float)); 

                                        //Use the GMM to predict how close this pixel is to each gaussian
					float res = target_model.predict(samp,&amp;pr);

					Mat samp_64f; samp.convertTo(samp_64f,CV_64F);

                                        //Move the pixel to the new Gaussians
					//From Shapira09: Xnew = Sum_i { pr(i) * ( Sigma_source_i * (Sigma_target_i)^-1 * (x - mu_target_i) + mu_source_i ) }
					Mat Xnew(1,3,CV_64FC1,Scalar(0));
					for(int i=0;i&lt;num_g;i++) {
						if(((float*)pr.data)[i] &lt;= 0) continue;

                                               //For each Gaussian, subtract the target mean and add the matched source mean,
                                               //use probabilities to get a weighted average.
						Xnew += Mat((
							//Mat(source_covs[match[i]]) *
							//Mat(target_covs[i]).inv() *
							Mat(samp_64f - tMu_64f(Range(i,i+1),Range(0,3))).t() +
							sMu_64f(Range(match[i],match[i]+1),Range(0,3)).t()
							) * (double)(((float*)pr.data)[i])).t();
					}

                                        //Put pixel back into place
					Mat _tmp; Xnew.convertTo(_tmp,CV_32F);
					memcpy(&amp;(row[x][0]),_tmp.data,sizeof(float)*3);
				}
			}
		}
</pre>
<p>You might notice that I skip the multiplication by the covariance matrices, which Shapira&#8217;s method includes. I found that skipping it produces better results, but that is probably caused by a bug in my code.</p>
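<p>Stripped of the OpenCV plumbing, the per-pixel update boils down to the following sketch. It mirrors the simplified version above (no covariance terms) and uses only the standard library; the names Color and shiftPixel are hypothetical, chosen for this illustration.</p>
<pre class="brush: plain; title: ; notranslate">
#include &lt;array&gt;
#include &lt;cstddef&gt;
#include &lt;vector&gt;

typedef std::array&lt;double, 3&gt; Color;

// Shift one pixel from the target Gaussians to the matched source Gaussians.
// pr[i]    : probability of the pixel under target Gaussian i
// match[i] : index of the source Gaussian paired with target Gaussian i
Color shiftPixel(const Color&amp; x,
		const std::vector&lt;double&gt;&amp; pr,
		const std::vector&lt;Color&gt;&amp; targetMeans,
		const std::vector&lt;Color&gt;&amp; sourceMeans,
		const std::vector&lt;int&gt;&amp; match) {
	Color out = {{0.0, 0.0, 0.0}};
	for (std::size_t i = 0; i &lt; pr.size(); ++i) {
		if (pr[i] &lt;= 0.0) continue;	// this Gaussian does not describe the pixel
		const Color&amp; mt = targetMeans[i];
		const Color&amp; ms = sourceMeans[match[i]];
		for (std::size_t c = 0; c &lt; 3; ++c)
			out[c] += pr[i] * ((x[c] - mt[c]) + ms[c]);	// weighted mean shift
	}
	return out;
}
</pre>
<p>A pixel that sits exactly on a target mean, and belongs fully to that Gaussian, lands exactly on the matched source mean.</p>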
<h2>Results</h2>
<p><a href="http://www.morethantechnical.com/wp-content/uploads/2010/06/recoloring_result1.png" rel="lightbox[673]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/06/recoloring_result1-300x83.png" alt="" title="recoloring_result1" width="300" height="83" class="aligncenter size-medium wp-image-675" /></a><br />
<a href="http://www.morethantechnical.com/wp-content/uploads/2010/06/recoloring_result.png" rel="lightbox[673]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/06/recoloring_result-300x79.png" alt="" title="recoloring_result" width="300" height="79" class="aligncenter size-medium wp-image-674" /></a></p>
<h2>Code and stuff</h2>
<p>Source code is available in the SVN repo:</p>
<pre class="brush: plain; title: ; notranslate">
svn checkout http://morethantechnical.googlecode.com/svn/trunk/GMM_Recoloring recoloring
</pre>
<p>Images from Flickr (Creative Commons):</p>
<ul>
<li>http://www.flickr.com/photos/wwworks/2956622857/sizes/s/</li>
<li>http://www.flickr.com/photos/violentz/3199292482/sizes/s/</li>
<li>http://www.flickr.com/photos/davidw/164670455/sizes/m/</li>
<li>http://www.flickr.com/photos/djania/252225693/sizes/m/</li>
</ul>
<p>Now go recolor the world!</p>
<p>Thanks for tuning in..<br />
Roy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2010/06/24/image-recoloring-using-gaussian-mixture-model-and-expectation-maximization-opencv-wcode/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Bust out your own graphcut based image segmentation with OpenCV [w/ code]</title>
		<link>http://www.morethantechnical.com/2010/05/05/bust-out-your-own-graphcut-based-image-segmentation-with-opencv-w-code/</link>
		<comments>http://www.morethantechnical.com/2010/05/05/bust-out-your-own-graphcut-based-image-segmentation-with-opencv-w-code/#comments</comments>
		<pubDate>Wed, 05 May 2010 11:27:29 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[Recommended]]></category>
		<category><![CDATA[vision]]></category>
		<category><![CDATA[Website]]></category>
		<category><![CDATA[gmm]]></category>
		<category><![CDATA[graphcut]]></category>
		<category><![CDATA[segmentation]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=634</guid>
		<description><![CDATA[This is a tutorial on using Graph-Cuts and Gaussian-Mixture-Models for image segmentation with OpenCV in a C++ environment. I&#8217;ve been working on my master&#8217;s thesis for a while now, and the path of my work came across image segmentation. Naturally I became interested in Max-Flow Graph Cuts algorithms, being the &#8220;hottest fish in the fish-market&#8221; right now [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.morethantechnical.com/wp-content/uploads/2010/05/GMM-GC-segmentation.png" rel="lightbox[634]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/05/GMM-GC-segmentation-300x125.png" alt="" title="GMM-GC-segmentation" width="300" height="125" class="alignleft size-medium wp-image-659" /></a><em>This is a tutorial on using Graph-Cuts and Gaussian-Mixture-Models for image segmentation with OpenCV in a C++ environment.</em></p>
<p>I&#8217;ve been working on my master&#8217;s thesis for a while now, and the path of my work came across image segmentation. Naturally I became interested in Max-Flow Graph Cuts algorithms, being the &#8220;hottest fish in the fish-market&#8221; right now, if the fish market were the image segmentation scene.</p>
<p>So I went looking for a C++ implementation of graphcut, only to find out that OpenCV already implemented it in v2.0 as part of their GrabCut implementation. But I wanted to explore a bit, so I found this <a href="http://www.csd.uwo.ca/~olga/code.html">implementation by Olga Veksler</a>, which is built upon Kolmogorov&#8217;s framework for max-flow algorithms. I was also inspired by <a href="http://www.wisdom.weizmann.ac.il/~bagon/matlab.html">Shai Bagon&#8217;s usage example</a> of this implementation for Matlab.</p>
<p>Let&#8217;s jump in&#8230;<br />
<span id="more-634"></span></p>
<h2>Bit of Theory</h2>
<p>Before we move on, let&#8217;s dig <strong>a little</strong> into the theory. We look at the picture as a set of nodes, where each pixel is a node that is connected to its neighbors by edges and has a label &#8211; this can be called a <a href="http://en.wikipedia.org/wiki/Markov_random_field">Markov Random Field</a>. MRFs can be solved, i.e. given an optimal label for each node and thus an optimal overall labeling, in a number of ways, one of which is graph cuts based on <a href="http://en.wikipedia.org/wiki/Max-flow_min-cut_theorem">maximal flow</a>. After we label the graph, we expect to get a meaningful segmentation of the image. <a href="http://www.cs.cornell.edu/~rdz/Papers/SZSVKATR-ECCV06.pdf">This paper</a>, by some of the big names in the field (Veksler, Kolmogorov, Agarwala), explains it pretty thoroughly. There are a number of well known segmentation methods that use graph cuts, such as: Lazy Snapping [04], GrabCut [04] and more.</p>
<p>The math in the articles is, as usual, pretty horrific. I like to keep things simple, so I&#8217;ll try to explain the method of GC-segmentation in a simple way. We all remember min cut &#8211; max flow algorithms from 2nd year CS, right? Well, segmentation using GC is not very different. The magic happens when we weight the nodes and edges in a meaningful way, thus creating meaningful cuts. The weights are usually split into two terms: a Data term (or cost) and a Smoothness term. The data term says in simple words: &#8220;How happy is this pixel with that label?&#8221;, and the smoothness term pretty much says &#8220;How easily can a label expand from this pixel to that neighbor?&#8221;. So when you think about it, the easiest thing would be to use as the data term the likelihood of a pixel belonging to some label, and for the smoothness term &#8211; just use the edges in the picture!</p>
<p>So anyway, back to the code: the only thing left is to create a graph, give weights, and max the flow. Here&#8217;s a bit of code:</p>
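<p>To make the two terms concrete, here is a minimal sketch of the energy that a labeling pays on a small grid. It is an illustration only, not part of the GCoptimization library: the function name gridEnergy is made up, and the smoothness term is a plain Potts penalty (a constant cost whenever two neighbors disagree) rather than the edge-based weights used later in this post.</p>
<pre class="brush: plain; title: ; notranslate">
#include &lt;cstddef&gt;
#include &lt;vector&gt;

// Energy of a labeling on a width x height, 4-connected grid.
// dataCost[p * numLabels + l] : cost of giving pixel p the label l
// lambda                      : penalty paid by each pair of neighbours
//                               that carry different labels (Potts model)
long gridEnergy(int width, int height, int numLabels,
		const std::vector&lt;int&gt;&amp; dataCost,
		const std::vector&lt;int&gt;&amp; labels,
		int lambda) {
	long energy = 0;
	for (int y = 0; y &lt; height; ++y) {
		for (int x = 0; x &lt; width; ++x) {
			const int p = y * width + x;
			energy += dataCost[p * numLabels + labels[p]];	// data term
			// smoothness term: right and bottom neighbours
			if (x + 1 &lt; width &amp;&amp; labels[p] != labels[p + 1]) energy += lambda;
			if (y + 1 &lt; height &amp;&amp; labels[p] != labels[p + width]) energy += lambda;
		}
	}
	return energy;
}
</pre>
<p>The graph cut optimizer searches for the labeling that makes this kind of sum as small as possible: a low data cost pulls each pixel towards its favorite label, while the smoothness penalty discourages neighbors from disagreeing.</p>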
<pre class="brush: plain; title: ; notranslate">
Mat im_small = imread(&quot;a_pic.jpg&quot;);
int num_lables = 3;

// create a grid type graph
GCoptimizationGridGraph gc(im_small.cols,im_small.rows,num_lables);
</pre>
<p>This piece of code creates a directed grid graph where every pixel is a vertex, and each pixel can take one of 3 labels (3 parts in the image to segment).</p>
<h2>GMM to the rescue!</h2>
<p>Now for the weighting. One very &#8220;standard&#8221; way to give a data term to the pixels is by using <a href="http://en.wikipedia.org/wiki/Mixture_model">Gaussian-Mixture-Models</a> (GMM): a method that fits a few Gaussian distributions to an unknown probability function to estimate how it looks. In the spirit of keeping things simple, I won&#8217;t go into details. I&#8217;ll just say that it&#8217;s a tool to get the probability of a pixel belonging to a cluster of other pixels, and it has a built-in implementation in OpenCV, which is reason enough for me to use it. In OpenCV GMM models are named EM, which is kind of erroneous, since EM (<a href="http://en.wikipedia.org/wiki/Mixture_model#Expectation_maximization_.28EM.29">Expectation-Maximization</a>) is one of the best methods to estimate a GMM and not a GMM itself.</p>
<p>Using EM in OpenCV is really very easy:</p>
<pre class="brush: plain; title: ; notranslate">
CvEM model;
CvEMParams ps(3);

Mat lables;
Mat samples;
im.reshape(1,im.rows*im.cols).convertTo(samples,CV_32FC1,1.0/255.0);

model.train(samples,Mat(),ps,&amp;lables);

Mat probs = model.get_probs();
</pre>
<p>Here&#8217;s how it looks when training over the whole image as input data (you can see original image, labeling, minus log probability):</p>
<div id="attachment_639" class="wp-caption alignnone" style="width: 619px"><a href="http://www.morethantechnical.com/wp-content/uploads/2010/05/em-clustering.png" rel="lightbox[634]"><img class="size-full wp-image-639" title="em-clustering" src="http://www.morethantechnical.com/wp-content/uploads/2010/05/em-clustering.png" alt="" width="609" /></a><p class="wp-caption-text">Clustering using EM in OpenCV</p></div>
<p>But this is not exactly what we wanted&#8230; Since we are dealing with segmentation here, we would like to segment a certain area. The purpose of the GMM is to learn how that area looks, based on a small set of samples, and then predict the label for all the pixels in the image.</p>
<p>In GrabCut they pretty much create a GMM for every logical &#8220;cluster&#8221; they want to segment: Positively Background, Probably Background, Probably Foreground and Positively Foreground. This is a good idea and I will follow it, but again, I&#8217;m aiming not for Object Extraction but rather for k-way segmentation. In other words I&#8217;m looking for a way to divide the image into a few areas that are each internally similar, and also not similar to the other areas.</p>
<p>To do that I will create a K-Gaussian GMM for each of the N areas (see the code, it&#8217;s a long one). I tried 2 versions: one, called doEM1D(), creates a 1-Gaussian GMM for each channel (RGB, etc.) of each area, and the other creates a K-Gaussian and N-clusters GMM for each area. The results varied:<br />
<div id="attachment_646" class="wp-caption alignnone" style="width: 612px"><a href="http://www.morethantechnical.com/wp-content/uploads/2010/05/1class-gmm.png" rel="lightbox[634]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/05/1class-gmm.png" alt="" title="Three 1-ch-1-gs GMMs" width="602" class="size-full wp-image-646" /></a><p class="wp-caption-text">Three 1 channel 1 Gaussian GMMs</p></div><br />
<div id="attachment_647" class="wp-caption alignnone" style="width: 610px"><a href="http://www.morethantechnical.com/wp-content/uploads/2010/05/3class-gmm.jpg" rel="lightbox[634]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/05/3class-gmm.jpg" alt="" title="Three 3-ch-3-gs GMMs" width="600" class="size-full wp-image-647" /></a><p class="wp-caption-text">Three 3-channels 3-Gaussians per channel GMMs</p></div></p>
<p>This will provide us with the data term for our segmentation &#8211; each pixel can now say how comfortable it is with the label it got (we simply use the probability from the GMM).</p>
<h2>Play it smooth</h2>
<p>Right, moving on to the smoothness term. I mentioned before it would be easiest to just use the edges in the image. I use the Sobel filter, which gives a nice strong edge. Again we must look at each pixel&#8217;s value as the likelihood of having an edge in it, so we should use -log to get nice big integers where the probability drops.</p>
<p><a href="http://www.morethantechnical.com/wp-content/uploads/2010/05/kid-edges.png" rel="lightbox[634]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/05/kid-edges.png" alt="" title="Edges in image" width="533" height="204" class="alignnone size-full wp-image-650" /></a></p>
<pre class="brush: plain; title: ; notranslate">
GaussianBlur(gray32f,gray32f,Size(11,11),0.75);

Sobel(gray32f,_tmp,-1,1,0,3);	//sobel for dx
_tmp1 = abs(_tmp);  //use abs value to get also the opposite direction edges
_tmp1.copyTo(res,(_tmp1 &gt; 0.2));  //threshold the small edges...

double maxVal,minVal;
minMaxLoc(_tmp,&amp;minVal,&amp;maxVal);
cv::log(res,_tmp);
_tmp = -_tmp * 0.17;
_tmp.convertTo(grayInt1,CV_32SC1);

Sobel(gray32f,_tmp,-1,0,1,3);	//sobel for dy
_tmp1 = abs(_tmp);
res.setTo(Scalar(0));
_tmp1.copyTo(res,(_tmp1 &gt; 0.2));

imshow(&quot;tmp1&quot;,res); waitKey();

minMaxLoc(_tmp,&amp;minVal,&amp;maxVal);
cv::log(res,_tmp);
_tmp = -_tmp * 0.17;
_tmp.convertTo(grayInt,CV_32SC1);
</pre>
<h2>Now put everything into a bowl and mix!</h2>
<p>And the last part of the process will be to put the per-pixel label probabilities and the edges into the grid graph we created earlier:</p>
<pre class="brush: plain; title: ; notranslate">
GCoptimizationGridGraph gc(im.cols,im.rows,num_lables);

//Set the pixel-label probability
int N = im.cols*im.rows;
double log_1_3 = log(1.3);	//base of the logarithm used to scale the costs
for(int i=0;i&lt;N;i++) {
   double* ppt = probs.ptr&lt;double&gt;(i);
   for(int l=0;l&lt;num_lables;l++) {
      int icost = MAX(0,(int)floor(-log(ppt[l])/log_1_3));	//-log of the probability, in base 1.3
      gc.setDataCost(i,l,icost);
   }
}

//Set the smoothness cost
//dx, dy: the integer edge costs from the Sobel step above (named grayInt1 and grayInt there)
Mat Sc = 5 * (Mat::ones(num_lables,num_lables,CV_32SC1) - Mat::eye(num_lables,num_lables,CV_32SC1));
gc.setSmoothCostVH((int*)(Sc.data),(int*)dx.data,(int*)dy.data);

lables.create(N,1,CV_8UC1);

printf(&quot;\nBefore optimization energy is %d\n&quot;,gc.compute_energy());
gc.expansion(1);
printf(&quot;\nAfter optimization energy is %d\n&quot;,gc.compute_energy());

//Get the labeling back from the graph
for ( int  i = 0; i &lt; N; i++ )
   ((uchar*)(lables.data + lables.step * i))[0] = gc.whatLabel(i);
</pre>
<p>Easy. Now the labeling should give us a nice segmentation:<br />
<div id="attachment_651" class="wp-caption alignnone" style="width: 279px"><a href="http://www.morethantechnical.com/wp-content/uploads/2010/05/3class-graphcut.png" rel="lightbox[634]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/05/3class-graphcut.png" alt="" title="3class-graphcut" width="269" height="205" class="size-full wp-image-651" /></a><p class="wp-caption-text">3 x 3-ch-3-gs GMMs labeling</p></div><br />
<div id="attachment_652" class="wp-caption alignnone" style="width: 277px"><a href="http://www.morethantechnical.com/wp-content/uploads/2010/05/1class-graphcut.png" rel="lightbox[634]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/05/1class-graphcut.png" alt="" title="1class-graphcut" width="267" height="203" class="size-full wp-image-652" /></a><p class="wp-caption-text">3 x 1-ch-1-gs GMMs labeling</p></div></p>
<p>But there&#8217;s a lot of noise in the labeling&#8230; A good heuristic to apply is to take only the largest connected component of each label, and also to try to take the component that is closest to the original marking.<br />
<div id="attachment_654" class="wp-caption alignnone" style="width: 610px"><a href="http://www.morethantechnical.com/wp-content/uploads/2010/05/3-way-label.png" rel="lightbox[634]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/05/3-way-label.png" alt="" title="Labels extraction" width="600" class="size-full wp-image-654" /></a><p class="wp-caption-text">Labels extraction without keeping the largest component</p></div><br />
<div id="attachment_655" class="wp-caption alignnone" style="width: 610px"><a href="http://www.morethantechnical.com/wp-content/uploads/2010/05/3-way-label-final.png" rel="lightbox[634]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/05/3-way-label-final.png" alt="" title="Labels extraction" width="600" class="size-full wp-image-655" /></a><p class="wp-caption-text">Labels extraction, keeping only the largest component</p></div></p>
<pre class="brush: plain; title: ; notranslate">
vector&lt;vector&lt;Point&gt; &gt; contours;
for(int itr=0;itr&lt;2;itr++) {

Mat mask = (_tmpLabels == itr); //Get a mask of this label

contours.clear();
//find the contours in that mask
cv::findContours(mask,contours,CV_RETR_EXTERNAL,CV_CHAIN_APPROX_NONE);

//compute areas
vector&lt;double&gt; areas(contours.size());
for(unsigned int ai=0;ai&lt;contours.size();ai++) {
	Mat _pts(contours[ai]);
	Scalar mp = mean(_pts);

	areas[ai] = contourArea(Mat(contours[ai]))/* add some bias here to get components that are closer to initial marking*/;
}

//find largest connected component
double max; Point maxLoc;
minMaxLoc(Mat(areas),0,&amp;max,0,&amp;maxLoc);

//draw back on mask
_tmpLabels.setTo(Scalar(3),mask);	//all unassigned pixels will have value of 3, later we'll turn them to &quot;background&quot; pixels

mask.setTo(Scalar(0)); //clear...
drawContours(mask,contours,maxLoc.y,Scalar(255),CV_FILLED);

//now that the mask has only the wanted component...
_tmpLabels.setTo(Scalar(itr),mask);

}
</pre>
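<p>If you want to try the same heuristic without OpenCV, a plain flood fill does the job. The following is a rough sketch under a few assumptions: the label map is a row-major array of ints, connectivity is 4-way, and component size is counted in pixels (contourArea above measures the outline instead, so holes count differently). The name keepLargestComponent is made up for this example.</p>
<pre class="brush: plain; title: ; notranslate">
#include &lt;cstddef&gt;
#include &lt;queue&gt;
#include &lt;vector&gt;

// Keep only the largest 4-connected component of `label`;
// every other pixel carrying that label is set to `fill`.
void keepLargestComponent(std::vector&lt;int&gt;&amp; labels, int width, int height,
		int label, int fill) {
	const int n = width * height;
	std::vector&lt;int&gt; comp(n, -1);	// component id per pixel, -1 = not visited
	std::vector&lt;int&gt; sizes;		// pixel count per component
	for (int start = 0; start &lt; n; ++start) {
		if (labels[start] != label || comp[start] != -1) continue;
		const int id = static_cast&lt;int&gt;(sizes.size());
		sizes.push_back(0);
		std::queue&lt;int&gt; q;
		q.push(start);
		comp[start] = id;
		while (!q.empty()) {	// breadth-first flood fill
			const int p = q.front(); q.pop();
			++sizes[id];
			const int x = p % width, y = p / width;
			const int nb[4] = {
				x &gt; 0 ? p - 1 : -1,
				x + 1 &lt; width ? p + 1 : -1,
				y &gt; 0 ? p - width : -1,
				y + 1 &lt; height ? p + width : -1 };
			for (int k = 0; k &lt; 4; ++k) {
				const int m = nb[k];
				if (m &gt;= 0 &amp;&amp; labels[m] == label &amp;&amp; comp[m] == -1) {
					comp[m] = id;
					q.push(m);
				}
			}
		}
	}
	if (sizes.empty()) return;	// the label does not appear at all
	int best = 0;
	for (int i = 1; i &lt; static_cast&lt;int&gt;(sizes.size()); ++i)
		if (sizes[i] &gt; sizes[best]) best = i;
	for (int p = 0; p &lt; n; ++p)
		if (labels[p] == label &amp;&amp; comp[p] != best) labels[p] = fill;
}
</pre>
<p>Calling it once per label, with fill set to the &#8220;unassigned&#8221; value (3 in the code above), gives the same kind of cleanup. Unlike the contour version, it simply returns when a label has no pixels at all.</p>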
<h2>Code and salutations</h2>
<p>As usual the code is available from the blog&#8217;s SVN:</p>
<pre class="brush: plain; title: ; notranslate">
svn checkout http://morethantechnical.googlecode.com/svn/trunk/GMMGraphCutSegmentation GMMGraphCutSegmentation
</pre>
<p>Hey! We&#8217;re pretty much done! Glad you (and I) made it to the end, it wasn&#8217;t easy after all&#8230; I hope you learned something about GMMs and Graph-Cuts in OpenCV and in general. </p>
<p>BTW: The pictures are from Flickr, under a Creative Commons license.<br />
<a href="http://farm1.static.flickr.com/33/40406598_fd4e74d51c_d.jpg" rel="lightbox[634]">http://farm1.static.flickr.com/33/40406598_fd4e74d51c_d.jpg</a><br />
<a href="http://www.flickr.com/photos/willemvelthoven/56589010/sizes/m/in/pool-99557785@N00/">http://www.flickr.com/photos/willemvelthoven/56589010/sizes/m/in/pool-99557785@N00/</a></p>
<p>See ya!<br />
Roy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2010/05/05/bust-out-your-own-graphcut-based-image-segmentation-with-opencv-w-code/feed/</wfw:commentRss>
		<slash:comments>34</slash:comments>
		</item>
	</channel>
</rss>

