<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>More Than Technical &#187; Website</title>
	<atom:link href="http://www.morethantechnical.com/category/recommended/website/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.morethantechnical.com</link>
	<description>On software, code, the internet and more.</description>
	<lastBuildDate>Mon, 06 Feb 2012 23:48:17 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=</generator>
<atom:link rel="hub" href="http://pubsubhubbub.appspot.com"/><atom:link rel="hub" href="http://superfeedr.com/hubbub"/>		<item>
		<title>Structure from Motion and 3D reconstruction on the easy in OpenCV 2.3+ [w/ code]</title>
		<link>http://www.morethantechnical.com/2012/02/07/structure-from-motion-and-3d-reconstruction-on-the-easy-in-opencv-2-3-w-code/</link>
		<comments>http://www.morethantechnical.com/2012/02/07/structure-from-motion-and-3d-reconstruction-on-the-easy-in-opencv-2-3-w-code/#comments</comments>
		<pubDate>Mon, 06 Feb 2012 23:48:17 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[3d]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[opengl]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[Recommended]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[vision]]></category>
		<category><![CDATA[Website]]></category>
		<category><![CDATA[fundamental]]></category>
		<category><![CDATA[matrix]]></category>
		<category><![CDATA[motion]]></category>
		<category><![CDATA[reconstruction]]></category>
		<category><![CDATA[sfm]]></category>
		<category><![CDATA[structure]]></category>
		<category><![CDATA[triangulation]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=998</guid>
		<description><![CDATA[Hello This time I&#8217;ll discuss a basic implementation of a Structure from Motion method, following the steps Hartley and Zisserman show in &#8220;The Bible&#8221; book: &#8220;Multiple View Geometry&#8221;. I will show how simply their linear method can be implemented in OpenCV. I treat this as a kind of tutorial, or a toy example, of how [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-06-at-6.44.42-PM.png" rel="lightbox[998]"><img class="alignleft size-medium wp-image-1064" title="SfM toy example" src="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-06-at-6.44.42-PM-300x71.png" alt="" width="300" height="71" /></a>Hello<br />
This time I&#8217;ll discuss a basic implementation of a Structure from Motion method, following the steps Hartley and Zisserman lay out in &#8220;The Bible&#8221; of the field, their book &#8220;Multiple View Geometry&#8221;. I will show how simply their linear method can be implemented in OpenCV.</p>
<p>I treat this as a kind of tutorial, or a toy example, of how to perform Structure from Motion in OpenCV.</p>
<p>Let&#8217;s get down to business&#8230;<br />
<span id="more-998"></span></p>
<h2>Getting a motion map</h2>
<p>The key ingredient when reconstructing from a pair of images is knowing the motion: how much &#8220;a pixel has moved&#8221; from one image to the other. This is what lets you reconstruct its distance from the camera(s). So our first goal is to recover that motion from a pair of images.</p>
<p>In calibrated horizontal stereo rigs this is called <em>Disparity</em>, and it refers to the horizontal motion of a pixel. OpenCV has some very good tools for recovering horizontal disparity, as can be seen in this <a href="https://code.ros.org/svn/opencv/trunk/opencv/samples/cpp/stereo_match.cpp" target="_blank">sample</a>.</p>
<p>But in our case we don&#8217;t have a calibrated rig as we are doing monocular (one camera) depth reconstruction, or in other words: <em>Structure from motion</em>.</p>
<p>You can go about getting a motion map in many different ways, but two canonical ways are optical flow and feature matching.<br />
I will stick to what OpenCV has to offer, but obviously there is a whole lot of other work out there.</p>
<div id="attachment_1051" class="wp-caption aligncenter" style="width: 562px"><a href="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-05-at-12.52.31-AM.png" rel="lightbox[998]"><img class="size-full wp-image-1051 " title="Input Images" src="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-05-at-12.52.31-AM.png" alt="" width="552" height="209" /></a><p class="wp-caption-text">Input pair of images, rotation and translation is unknown</p></div>
<h4>Optical Flow</h4>
<p>In optical flow you basically try to &#8220;track the pixels&#8221; from image 1 to image 2, usually assuming a pixel can move only within a certain <em>window</em> in which you will search. OpenCV offers several ways to do optical flow, but I will focus on the newer and nicer one: Farneback&#8217;s method for dense optical flow.<br />
The word <em>dense</em> means we look for the motion of <em>every pixel in the image</em>. This is usually costly, but Farneback&#8217;s method is linear, which makes it easy to solve, and OpenCV has a rocking implementation of it, so it basically flies.<br />
Running the function on two images will provide a motion map; however, my experiments show that this map is quite often wrong&#8230; To cope with that, I run it iteratively, leveraging the fact that this O-F method can take an initial guess.<br />
An example of using Farneback&#8217;s method exists in the samples directory of OpenCV&#8217;s repo: <a href="https://code.ros.org/svn/opencv/trunk/opencv/samples/cpp/fback.cpp" target="_blank">here</a>.</p>
<div id="attachment_1050" class="wp-caption aligncenter" style="width: 334px"><a href="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-05-at-12.52.04-AM.png" rel="lightbox[998]"><img class="size-full wp-image-1050" title="Screen shot 2012-02-05 at 12.52.04 AM" src="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-05-at-12.52.04-AM.png" alt="" width="324" height="266" /></a><p class="wp-caption-text">Dense O-F using Farneback</p></div>
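<p>Farneback&#8217;s method itself is too involved for a quick sketch, so to make the search <em>window</em> (and the value of an initial guess) concrete, here is a toy in plain C++ that uses brute-force block matching instead, a much simpler technique than Farneback&#8217;s. The <code>Gray</code> image struct and the function names are hypothetical. <code>matchInWindow</code> compares patches by sum of absolute differences, but only within <code>radius</code> pixels of the position predicted by <code>guess</code>, so any motion larger than the window is simply out of reach unless a good guess moves the window over it.</p>

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical 8-bit grayscale image, pixels stored row by row.
struct Gray {
    int w, h;
    std::vector<std::uint8_t> px;
    int at(int x, int y) const { return px[y * w + x]; }
};

struct Flow { int dx, dy; };

// Sum of absolute differences between the (2*half+1)^2 patch around (x1, y1) in a
// and the patch around (x2, y2) in b; "infinite" if either patch leaves its image.
long patchSAD(const Gray& a, int x1, int y1, const Gray& b, int x2, int y2, int half) {
    const long inf = std::numeric_limits<long>::max();
    if (x1 - half < 0 || y1 - half < 0 || x1 + half >= a.w || y1 + half >= a.h) return inf;
    if (x2 - half < 0 || y2 - half < 0 || x2 + half >= b.w || y2 + half >= b.h) return inf;
    long sum = 0;
    for (int oy = -half; oy <= half; ++oy)
        for (int ox = -half; ox <= half; ++ox) {
            const int d = a.at(x1 + ox, y1 + oy) - b.at(x2 + ox, y2 + oy);
            sum += d < 0 ? -d : d;
        }
    return sum;
}

// Displacement of pixel (x, y) from a to b, searching only the window of
// +-radius pixels around (x, y) + guess. Motion outside the window is never found.
Flow matchInWindow(const Gray& a, const Gray& b, int x, int y, Flow guess, int radius, int half) {
    Flow best = guess;
    long bestCost = std::numeric_limits<long>::max();
    for (int oy = -radius; oy <= radius; ++oy)
        for (int ox = -radius; ox <= radius; ++ox) {
            const Flow cand{guess.dx + ox, guess.dy + oy};
            const long cost = patchSAD(a, x, y, b, x + cand.dx, y + cand.dy, half);
            if (cost < bestCost) { bestCost = cost; best = cand; }
        }
    return best;
}
```

<p>This is the same failure mode the hybrid method below addresses: with a zero guess, a 4-pixel window cannot find a 12-pixel shift, while a rough guess of (10, 2) lets the same small window land on the exact displacement.</p>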
<h4>Feature Matching</h4>
<p>The other way of getting motion is matching features between the two images.<br />
In each image we extract salient features and invariant descriptors, and then match the two sets of features.<br />
It&#8217;s very easily done in OpenCV and widely covered by <a href="https://code.ros.org/svn/opencv/trunk/opencv/samples/cpp/matcher_simple.cpp" target="_blank">examples</a> and <a href="http://opencv.itseez.com/doc/tutorials/features2d/table_of_content_features2d/table_of_content_features2d.html" target="_blank">tutorials</a>.<br />
This method, however, will not provide a dense motion map, only a very sparse one at best&#8230; so the depth reconstruction will also be sparse. We may talk about how to overcome that by hacking some segmentation methods, like superpixels and graph-cuts, in a different post.</p>
<div id="attachment_1049" class="wp-caption aligncenter" style="width: 653px"><a href="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-05-at-12.51.34-AM.png" rel="lightbox[998]"><img class="size-full wp-image-1049" title="Screen shot 2012-02-05 at 12.51.34 AM" src="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-05-at-12.51.34-AM.png" alt="" width="643" height="264" /></a><p class="wp-caption-text">SURF features matching, with Fundamental matrix pruning via RANSAC</p></div>
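<p>OpenCV&#8217;s matchers do this for you; the sketch below only illustrates what &#8220;matching the two sets of features&#8221; boils down to, in plain C++ with hypothetical names. It assumes float descriptors (as with SURF) compared by L2 distance, and it adds a nearest/second-nearest <em>ratio test</em> (Lowe&#8217;s heuristic, an assumption here rather than something this post&#8217;s code is claimed to use) to drop ambiguous matches before the RANSAC pruning.</p>

```cpp
#include <cstddef>
#include <limits>
#include <vector>

using Descriptor = std::vector<float>;            // e.g. a 64-float SURF descriptor
struct Match { int query, train; float dist2; };  // indices into set 1 / set 2

float squaredL2(const Descriptor& a, const Descriptor& b) {
    float s = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) { const float d = a[i] - b[i]; s += d * d; }
    return s;
}

// Brute-force matching of set1 against set2 with a nearest / second-nearest ratio test.
// Distances are squared, so the ratio is squared as well.
std::vector<Match> ratioTestMatch(const std::vector<Descriptor>& set1,
                                  const std::vector<Descriptor>& set2, float ratio = 0.8f) {
    std::vector<Match> matches;
    for (std::size_t i = 0; i < set1.size(); ++i) {
        float best = std::numeric_limits<float>::infinity(), second = best;
        int bestJ = -1;
        for (std::size_t j = 0; j < set2.size(); ++j) {
            const float d = squaredL2(set1[i], set2[j]);
            if (d < best) { second = best; best = d; bestJ = static_cast<int>(j); }
            else if (d < second) { second = d; }
        }
        // keep the match only if it is clearly better than the runner-up
        if (bestJ >= 0 && best < ratio * ratio * second)
            matches.push_back({static_cast<int>(i), bestJ, best});
    }
    return matches;
}
```

<p>A descriptor that sits halfway between two candidates is rejected, which is exactly the kind of match that would otherwise pollute the Fundamental matrix estimation.</p>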
<h4>A hybrid method</h4>
<p>Another way that I am working on to get motion is a hybrid between Feature Matching and Optical Flow.<br />
The idea is to perform feature matching first, and then O-F. When the motion is large and features move quite a lot in the image, O-F sometimes fails, because pixel movement is usually confined to a search window.<br />
After we get feature pairs, we can try to recover a global movement in the image, and use that movement as an initial guess for O-F.</p>
<div id="attachment_1052" class="wp-caption aligncenter" style="width: 333px"><a href="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-05-at-12.51.50-AM.png" rel="lightbox[998]"><img class="size-full wp-image-1052" title="Rigid transform flow" src="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-05-at-12.51.50-AM.png" alt="" width="323" height="265" /></a><p class="wp-caption-text">The rigid transform flow recovered from sparse feature matching</p></div>
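<p>One way to get that global movement, sketched here under the assumption that a rotation plus translation is a good enough model (as in the rigid transform flow above), is a closed-form least-squares fit over the matched pairs, evaluated at every pixel to form the initial-guess flow. This is plain C++ with hypothetical names; it does no outlier rejection, so in practice you would feed it only the RANSAC inliers.</p>

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Pt { double x, y; };
struct Rigid2D { double theta, tx, ty; };  // q = R(theta) * p + (tx, ty)

// Closed-form least-squares 2D rigid fit (rotation + translation, no scale) mapping p[i] onto q[i].
Rigid2D estimateRigid2D(const std::vector<Pt>& p, const std::vector<Pt>& q) {
    const double n = static_cast<double>(p.size());
    Pt cp{0.0, 0.0}, cq{0.0, 0.0};
    for (std::size_t i = 0; i < p.size(); ++i) {
        cp.x += p[i].x / n; cp.y += p[i].y / n;
        cq.x += q[i].x / n; cq.y += q[i].y / n;
    }
    // With centred points a = p - cp and b = q - cq, the best angle maximises
    // sum(b . R a) = cos(theta) * sum(a . b) + sin(theta) * sum(a x b).
    double sumDot = 0.0, sumCross = 0.0;
    for (std::size_t i = 0; i < p.size(); ++i) {
        const double ax = p[i].x - cp.x, ay = p[i].y - cp.y;
        const double bx = q[i].x - cq.x, by = q[i].y - cq.y;
        sumDot += ax * bx + ay * by;
        sumCross += ax * by - ay * bx;
    }
    const double theta = std::atan2(sumCross, sumDot);
    const double c = std::cos(theta), s = std::sin(theta);
    // the translation moves the rotated centroid of p onto the centroid of q
    return {theta, cq.x - (c * cp.x - s * cp.y), cq.y - (s * cp.x + c * cp.y)};
}

// Initial-guess flow at pixel (x, y): where the rigid motion sends it, minus where it was.
Pt rigidFlowAt(const Rigid2D& T, double x, double y) {
    const double c = std::cos(T.theta), s = std::sin(T.theta);
    return {c * x - s * y + T.tx - x, s * x + c * y + T.ty - y};
}
```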
<h2>Estimating Motion</h2>
<p>Once we have a motion map between the two images, it should pose no problem to recover the motion of the camera. The motion is described by the 3&#215;4 matrix P, which is composed of two elements: P = [R|t], the rotation R and the translation t.<br />
H&amp;Z give us a bunch of ways of recovering the P matrices for both cameras in Chapter 9 of their book. The central method uses the <a href="http://en.wikipedia.org/wiki/Fundamental_matrix_(computer_vision)" target="_blank">Fundamental Matrix</a>. This special 3&#215;3 matrix encodes the epipolar constraint between the images. Put simply: for each point x in image 1 and corresponding point x&#8217; in image 2 (in homogeneous coordinates), the following equation holds: x&#8217;<sup>T</sup>Fx = 0.<br />
How does that help us? Well, H&amp;Z also prove that if you have F, you can infer the two P matrices. And if you have (sufficient) point matches between images, which we have, you can find F! Hurray!<br />
This is easiest to see in the linear formulation: every point pair gives one linear equation in the entries of F. F has 9 entries but is only defined up to scale (and it has rank 2, which leaves 7 degrees of freedom), so with 8 or more point pairs we can solve for F in a least-squares sense. But&#8230; F is better estimated in a more robust way, and OpenCV takes care of all of this for us in the function <a href="http://opencv.itseez.com/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html?highlight=findfundamentalmat#findfundamentalmat" target="_blank">findFundamentalMat</a>. There are several methods for recovering F there, linear and non-linear.<br />
However, H&amp;Z also point to a problem with using F right away: projective ambiguity. This means that the recovered camera matrices may not be the &#8220;real&#8221; ones, but instead have gone through some 3D projective transformation. To cope with this, we will use the <a href="http://en.wikipedia.org/wiki/Essential_matrix" target="_blank">Essential Matrix</a> instead, which is sort of the same thing (it holds the epipolar constraint over points) but for calibrated cameras. Using the Essential matrix removes the projective ambiguity and provides a Metric Reconstruction, which means the 3D points are true up to scaling alone, and not up to a projective transformation.</p>
<pre class="brush: plain; title: ; notranslate">
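#include &lt;opencv2/opencv.hpp&gt;
using namespace cv;

// Load the camera intrinsics K (CV_64F, same type as F) from a prior calibration
Mat K;
FileStorage fs(&quot;camera_calibration.yml&quot;, FileStorage::READ);
fs[&quot;camera_matrix&quot;] &gt;&gt; K;

// imgpts1, imgpts2: std::vector&lt;Point2f&gt; of matched points in image 1 and image 2
std::vector&lt;uchar&gt; status; // RANSAC inlier mask, one entry per match
Mat F = findFundamentalMat(imgpts1, imgpts2, FM_RANSAC, 0.1, 0.99, status);
Mat E = K.t() * F * K; //according to HZ (9.12)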
</pre>
<p>Now, assuming the first camera is P = [I|0], meaning it hasn&#8217;t moved or rotated, the second camera matrix P&#8217; = [R|t] can be recovered from the SVD of E as follows:</p>
<pre class="brush: plain; title: ; notranslate">
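// Decompose E via SVD: E = U S V^T
SVD svd(E);
Matx33d W(0,-1,0,	//HZ 9.13
	  1,0,0,
	  0,0,1);
Matx33d Winv(0,1,0,	// W^T (= W^-1), for the alternative R = U W^T V^T
	 -1,0,0,
	 0,0,1);
Mat_&lt;double&gt; R = svd.u * Mat(W) * svd.vt; //HZ 9.19
if (determinant(R) &lt; 0) R = -R; // a proper rotation needs det(R) = +1 (same as using -E)
Mat_&lt;double&gt; t = svd.u.col(2); //u3
// P1 is the second camera matrix P' = [R|t]
Matx34d P1(R(0,0),	R(0,1),	R(0,2),	t(0),
		 R(1,0),	R(1,1),	R(1,2),	t(1),
		 R(2,0),	R(2,1),	R(2,2), t(2));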
</pre>
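<p>Note that HZ (9.19) actually yields four possible solutions for P&#8217;: R = UWV<sup>T</sup> or R = UW<sup>T</sup>V<sup>T</sup>, combined with t = +u<sub>3</sub> or t = -u<sub>3</sub>. The code above picks one of them; the correct one is the one that places the reconstructed points in front of both cameras. With that in mind, let&#8217;s move on to reconstruction.</p>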
<h2>Reconstruction via Triangulation</h2>
<p>Once we have two camera matrices, P and P&#8217;, we can recover the 3D structure of the scene. This is easy to picture using ray intersection. We know the positions in space of the two camera centers (one at (0,0,0), and one at -R<sup>T</sup>t for P&#8217; = [R|t]), and we know where a point lies both on the image plane of image 1 and on the image plane of image 2. If we shoot a ray from one camera center through the respective image point, and another ray from the other camera center through its image point, the intersection of the two rays must be the real location of the point in space.<br />
In practice, this doesn&#8217;t quite work. Because of noise, the rays usually will not intersect (H&amp;Z mention the mid-point algorithm for this case, which they dismiss as a bad choice), and ray intersection in general is inferior to other triangulation methods.<br />
H&amp;Z go on to describe their &#8220;optimal&#8221; triangulation method, which optimizes the solution based on the error from reprojection of the points back to the image plane.<br />
I have implemented the linear triangulation methods they present, and wrote a post about it not long ago: <a title="http://www.morethantechnical.com/2012/01/04/simple-triangulation-with-opencv-from-harley-zisserman-w-code/" href="http://www.morethantechnical.com/2012/01/04/simple-triangulation-with-opencv-from-harley-zisserman-w-code/" target="_blank">Here</a>.<br />
I also added the Iterative Least Squares method that Hartley presented in his article &#8220;<a href="http://users.cecs.anu.edu.au/~hartley/Papers/triangulation/triangulation.pdf" target="_blank">Triangulation</a>&#8221;, which is reported to be both accurate and fast.</p>
<div id="attachment_1056" class="wp-caption aligncenter" style="width: 333px"><a href="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-05-at-1.22.59-AM.png" rel="lightbox[998]"><img class="size-full wp-image-1056" title="depth map" src="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-05-at-1.22.59-AM.png" alt="" width="323" height="263" /></a><p class="wp-caption-text">&quot;Depth Map&quot;</p></div>
<div id="attachment_1057" class="wp-caption aligncenter" style="width: 576px"><a href="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-05-at-1.23.36-AM.png" rel="lightbox[998]"><img class="size-full wp-image-1057" title="reconstruction" src="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-05-at-1.23.36-AM.png" alt="" width="566" height="349" /></a><p class="wp-caption-text">3D reconstruction</p></div>
<p>A word of caution: quite often the reconstruction will fail because the Fundamental matrix came out wrong. The results will just look awful, and nothing like a true reconstruction. To cope with this, you may want to insert a check that makes sure the two P matrices are not completely bogus (you could check for a reasonable rotation, for example). If the P matrices derived from the F matrix are strange, you can discard this F matrix and compute a new one.</p>
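<p>Here is one possible way to implement such a check, as a plain C++ sketch with hypothetical names. <code>isProperRotation</code> tests that R is orthonormal with det(R) = +1, and <code>fractionInFront</code> counts how many triangulated points have positive depth in both cameras, assuming P = [I|0] and P&#8217; = [R|t]. A low fraction is a strong hint that F, or the choice of P&#8217; among the four possible decompositions of E, is wrong; the exact rejection threshold is up to you.</p>

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

using Mat3 = std::array<std::array<double, 3>, 3>;
struct Vec3 { double x, y, z; };

// True if R is numerically a proper rotation: R^T R = I and det(R) = +1.
bool isProperRotation(const Mat3& R, double tol = 1e-6) {
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) {
            double dot = 0.0;  // (R^T R)(i, j) = column i . column j
            for (int k = 0; k < 3; ++k) dot += R[k][i] * R[k][j];
            if (std::fabs(dot - (i == j ? 1.0 : 0.0)) > tol) return false;
        }
    const double det = R[0][0] * (R[1][1] * R[2][2] - R[1][2] * R[2][1])
                     - R[0][1] * (R[1][0] * R[2][2] - R[1][2] * R[2][0])
                     + R[0][2] * (R[1][0] * R[2][1] - R[1][1] * R[2][0]);
    return std::fabs(det - 1.0) < tol;
}

// Fraction of points in front of both cameras for P = [I|0] and P' = [R|t]:
// depth in camera 1 is Z, depth in camera 2 is the z component of R X + t.
double fractionInFront(const std::vector<Vec3>& pts, const Mat3& R, const Vec3& t) {
    if (pts.empty()) return 0.0;
    std::size_t ok = 0;
    for (const Vec3& X : pts) {
        const double z2 = R[2][0] * X.x + R[2][1] * X.y + R[2][2] * X.z + t.z;
        if (X.z > 0.0 && z2 > 0.0) ++ok;
    }
    return static_cast<double>(ok) / static_cast<double>(pts.size());
}
```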
<div id="attachment_1063" class="wp-caption aligncenter" style="width: 330px"><a href="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-06-at-6.42.02-PM.png" rel="lightbox[998]"><img class="size-full wp-image-1063" title="Bad reconstruction" src="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-06-at-6.42.02-PM.png" alt="" width="320" height="263" /></a><p class="wp-caption-text">Example of when things go bad...</p></div>
<h2>Toolbox and Framework</h2>
<p>I created a small toolbox of the various methods I spoke about in this post, with a very simple UI. It basically allows you to load two images, try the different methods on them and get the results.<br />
It uses FLTK3 for the GUI, and PCL (VTK backend) for visualization of the resulting 3D point cloud.<br />
It also includes a few classes with a simple API that lets you get the feature matches, the motion map, the camera matrices from the motion, and finally the 3D point cloud.</p>
<div id="attachment_1055" class="wp-caption aligncenter" style="width: 525px"><a href="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-05-at-1.10.47-AM.png" rel="lightbox[998]"><img class=" wp-image-1055  " title="SfM GUI" src="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-05-at-1.10.47-AM.png" alt="" width="515" height="156" /></a><p class="wp-caption-text">FLTK GUI</p></div>
<h2>Code &amp; Where to go next</h2>
<p>The code, as usual, is up for grabs at github:</p>
<pre><a title="Github repo" href="https://github.com/royshil/SfM-Toy-Library" target="_blank">https://github.com/royshil/SfM-Toy-Library</a></pre>
<p>Now that you have a firm grasp of SfM <img src='http://www.morethantechnical.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  you can go on to visit the following projects, which implement much more robust solutions:</p>
<p><a title="http://phototour.cs.washington.edu/bundler/" href="http://phototour.cs.washington.edu/bundler/" target="_blank">http://phototour.cs.washington.edu/bundler/</a><br />
<a title="http://code.google.com/p/libmv/" href="http://code.google.com/p/libmv/" target="_blank">http://code.google.com/p/libmv/</a><br />
<a title="http://www.cs.washington.edu/homes/ccwu/vsfm/" href="http://www.cs.washington.edu/homes/ccwu/vsfm/" target="_blank">http://www.cs.washington.edu/homes/ccwu/vsfm/</a></p>
<p>And Wikipedia points to some interesting libraries and code as well: <a href="http://en.wikipedia.org/wiki/Structure_from_motion" target="_blank">http://en.wikipedia.org/wiki/Structure_from_motion</a></p>
<p>Enjoy!</p>
<p>Roy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2012/02/07/structure-from-motion-and-3d-reconstruction-on-the-easy-in-opencv-2-3-w-code/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Simple Kalman filter for tracking using OpenCV 2.2 [w/ code]</title>
		<link>http://www.morethantechnical.com/2011/06/17/simple-kalman-filter-for-tracking-using-opencv-2-2-w-code/</link>
		<comments>http://www.morethantechnical.com/2011/06/17/simple-kalman-filter-for-tracking-using-opencv-2-2-w-code/#comments</comments>
		<pubDate>Thu, 16 Jun 2011 22:49:30 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[vision]]></category>
		<category><![CDATA[Website]]></category>
		<category><![CDATA[c#]]></category>
		<category><![CDATA[filter]]></category>
		<category><![CDATA[kalman]]></category>
		<category><![CDATA[tracking]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=902</guid>
		<description><![CDATA[Hi, I wanted to put up a quick note on how to use Kalman Filters in OpenCV 2.2 with the C++ API, because all I could find online was using the old C API. Plus the kalman.cpp example that ships with OpenCV is kind of crappy and really doesn&#8217;t explain how to use the Kalman [...]]]></description>
			<content:encoded><![CDATA[<p>Hi,<br />
I wanted to put up a quick note on how to use Kalman Filters in OpenCV 2.2 with the C++ API, because all I could find online was using the old C API. Plus the kalman.cpp example that ships with OpenCV is kind of crappy and really doesn&#8217;t explain how to use the Kalman Filter.<br />
I&#8217;m no expert on Kalman filters though; this is just a quick hack I got going as a test for a project. It worked, so I&#8217;m posting the results.<br />
<span id="more-902"></span></p>
<h2>The Filter</h2>
<p>So I wanted to build a 2D tracker that is more immune to noise. For that I set up a Kalman filter with 4 dynamic parameters and 2 measurement parameters (no control): the measurement is the 2D location of the object, and the dynamic state is its 2D location and 2D velocity. Pretty simple, and it keeps the transition matrix simple too.</p>
<pre class="brush: plain; title: ; notranslate">
KalmanFilter KF(4, 2, 0);
// state = (x, y, vx, vy); each step: x += vx, y += vy, velocity stays constant
KF.transitionMatrix = *(Mat_&lt;float&gt;(4, 4) &lt;&lt; 1,0,1,0,   0,1,0,1,  0,0,1,0,  0,0,0,1);
Mat_&lt;float&gt; measurement(2,1); measurement.setTo(Scalar(0));

// init... (mouse_info is the current mouse position, set by the mouse callback)
// predict() computes statePre = transitionMatrix * statePost,
// so the initial state has to go into statePost
KF.statePost.at&lt;float&gt;(0) = mouse_info.x;
KF.statePost.at&lt;float&gt;(1) = mouse_info.y;
KF.statePost.at&lt;float&gt;(2) = 0;
KF.statePost.at&lt;float&gt;(3) = 0;
setIdentity(KF.measurementMatrix);
setIdentity(KF.processNoiseCov, Scalar::all(1e-4));
setIdentity(KF.measurementNoiseCov, Scalar::all(1e-1));
setIdentity(KF.errorCovPost, Scalar::all(.1));
</pre>
<p>Cool, moving on to the dynamic part.<br />
So I set up a mouse callback to get the mouse position every &#8220;frame&#8221; (a 100ms wait), and feed that into the filter:</p>
<pre class="brush: plain; title: ; notranslate">
// First predict, to update the internal statePre variable
Mat prediction = KF.predict();
Point predictPt(prediction.at&lt;float&gt;(0),prediction.at&lt;float&gt;(1));

// Get mouse point
measurement(0) = mouse_info.x;
measurement(1) = mouse_info.y;

Point measPt(measurement(0),measurement(1));

// The &quot;correct&quot; phase that is going to use the predicted value and our measurement
Mat estimated = KF.correct(measurement);
Point statePt(estimated.at&lt;float&gt;(0),estimated.at&lt;float&gt;(1));
</pre>
<p>All the rest is garnish (see the code).</p>
<p>The important bit is to see that predict() happens before correct(). This follows the excellent <a href="http://www.cs.unc.edu/~welch/media/pdf/kalman_intro.pdf">Kalman filter tutorial</a> I found. Look carefully at Figure 1-2!! It will sort you out. Also take a look at <a href="https://code.ros.org/svn/opencv/trunk/opencv/modules/video/src/kalman.cpp">OpenCV&#8217;s internal implementation of the Kalman filter</a>, and see that it follows these steps closely, especially <code> Mat&#038; KalmanFilter::predict(const Mat&#038; control)</code> and <code>Mat&#038; KalmanFilter::correct(const Mat&#038; measurement)</code>.<br />
Another good place I found that helped me formulate the parameters for the filter is <a href="http://www.marcad.com/cs584/Tracking.html">this place</a>. Again, take everything with a grain of salt: Kalman filters are very versatile; you just need to know how to formulate them right.</p>
<h2>Result</h2>
<p>Using velocity:<br />
<a href="http://www.morethantechnical.com/wp-content/uploads/2011/06/Screen-shot-2011-06-16-at-6.39.24-PM.png" rel="lightbox[902]"><img src="http://www.morethantechnical.com/wp-content/uploads/2011/06/Screen-shot-2011-06-16-at-6.39.24-PM.png" alt="" title="kalman using velocity" width="580" height="602" class="alignnone size-full wp-image-907" /></a></p>
<p>Not using velocity:<br />
<a href="http://www.morethantechnical.com/wp-content/uploads/2011/06/Screen-shot-2011-06-16-at-6.41.24-PM.png" rel="lightbox[902]"><img src="http://www.morethantechnical.com/wp-content/uploads/2011/06/Screen-shot-2011-06-16-at-6.41.24-PM.png" alt="" title="kalman not using velocity" width="580" height="602" class="alignnone size-full wp-image-908" /></a></p>
<p>Some Video<br />
<iframe width="425" height="349" src="http://www.youtube.com/embed/SxtY1jQJ2fc" frameborder="0" allowfullscreen></iframe></p>
<h2>Code</h2>
<p>As usual, grab the code off the SVN:</p>
<pre class="brush: plain; title: ; notranslate">
svn export http://morethantechnical.googlecode.com/svn/trunk/mouse_kalman/main.cpp
</pre>
<p>Enjoy,<br />
Roy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2011/06/17/simple-kalman-filter-for-tracking-using-opencv-2-2-w-code/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>UnderGet – Download blocked content</title>
		<link>http://www.morethantechnical.com/2011/06/08/underget-%e2%80%93-download-blocked-content/</link>
		<comments>http://www.morethantechnical.com/2011/06/08/underget-%e2%80%93-download-blocked-content/#comments</comments>
		<pubDate>Wed, 08 Jun 2011 14:39:33 +0000</pubDate>
		<dc:creator>Arnon</dc:creator>
				<category><![CDATA[Recommended]]></category>
		<category><![CDATA[Solutions]]></category>
		<category><![CDATA[tips]]></category>
		<category><![CDATA[Website]]></category>
		<category><![CDATA[work]]></category>
		<category><![CDATA[blocked content]]></category>
		<category><![CDATA[corporate]]></category>
		<category><![CDATA[file extension]]></category>
		<category><![CDATA[firewall]]></category>
		<category><![CDATA[mp3]]></category>
		<category><![CDATA[proxy]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=876</guid>
		<description><![CDATA[Ever wanted to try and download an mp3 file at your workplace, but couldn&#8217;t because corporate firewall policy was to block every url ending with the .mp3 prefix? I had. Until recently, I&#8217;d accept this as it was from above, but that was until I discovered this website called UnderGet. The trick is pretty simple. [...]]]></description>
			<content:encoded><![CDATA[<p>Ever wanted to try and download an mp3 file at your workplace, but couldn&#8217;t because corporate firewall policy was to block every url ending with the .mp3 prefix?<br />
<span id="more-876"></span></p>
<p>I had. Until recently I just accepted this as a decree from above, but then I discovered a website called <a href="http://www.underget.com">UnderGet</a>.</p>
<p>The trick is pretty simple. The engine behind this site works by renaming the file or encoding its content so the blocking software cannot detect it.</p>
<p>I am now able to download my favorite podcast mp3 files when I&#8217;m at my workplace.</p>
<p style="text-align: center;"><img src="http://www.morethantechnical.com/wp-content/uploads/2011/06/060811_1439_UnderGetDow1.png" alt="" /></p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2011/06/08/underget-%e2%80%93-download-blocked-content/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Download all your Last.fm loved tracks in two simple steps</title>
		<link>http://www.morethantechnical.com/2011/03/14/download-all-you-last-fm-loved-tracks-in-a-single-command/</link>
		<comments>http://www.morethantechnical.com/2011/03/14/download-all-you-last-fm-loved-tracks-in-a-single-command/#comments</comments>
		<pubDate>Mon, 14 Mar 2011 04:27:48 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[Recommended]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Solutions]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[Website]]></category>
		<category><![CDATA[download]]></category>
		<category><![CDATA[encoding]]></category>
		<category><![CDATA[lame]]></category>
		<category><![CDATA[mp3]]></category>
		<category><![CDATA[mp4]]></category>
		<category><![CDATA[mplayer]]></category>
		<category><![CDATA[shell]]></category>
		<category><![CDATA[youtube]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=844</guid>
		<description><![CDATA[I&#8217;m a fan of Last.fm online radio, and I have a habit of marking every good song that I hear as a &#8220;loved track&#8221;. Over the years I got quite a list, and so I decided to turn it into my jogging playlist. But for that, I need all the songs downloaded to my computer [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m a fan of <a href="http://www.last.fm/home">Last.fm</a> online radio, and I have a habit of marking every good song that I hear as a &#8220;loved track&#8221;. Over the years I got quite a list, and so I decided to turn it into my jogging playlist. But for that, I need all the songs downloaded to my computer so I can put them on my mobile. While Last.fm does link to Amazon for downloading all the loved songs for pay, I&#8217;m going to walk the fine moral line here and suggest how you can download every song from existing free YouTube videos.<br />
If it really bothers you, think of it as if I created a YouTube playlist and now I&#8217;m using my data plan to stream the songs off YT itself.<br />
Moral issues resolved, we can move on to the scripting.<br />
<span id="more-844"></span><br />
What you need to have:<br />
Linux-like system, <a href="http://www.mplayerhq.hu/design7/news.html">MPlayer</a>, <a href="http://lame.sourceforge.net/">Lame MP3 encoder</a>, and some command-line experience, or at least a sense of adventure.</p>
<p>So first you&#8217;ll need to export your loved tracks from Last.fm in tab separated format &#8211; a mere button press.<br />
<a href="http://www.morethantechnical.com/wp-content/uploads/2011/03/Screen-shot-2011-03-14-at-12.03.26-AM.png" rel="lightbox[844]"><img src="http://www.morethantechnical.com/wp-content/uploads/2011/03/Screen-shot-2011-03-14-at-12.03.26-AM-300x111.png" alt="" title="Screen shot 2011-03-14 at 12.03.26 AM" width="300" height="111" class="aligncenter size-medium wp-image-849" /></a></p>
<p>The &#8220;tsv&#8221; (tab separated values) file has a simple format: <code>&lt;song name&gt; &lt;artist&gt; &lt;Last.fm url&gt;</code></p>
<p>And now for the script. The loved tracks file is tab separated, so we use AWK to get the first two fields, which are the song name and the song artist.<br />
Then we use a neat command-line tool to download YT movies: <a href="http://rg3.github.com/youtube-dl/documentation.html">http://rg3.github.com/youtube-dl/documentation.html</a>.</p>
<pre class="brush: plain; title: ; notranslate">
mkdir mylovedtracks
cd mylovedtracks
awk -F'\t' '{print &quot;../youtube-dl.py -f 18 -t \&quot;ytsearch:&quot; $1 &quot; &quot; $2 &quot;\&quot;&quot;}' ../my_lovedtracks.tsv | csh
</pre>
<p>This one-liner downloads all the loved tracks in the TSV file into the current directory, assuming <code>youtube-dl.py</code> and <code>my_lovedtracks.tsv</code> are in the parent directory. <code>-f 18</code> requests YouTube&#8217;s format 18, an MP4, and <code>ytsearch:</code> makes it search YouTube for &#8220;song-name song-artist&#8221; and download the first result. The <code>| csh</code> pipes the AWK-generated commands into a csh process, which runs them one after another.</p>
<p>Thanks to <code>-t</code>, each saved MP4 is named after the video&#8217;s title, followed by the YouTube video ID.</p>
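<p>The AWK-to-csh pipeline breaks when a song title contains a quote character, and it downloads one video at a time. Here is a minimal Python sketch of the same step that avoids shell quoting by passing argument lists, and runs a few downloads concurrently (since YouTube throttles each download after its first part, this may help). It is not the method used above: it assumes the TSV layout described earlier and <code>youtube-dl.py</code> in the parent directory, and <code>build_commands</code>, <code>download_all</code> and the default of 4 workers are hypothetical choices.</p>

```python
import csv
import io
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_commands(tsv_text, script="../youtube-dl.py"):
    """Return one argument list per track (song name and artist are the first two columns)."""
    commands = []
    # QUOTE_NONE: a quote character inside a title is just part of the title
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:  # skip blank or malformed lines
            continue
        song, artist = row[0], row[1]
        # same flags as the AWK version: format 18, title-based file name, YouTube search
        commands.append([script, "-f", "18", "-t", "ytsearch:" + song + " " + artist])
    return commands


def download_all(commands, workers=4, run=subprocess.call):
    """Run the commands a few at a time; returns the exit codes in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, commands))
```

<p>For example, <code>download_all(build_commands(open(&quot;../my_lovedtracks.tsv&quot;).read()))</code> would process the whole list; the <code>run</code> parameter only exists so the runner can be swapped out.</p>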
<p>Once all the MP4s are downloaded, let&#8217;s batch-convert them to MP3s:</p>
<pre class="brush: plain; title: ; notranslate">
mkdir -p sound
for f in *.mp4 ; do n=&quot;${f%.mp4}&quot;; if [ ! -e &quot;sound/$n.mp3&quot; ]; then mplayer &quot;$f&quot; -vc dummy -vo null -ao pcm:file=sound/temp.wav; lame -V2 sound/temp.wav &quot;sound/$n.mp3&quot;; rm sound/temp.wav; fi ; done
</pre>
<p>This one-liner extracts the audio from each MP4 into a temporary PCM file (<code>temp.wav</code>) using MPlayer, and then converts it to a VBR MP3 using Lame.<br />
You can safely run it many times, since it skips MP4s that already have an MP3. So if you&#8217;re impatient (like me) and want to convert some of the MP4s before everything has downloaded &#8211; just run it now, and run it again later.</p>
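<p>The same skip-if-already-converted logic can be sketched in Python, which sidesteps shell quoting of titles with spaces or dots. This is a hedged sketch, not part of the original workflow: it reuses the MPlayer and Lame flags from the loop above, and <code>pending_conversions</code> and <code>convert</code> are hypothetical helper names.</p>

```python
import os
import subprocess


def pending_conversions(directory, out_dir="sound"):
    """List (mp4_path, mp3_path) pairs whose MP3 does not exist yet."""
    pairs = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".mp4"):
            continue
        base = name[: -len(".mp4")]  # keeps dots inside titles intact
        mp3 = os.path.join(directory, out_dir, base + ".mp3")
        if not os.path.exists(mp3):
            pairs.append((os.path.join(directory, name), mp3))
    return pairs


def convert(directory, run=subprocess.check_call):
    """Convert every pending MP4 via a temporary WAV, like the shell loop."""
    os.makedirs(os.path.join(directory, "sound"), exist_ok=True)
    wav = os.path.join(directory, "sound", "temp.wav")
    for mp4, mp3 in pending_conversions(directory):
        run(["mplayer", mp4, "-vc", "dummy", "-vo", "null", "-ao", "pcm:file=" + wav])
        run(["lame", "-V2", wav, mp3])
        if os.path.exists(wav):
            os.remove(wav)
```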
<p>Congrats, all your loved tracks were downloaded.</p>
<p>A few limitations of this method:<br />
* Sometimes the downloaded song is not exactly what you wanted, especially when you are after a specific version. The search just takes the first result and can&#8217;t be controlled much.<br />
* There are no ID3 tags, although something could probably be done about that in the Lame encoding phase.<br />
* There is a lot of unexploited potential for parallelization, mostly in the download phase: YouTube pushes the first ~15% of a video very fast (I even saw 1200 Kb/s), and then throttles to a steady rate that gets the video downloaded in about a minute (as low as 50 Kb/s). Downloading many videos at once could help.<br />
* It&#8217;s still not a true one-liner but a two-step process, although the second step could be folded into the AWK print of the first.<br />
* There is no volume normalization of the MP3s, which is very important! Otherwise every song plays at a different loudness and you have to keep turning the volume up and down&#8230;</p>
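<p>On the normalization point, one crude option is to scale each track to a common RMS level while it is still a WAV, and hand the gain to Lame through its <code>--scale</code> option, which multiplies the PCM input. The sketch below assumes 16-bit PCM WAVs (check what your MPlayer writes); <code>normalization_gain</code> and the target level of 0.1 of full scale are hypothetical choices, and RMS is only a rough stand-in for perceived loudness (ReplayGain-based tools do this more carefully).</p>

```python
import array
import math
import sys
import wave


def normalization_gain(wav_file, target_rms=0.1):
    """Gain that brings a 16-bit PCM WAV to target_rms (fraction of full scale),
    capped so the loudest sample does not clip."""
    with wave.open(wav_file, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        samples = array.array("h", w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        return 1.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / 32768.0
    if rms == 0:
        return 1.0  # silence: leave it alone
    peak = max(abs(s) for s in samples) / 32768.0
    return min(target_rms / rms, 1.0 / peak)  # never push the peak past full scale
```

<p>The result would then be passed along as, e.g., <code>lame --scale 1.8 -V2 sound/temp.wav out.mp3</code> (the 1.8 here is just an illustrative value).</p>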
<p>Still, it did a nice, quick job for me&#8230;</p>
<p>Roy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2011/03/14/download-all-you-last-fm-loved-tracks-in-a-single-command/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>10 lines-of-code OCR HTTP service with Python, Tesseract and Tornado</title>
		<link>http://www.morethantechnical.com/2011/01/25/10-lines-of-code-ocr-http-service-with-python-tesseract-and-tornado/</link>
		<comments>http://www.morethantechnical.com/2011/01/25/10-lines-of-code-ocr-http-service-with-python-tesseract-and-tornado/#comments</comments>
		<pubDate>Tue, 25 Jan 2011 17:44:50 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[Recommended]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Website]]></category>
		<category><![CDATA[http]]></category>
		<category><![CDATA[ocr]]></category>
		<category><![CDATA[service]]></category>
		<category><![CDATA[tesseract]]></category>
		<category><![CDATA[tornado]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=803</guid>
		<description><![CDATA[Hi I believe that every builder-hacker should have their own little Swiss-army-knife server that just does everything they need, but as a webservice. You can basically do anything as a service nowadays: image/audio/video manipulation, mock-cloud data storage, offload heavy computation, and so on. Tornado, the lightweight Python webserver is perfect for this, and since so [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.morethantechnical.com/wp-content/uploads/2011/01/Screen-shot-2011-01-25-at-12.32.27-PM.png" rel="lightbox[803]"><img src="http://www.morethantechnical.com/wp-content/uploads/2011/01/Screen-shot-2011-01-25-at-12.32.27-PM-300x114.png" alt="" title="Screen shot 2011-01-25 at 12.32.27 PM" width="300" height="114" class="aligncenter size-medium wp-image-806" /></a>Hi</p>
<p>I believe every builder-hacker should have their own little Swiss-army-knife server that does everything they need, as a web service. You can basically do anything as a service nowadays: image/audio/video manipulation, mock-cloud data storage, offloading heavy computation, and so on.<br />
<a href="http://www.tornadoweb.org/">Tornado</a>, the lightweight Python web server, is perfect for this, and since so many projects these days have Python bindings (see <a href="https://github.com/hoffstaetter/python-tesseract">python-tesseract</a>), it should be a breeze to integrate them with minimal work.<br />
Let&#8217;s see how it&#8217;s done.</p>
<p><span id="more-803"></span></p>
<h2>Putting it together</h2>
<p>I owe the simplicity of this work to the simplicity of <a href="http://www.tornadoweb.org/documentation#overview">Tornado&#8217;s API</a>. It&#8217;s really clean, with just a couple of entry points where you write code.<br />
Since the code is extremely short, I&#8217;ll just paste it in and go over it:</p>
<pre class="brush: plain; title: ; notranslate">

import tornado.httpserver
import tornado.ioloop
import tornado.web
import Image                            # PIL
from tesseract import image_to_string   # python-tesseract
import StringIO

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write('&lt;html&gt;&lt;body&gt;Send us a file!&lt;br/&gt;&lt;form enctype=&quot;multipart/form-data&quot; action=&quot;/&quot; method=&quot;post&quot;&gt;'
                   '&lt;input type=&quot;file&quot; name=&quot;the_file&quot;&gt;'
                   '&lt;input type=&quot;submit&quot; value=&quot;Submit&quot;&gt;'
                   '&lt;/form&gt;&lt;/body&gt;&lt;/html&gt;')

    def post(self):
        self.set_header(&quot;Content-Type&quot;, &quot;text/plain&quot;)
        self.write(&quot;You sent a file with name &quot; + self.request.files.items()[0][1][0]['filename'] + &quot;\n&quot;)
        # make a &quot;memory file&quot; using StringIO, open with PIL and send to tesseract for OCR
        self.write(image_to_string(Image.open(StringIO.StringIO(self.request.files.items()[0][1][0]['body']))))

application = tornado.web.Application([
    (r&quot;/&quot;, MainHandler),
])

if __name__ == &quot;__main__&quot;:
    http_server = tornado.httpserver.HTTPServer(application)
    http_server.listen(8888)
    tornado.ioloop.IOLoop.instance().start()
</pre>
<p>That&#8217;s it, and most of it is just garnish. The final version in the repository also displays the image on screen.</p>
<p>In the main block, Tornado is set up to listen on port 8888, and the application configuration tells it to answer requests to the root (&#8220;/&#8221;) with our special handler, MainHandler. MainHandler then takes care of incoming GET and POST requests: GET returns a small upload form, and POST does the OCR. All of this was taken from Tornado&#8217;s &#8220;Hello World&#8221;.</p>
<p>The service answers POST requests that carry an image file and routes the image to be processed. All uploaded files are in <code>self.request.files</code>, a dictionary mapping each form field name to a list of files (each with a <code>filename</code> and a <code>body</code>), so <code>self.request.files.items()[0][1][0]</code> simply picks the first file of the first field.</p>
<p>Now, <a href="http://code.google.com/p/tesseract-ocr/">Tesseract</a>, as you probably already know, is an open-source OCR engine that was originally built by HP and later picked up by Google. It is free, it is good, and it comes with a set of already-trained languages.<br />
But I needed a Python binding for it and did not feel like writing my own. So I googled and found this small, humble project: <a href="https://github.com/hoffstaetter/python-tesseract">python-tesseract</a>. Its API is very narrow: essentially a single function that calls the tesseract command line. But it works like a charm.</p>
<p>So all I needed to do was take the file from the POST request, wrap a <a href="http://docs.python.org/library/stringio.html#StringIO.StringIO">StringIO</a> around it so it looks like a file, open it with <a href="http://www.pythonware.com/library/pil/handbook/image.htm#image-open-function">PIL&#8217;s Image.open</a>, and send it to python-tesseract, which returns a string. Then I just write the string back to the HTTP response.</p>
<pre class="brush: plain; title: ; notranslate">
	self.write(image_to_string(Image.open(StringIO.StringIO(self.request.files.items()[0][1][0]['body']))))
</pre>
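<p>Since python-tesseract essentially shells out to the <code>tesseract</code> command line, a similar wrapper can be sketched with nothing but the Python 3 standard library. This is an assumption-laden sketch, not python-tesseract&#8217;s actual code: it relies on the CLI form <code>tesseract &lt;image&gt; &lt;outputbase&gt;</code>, which writes its result to <code>&lt;outputbase&gt;.txt</code>, and it hands the uploaded bytes straight to the CLI instead of going through PIL. The function name and the injectable <code>run</code> parameter are hypothetical.</p>

```python
import os
import subprocess
import tempfile


def image_bytes_to_string(image_bytes, suffix=".png", tesseract="tesseract",
                          run=subprocess.check_call):
    """OCR raw image bytes by writing them to a temp file and calling the tesseract CLI."""
    with tempfile.TemporaryDirectory() as tmp:
        image_path = os.path.join(tmp, "input" + suffix)
        out_base = os.path.join(tmp, "output")
        with open(image_path, "wb") as f:
            f.write(image_bytes)
        run([tesseract, image_path, out_base])  # tesseract appends ".txt" itself
        with open(out_base + ".txt", encoding="utf-8") as f:
            return f.read()
```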
<p>To actually get it running, you must:</p>
<ul>
<li>set the $PYTHONPATH variable so Python can find both python-tesseract and Tornado,
<li>set $TESSDATA_PREFIX to where you put your training data for Tesseract,
<li>change the path to the <code>tesseract</code> executable in the first code line of python-tesseract&#8217;s <code>tesseract.py</code>.
</ul>
<p>Now just start your server and send it images; you&#8217;ll get back the text found in them.</p>
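<p>To call the service from a script rather than the browser, the image has to be POSTed as <code>multipart/form-data</code>, just like the upload form does. Below is a minimal Python 3 client sketch using only the standard library; it assumes the server runs on <code>localhost:8888</code> and uses the form&#8217;s field name <code>the_file</code>. The helper names are hypothetical.</p>

```python
import os
import urllib.request
import uuid


def build_multipart(field, filename, data, content_type="application/octet-stream"):
    """Return (body, content_type_header) for a single-file multipart/form-data upload."""
    boundary = uuid.uuid4().hex
    head = ("--" + boundary + "\r\n"
            + 'Content-Disposition: form-data; name="' + field
            + '"; filename="' + filename + '"\r\n'
            + "Content-Type: " + content_type + "\r\n\r\n").encode("utf-8")
    tail = ("\r\n--" + boundary + "--\r\n").encode("utf-8")
    return head + data + tail, "multipart/form-data; boundary=" + boundary


def ocr_request(path, url="http://localhost:8888/"):
    """POST the image at `path` to the OCR service and return its plain-text reply."""
    with open(path, "rb") as f:
        body, ctype = build_multipart("the_file", os.path.basename(path), f.read())
    req = urllib.request.Request(url, data=body, headers={"Content-Type": ctype})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

<p>For example, <code>print(ocr_request(&quot;scan.png&quot;))</code>, where <code>scan.png</code> is any image at hand. The reply starts with the &#8220;You sent a file with name&#8221; line, followed by the recognized text.</p>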
<h2>Code</h2>
<p>Grab it off the SVN:<br />
<code>svn checkout http://morethantechnical.googlecode.com/svn/trunk/tesserver/ tesserver</code></p>
<p>Enjoy,<br />
Roy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2011/01/25/10-lines-of-code-ocr-http-service-with-python-tesseract-and-tornado/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Hand gesture recognition via model fitting in energy minimization w/OpenCV</title>
		<link>http://www.morethantechnical.com/2010/12/28/hand-gesture-recognition-via-model-fitting-in-energy-minimization-wopencv/</link>
		<comments>http://www.morethantechnical.com/2010/12/28/hand-gesture-recognition-via-model-fitting-in-energy-minimization-wopencv/#comments</comments>
		<pubDate>Mon, 27 Dec 2010 22:11:12 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[Recommended]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[vision]]></category>
		<category><![CDATA[Website]]></category>
		<category><![CDATA[work]]></category>
		<category><![CDATA[computer vision]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=762</guid>
		<description><![CDATA[Hi Just wanted to share a thing I made &#8211; a simple 2D hand pose estimator, using a skeleton model fitting. Basically there has been a crap load of work on hand pose estimation, but I was inspired by this ancient work. The problem is setting out to find a good solution, and everything is [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.morethantechnical.com/wp-content/uploads/2010/12/hands.png" rel="lightbox[762]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/12/hands-300x248.png" alt="hands with model fitted" title="hands with model fitted" width="300" height="248" class="aligncenter size-medium wp-image-796" /></a>Hi</p>
<p>Just wanted to share a thing I made &#8211; a simple 2D hand pose estimator using skeleton model fitting. There has been a crap load of work on hand pose estimation, but I was inspired by <a href="http://scholar.google.com/scholar?cluster=136383770354228708&#038;hl=en&#038;as_sdt=40000000">this ancient work</a>. The problem is that when you set out to find a good solution, everything is very hard to understand and implement. In such cases I like to take inspiration from a method and set out on my own implementation. This way I understand what&#8217;s going on, simplify it, and get to share it with you!</p>
<p>Anyway, let&#8217;s get down to business.<br />
<span id="more-762"></span></p>
<h1>A bit about energy minimization problems</h1>
<p>A dear friend revealed the wonders of energy minimization problems to me a while back, and ever since I have been trying to find uses for the method. Basically, it tries to find a minimum of a complicated energy function (usually with many parameters) by following the function&#8217;s gradient downhill. Such methods are often called <a href="http://en.wikipedia.org/wiki/Gradient_descent">Gradient Descent</a>, and are used mostly for non-linear systems that can&#8217;t be solved easily with a least-squares variant. In general they only guarantee a local minimum, so a good starting point matters.</p>
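<p>To make the idea concrete, here is plain fixed-step gradient descent on a toy two-parameter &#8220;energy&#8221; whose single minimum is at (3, -1). It is only an illustration: the energy, the step size of 0.1 and the iteration count are made-up values, and the minimizer this project actually uses is more sophisticated than this.</p>

```python
def energy(x, y):
    """Toy energy with its minimum at (3, -1)."""
    return (x - 3) ** 2 + 2 * (y + 1) ** 2


def energy_gradient(x, y):
    """Analytic gradient of the toy energy."""
    return 2 * (x - 3), 4 * (y + 1)


def gradient_descent(grad, start, step=0.1, iterations=200):
    """Repeatedly take a small step against the gradient."""
    x, y = start
    for _ in range(iterations):
        gx, gy = grad(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y
```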
<p>A lot of work in computer vision has used energy functions (I believe the most seminal is <a href="http://scholar.google.com/scholar?cluster=10809837120977085662&#038;hl=en&#038;as_sdt=40000000">Snakes</a>, with over 10,000 citations), usually with two terms: internal energy and external energy. The equilibrium between the two terms should result in a low-energy system &#8211; our optimal result. So we want to formulate the terms such that they are 0 exactly when they describe the system as we want it.</p>
<p>Following the work on active contours, I decided the external energy should measure how well the hand model fits the hand blob, and the internal energy how &#8220;comfortable&#8221; the hand is in a given configuration.</p>
<h1>The hand model</h1>
<p>Let&#8217;s see what a 2D model of a hand might look like:<br />
<a href="http://www.morethantechnical.com/wp-content/uploads/2010/12/Screen-shot-2010-12-25-at-10.50.41-AM.png" rel="lightbox[762]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/12/Screen-shot-2010-12-25-at-10.50.41-AM.png" alt="" title="Screen shot 2010-12-25 at 10.50.41 AM" width="232" height="231" class="aligncenter size-full wp-image-790" /></a><br />
Kinda looks like a rake&#8230; huh?</p>
<p>Some parts practically can&#8217;t change much, e.g. the palm (orange), and some can change drastically, e.g. the fingers (red). Each finger has joints (blue circles) and a tip (a bigger blue circle).</p>
<pre class="brush: plain; title: ; notranslate">
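#include &lt;vector&gt;
#include &lt;opencv2/core/core.hpp&gt;
using std::vector;
using cv::Point2d;

typedef struct finger_data {
	Point2d origin_offset;		//base of finger relative to hand center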
	double a;					//angle
	vector&lt;double&gt; joints_a;	//angles of joints
	vector&lt;double&gt; joints_d;	//bone length
} FINGER_DATA;

typedef struct hand_data {
	FINGER_DATA fingers[5];		//fingers
	double a;					//angle of whole hand
	Point2d origin;				//center of palm
	Point2d origin_offset;		//offset from center for optimization
	double size;				//relative size of hand = length of a finger
} HAND_DATA;
</pre>
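<p>To see how such a struct turns into joint positions, here is a small forward-kinematics sketch for one finger. The post does not show the conventions of its <code>newTip</code> function, so these are assumptions: angles in radians, each joint angle relative to the previous bone, and the finger&#8217;s base offset rotating with the hand. <code>finger_joints</code> is a hypothetical name.</p>

```python
import math


def finger_joints(hand_origin, hand_angle, base_offset, finger_angle,
                  joint_angles, bone_lengths):
    """Return the joint positions of one finger, the last one being the tip.

    Hypothetical convention: radians, each joint angle is relative to the
    previous bone, and the base offset rotates with the whole hand."""
    c, s = math.cos(hand_angle), math.sin(hand_angle)
    bx, by = base_offset
    # rotate the finger base around the hand center, then translate
    x = hand_origin[0] + c * bx - s * by
    y = hand_origin[1] + s * bx + c * by
    direction = hand_angle + finger_angle
    points = []
    for angle, length in zip(joint_angles, bone_lengths):
        direction += angle  # accumulate the bend at this joint
        x += length * math.cos(direction)
        y += length * math.sin(direction)
        points.append((x, y))
    return points
```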
<p>Since I&#8217;m only interested in the fingertips, at first I thought of using Inverse Kinematics to guide each tip to a certain point and let the joints find their own minimal-energy positions, following <a href="http://freespace.virgin.net/hugo.elias/models/m_ik2.htm">this</a> article. But I abandoned this approach because of complications. </p>
<p>I also had to simplify the model, both for real-time estimation and for better results. So in the end I have a very rigid model that allows only one joint per finger and no angular movement.</p>
<h1>Using tnc.c</h1>
<p>tnc.c is a &#8220;library&#8221;, essentially one C file, that implements a truncated Newton minimizer (with a line search) able to find a minimum of a multivariate function. I&#8217;m not certain of all the algorithm details, and they are not so important, as it can be replaced with any similar library. But tnc.c has one great advantage &#8211; it is dead simple: a single function starts the minimization, calling back a user function to calculate the energy and its gradients.</p>
<p>So basically I had to write just one very short function:</p>
<pre class="brush: plain; title: ; notranslate">
static int my_f(double x[], double *f, double g[], void *state) {
	DATA_FOR_TNC* d_ptr = (DATA_FOR_TNC*)state;
	DATA_FOR_TNC new_data = *d_ptr;

	mapVecToData(x,new_data.hand);

	*f = calc_Energy(new_data,*d_ptr);

	//calc gradients
	{
		double _x[SIZE_OF_HAND_DATA];

		for(int i=0;i&lt;SIZE_OF_HAND_DATA;i++) {
			memcpy(_x, x, sizeof(double)*SIZE_OF_HAND_DATA); //reset variables
			_x[i] = _x[i] + EPSILON; //change only one variable
			mapVecToData(_x, new_data.hand);
			double E_epsilon = calc_Energy(new_data,*d_ptr);
			g[i] = ((E_epsilon - *f) / EPSILON); //calc the gradient for this variable change
		}
	}

	return 0;
}
</pre>
<p>tnc.c calls this function on every iteration of the search: <code>double x[]</code> holds the variables the search is currently examining, <code>double* f</code> receives the energy for this state, <code>double g[]</code> receives the gradients (same size as x[]), and <code>void* state</code> is a user-defined pointer that is carried along the process.</p>
<p>What I do is simply nudge each parameter in turn by a small EPSILON to test how it affects the energy of the system. I measure the energy of the nudged setup, subtract the energy of the &#8220;natural&#8221; setup (without any change to the parameters), and divide by EPSILON &#8211; this gives a forward finite-difference estimate of the gradient for that parameter.</p>
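<p>The same forward-difference scheme, stripped of the tnc.c plumbing, looks like this in Python. <code>numeric_gradient</code> and the step size of 1e-6 are hypothetical (the post does not state its EPSILON). Note the cost: one extra energy evaluation per parameter on every call, which grows with the number of model parameters.</p>

```python
EPSILON = 1e-6  # hypothetical step size


def numeric_gradient(energy, x, eps=EPSILON):
    """Forward differences: g[i] = (E(x + eps * e_i) - E(x)) / eps.
    Returns (E(x), gradient), like my_f fills *f and g[]."""
    f0 = energy(x)
    grad = []
    for i in range(len(x)):
        shifted = list(x)   # reset all variables
        shifted[i] += eps   # change only one variable
        grad.append((energy(shifted) - f0) / eps)
    return f0, grad
```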
<p>The energy function came out a bit different in the end:</p>
<pre class="brush: plain; title: ; notranslate">

static double calc_Energy(DATA_FOR_TNC&amp; d, DATA_FOR_TNC&amp; orig_d) {
	double _sum = 0.0;

	//external energy: how close are the joints to the hand blob? (how well do they fit to it)
	vector&lt;Point2d&gt; joints;
	Mat tips(5,1,CV_64FC2);

	for (int j=0; j&lt;5; j++) {
		joints.clear();
		FINGER_DATA f = d.hand.fingers[j];
		Point2d _newTip = newTip(f,d.hand,joints); //get joints for this finger

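		for (int i=0; i&lt;joints.size(); i++) { //for each joint find how far it is from the blob
			//signed distance to the blob contour: positive inside, negative outside
			double ds = pointPolygonTest(d.contour, joints[i]+getHandOrigin(d.hand), true);
			ds += 5; //tolerate joints up to 5 pixels outside the contour
			ds = 1 * ((ds &lt; 0) ? -1 : 1) * (ds*ds) ; //signed square
			_sum -= (ds &gt; 0) ? 0 : 100*ds; //only joints outside the tolerance add energy
		}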

		tips.at&lt;Point2d&gt;(j,0) = _newTip;
	}

	//lazyness of fingers - joints should strive to be as they were in the natural pose
	vector&lt;double&gt; _angles;
//	for (int j=0; j&lt;5; j++) {
//		FINGER_DATA f = d.hand.fingers[j];
//		FINGER_DATA of = orig_d.hand.fingers[j];
////		_angles.push_back(f.a - of.a);
//		for (int i=0; i&lt;f.joints_d.size(); i++) {
////			_angles.push_back(f.joints_a[i] - of.joints_a[i]);
//			_angles.push_back(f.joints_d[i] - of.joints_d[i]);
//		}
//	}
	_angles.push_back(d.hand.a-orig_d.hand.a); //the angle of the hand should be as it was before
	_sum  += 10000*norm(Mat(_angles));

	if(_sum &lt; 0) return 0;
	return _sum;
}
</pre>
<p>You&#8217;ll notice the commented out section. The &#8220;laziness of fingers&#8221; turned out not to give good results&#8230; A different metric is needed! I have not found it yet, maybe you have a good idea?</p>
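<p>The external term above boils down to: shift each joint&#8217;s signed distance by a 5-pixel margin, and add 100 times its square for every joint that still ends up outside. Here is a hedged Python sketch of just that term. A circular blob with an analytic signed distance stands in for <code>pointPolygonTest</code> (positive inside, negative outside); the helper names are hypothetical.</p>

```python
import math


def circle_signed_distance(center, radius):
    """Stand-in for pointPolygonTest(contour, pt, true) on a circular blob."""
    def sd(point):
        return radius - math.hypot(point[0] - center[0], point[1] - center[1])
    return sd


def external_energy(joints, signed_distance, margin=5.0, weight=100.0):
    """Quadratic penalty for joints lying more than `margin` pixels outside the blob."""
    total = 0.0
    for joint in joints:
        ds = signed_distance(joint) + margin
        if ds < 0:
            total += weight * ds * ds
    return total
```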
<p>Starting tnc.c is very simple: allocate the vectors for X and the gradients, initialize the model from the blob, and call the <code>simple_tnc</code> convenience function. <code>simple_tnc</code> starts <code>tnc</code> with some default parameters that didn&#8217;t affect the outcome (at least in my tries).</p>
<pre class="brush: plain; title: ; notranslate">
void estimateHand(Mat&amp; mymask) {
	double _x[SIZE_OF_HAND_DATA] = {0};
	Mat X(1,SIZE_OF_HAND_DATA,CV_64FC1,_x);
	double f;
	Mat gradients(Size(SIZE_OF_HAND_DATA,1),CV_64FC1,Scalar(0));

	namedWindow(&quot;state&quot;);

	initialize_hand_data(d, mymask);

	mapDataToVec((double*)X.data, d.hand);

	simple_tnc(SIZE_OF_HAND_DATA, (double*)X.data, &amp;f, (double*)gradients.data, my_f, (void*)&amp;d, 1, 0);

	mapVecToData((double*)X.data, d.hand);
	showstate(d,1);

	d.hand.origin = getHandOrigin(d.hand); //move to new position
}
</pre>
<h1>Results and Discussion</h1>
<p>Here are my results so far:<br />
<object width="480" height="385"><param name="movie" value="http://www.youtube.com/v/uETHJQhK144?fs=1&amp;hl=en_US"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/uETHJQhK144?fs=1&amp;hl=en_US" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="480" height="385"></embed></object></p>
<p>It&#8217;s not perfect, but it&#8217;s a start. Tracking and estimating an open hand works pretty well, even with some change in orientation. But when the fingers are closed&#8230; that&#8217;s where the problems start. </p>
<p>Sometimes the joints &#8220;hover&#8221; over the black (background) area to &#8220;land&#8221; in a white area so they &#8220;fit&#8221;, which they should not do. One easy way to counter this is to measure the distance along the whole bone, not just at the joint.</p>
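<p>Measuring along the whole bone could look like the sketch below: sample points between consecutive joints and apply the same margin and quadratic penalty to each sample. This is a suggestion, not what the current code does; the 10 samples per bone and the function names are hypothetical, and the cost grows with the number of samples.</p>

```python
def bone_samples(a, b, n=10):
    """n + 1 evenly spaced points from joint a to joint b (inclusive)."""
    return [(a[0] + (b[0] - a[0]) * t / n, a[1] + (b[1] - a[1]) * t / n)
            for t in range(n + 1)]


def bone_energy(joints, signed_distance, margin=5.0, weight=100.0, n=10):
    """Per-joint penalty applied along every bone, so a finger cannot cross
    the background between two joints that both land inside the blob."""
    total = 0.0
    for a, b in zip(joints, joints[1:]):
        for p in bone_samples(a, b, n):
            ds = signed_distance(p) + margin
            if ds < 0:
                total += weight * ds * ds
    return total
```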
<p>Right now the model doesn&#8217;t use all possible joints, because that is too heavy computationally. Plus, the energy does not depend on the angles of the fingers. So this is a very, very simple model of a hand&#8230;</p>
<p>But it is a good start! All the <a href="http://www.youtube.com/watch?v=mLT4CFLIi8A&#038;feature=related">other</a> <a href="http://www.youtube.com/watch?v=6Uw_8Y1RuQQ&#038;feature=related">stuff</a> I <a href="http://www.youtube.com/watch?v=B_UYmQJT-F0&#038;feature=related">have</a> <a href="http://www.youtube.com/watch?v=F8GVeV0dYLM&#038;feature=related">seen</a> <a href="http://www.youtube.com/watch?v=Rmh-mZFxWns&#038;feature=related">online</a> is just basic counting of high-curvature points plus color-based or feature-based segmentation and tracking&#8230; My model actually tries to fit an articulated, precise model of a hand to the image.</p>
<h1>How did you get such nice blobs?!</h1>
<p>You ask. They are beautiful, aren&#8217;t they&#8230; nice and clean, easy for tracking and model fitting. It&#8217;s no magic though&#8230;<br />
Well, I took part in a <a href="http://depthjs.media.mit.edu/">project at the Media Lab called DepthJS</a>, which uses the MS Kinect to control web pages. I wrote the computer-vision part, so all the <a href="https://github.com/doug/depthjs">code is there</a> for you to grab; I just plugged it into this little project. It builds on <a href="http://openkinect.org/wiki/C%2B%2BOpenCvExample">this very simple example of using OpenCV 2.x and libfreenect</a>.</p>
<p>Wow, this was a longie&#8230; I hope you learned something and got inspired. I got to do a second overview of the project, and I&#8217;m inspired. Inspiration all around!</p>
<p>Code is obviously yours for the taking:<br />
<a href="https://github.com/royshil/OpenHPE">https://github.com/royshil/OpenHPE</a></p>
<p>Please contribute your own views, thoughts, code, rants in the comments and github page.</p>
<p>Enjoy<br />
Roy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2010/12/28/hand-gesture-recognition-via-model-fitting-in-energy-minimization-wopencv/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Image Recoloring using Gaussian Mixture Model and Expectation Maximization [OpenCV, w/Code]</title>
		<link>http://www.morethantechnical.com/2010/06/24/image-recoloring-using-gaussian-mixture-model-and-expectation-maximization-opencv-wcode/</link>
		<comments>http://www.morethantechnical.com/2010/06/24/image-recoloring-using-gaussian-mixture-model-and-expectation-maximization-opencv-wcode/#comments</comments>
		<pubDate>Thu, 24 Jun 2010 15:34:59 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[gui]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[Recommended]]></category>
		<category><![CDATA[school]]></category>
		<category><![CDATA[vision]]></category>
		<category><![CDATA[Website]]></category>
		<category><![CDATA[expectation maximization]]></category>
		<category><![CDATA[gaussian]]></category>
		<category><![CDATA[mixture model]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=673</guid>
		<description><![CDATA[Hi, I&#8217;ll present a quick and simple implementation of image recoloring, in fact more like color transfer between images, using OpenCV in C++ environment. The basis of the algorithm is learning the source color distribution with a GMM using EM, and then applying changes to the target color distribution. It&#8217;s fairly easy to implement with [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.morethantechnical.com/wp-content/uploads/2010/06/eggplant_orange.png" rel="lightbox[673]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/06/eggplant_orange.png" alt="" title="eggplant_orange" width="654" height="187" class="alignleft size-full wp-image-684" /></a>Hi,<br />
I&#8217;ll present a quick and simple implementation of image recoloring, more precisely color transfer between images, using OpenCV in a C++ environment. The basis of the algorithm is learning the source color distribution with a GMM using EM, and then applying changes to the target color distribution. It&#8217;s fairly easy to implement with OpenCV, as all the &#8220;tools&#8221; are built in.</p>
<p>I was inspired by <a href="http://www.cs.tau.ac.il/~liors/research/papers/image_appearance_exploration.pdf">Lior Shapira&#8217;s work</a> on image appearance manipulation, presented at Eurographics 09, and by a work on recoloring for the colorblind by <a href="http://www.sciweavers.org/files/docs/2358/icassp_cvd_poster_pdf_4a383d1fb0.pdf">Huang et al.</a> presented at ICASSP 09. Both works deal with color manipulation using Gaussian Mixture Models.</p>
<p>Let&#8217;s see how it&#8217;s done!<br />
<span id="more-673"></span></p>
<h2>A little theory</h2>
<p>I won&#8217;t bore you with the math, but just to get the hang of the idea: what we would like to do is learn how the colors in the source and target images are distributed. Naturally we can assume, like many other things in nature, that the colors in a picture have a <a href="http://en.wikipedia.org/wiki/Normal_distribution">normal (Gaussian) distribution</a>, but we can go further by saying the distribution might be described by <strong>a few Gaussians</strong>. This is called a <a href="http://en.wikipedia.org/wiki/Mixture_distribution">mixture distribution</a>, and it&#8217;s a very handy statistical tool. Mixtures can be estimated using a powerful and ubiquitous tool called <a href="http://en.wikipedia.org/wiki/Expectation_maximization">Expectation Maximization</a>, which I have previously covered. EM essentially tries to recover the mean (mu), covariance (Sigma) and weight of each Gaussian in the mixture, by iteratively checking the current hypothesis against the data until it converges to a (local) maximum of the likelihood.</p>
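<p>For intuition, here is EM for a 1-D mixture in plain Python (CvEM does the same job on 3-D RGB samples, with full covariance matrices). It is a textbook sketch, not OpenCV&#8217;s implementation: the E-step computes how responsible each Gaussian is for each sample, and the M-step re-estimates weights, means and variances from those responsibilities. Initializing the means at data quantiles is a choice made for this sketch.</p>

```python
import math


def em_gmm_1d(data, k=2, iterations=50):
    """Fit a k-component 1-D Gaussian mixture with EM; returns (weights, means, variances)."""
    n = len(data)
    ordered = sorted(data)
    # initialization (a choice for this sketch): means at evenly spaced quantiles
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    overall = sum(data) / n
    variances = [sum((x - overall) ** 2 for x in data) / n] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each Gaussian for each sample
        resp = []
        for x in data:
            p = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against underflow far from all means
            resp.append([pj / total for pj in p])
        # M-step: re-estimate each Gaussian from its weighted samples
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # keep variances from collapsing to zero
    return weights, means, variances
```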
<h2>Learning the color model</h2>
<p>For the learning process we must set up the sample data. So we create a binary mask saying where in the image the model can find the colors to learn.<br />
Then we scan the mask, and for each foreground pixel we add its value (here RGB, but it can basically be anything) to the sample data. Finally we train the CvEM object, which holds the GMM parameters.</p>
<pre class="brush: plain; title: ; notranslate">
void TrainGMM(CvEM&amp; source_model, Mat&amp; source, Mat&amp; source_mask) {
		int src_samples_size = countNonZero(source_mask);
		Mat source_samples(src_samples_size,3,CV_32FC1);

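		Mat source_32f;
		source.convertTo(source_32f, CV_32F); //the loop reads Vec3f pixels, so convert to float (a plain assignment would only share the data)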

		int sample_count = 0;
		for(int y=0;y&lt;source.rows;y++) {
			Vec3f* row = source_32f.ptr&lt;Vec3f&gt;(y);  //pointer to pixel data in the row
			uchar* mask_row = source_mask.ptr&lt;uchar&gt;(y); //pointer to binary mask
			for(int x=0;x&lt;source.cols;x++) {
				if(mask_row[x] &gt; 0) {
					source_samples.at&lt;Vec3f&gt;(sample_count++,0) = row[x];
				}
			}
		}

		source_model.clear();
		CvEMParams ps(3/* = number of gaussians*/);
		source_model.train(source_samples,Mat(),ps,NULL);
}
</pre>
<p>What we have are three 3-dimensional (R,G,B) Gaussians, describing the colors in the selected area.</p>
<h2>Matching Gaussians</h2>
<p>Now we have a pair of GMMs &#8211; one for the target and one for the source. The idea is to take a pixel in the target, see how the 3 target Gaussians describe it, and shift it to use the 3 source Gaussians. Hopefully this will change its color from the target&#8217;s to the source&#8217;s. But we need to know which Gaussian in the target corresponds to which Gaussian in the source. I made a quick selection algorithm that greedily matches each source Gaussian to the closest target Gaussian that is still free. Since the greedy result depends on the order of selection, I try 10 random orders and keep the best one, to get closer to the optimal matching.</p>
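<p>Stripped of the OpenCV types, the randomized greedy matching can be sketched over a precomputed cost matrix (precomputing it is also what the TODO in the code suggests). One difference to note: the code scores each trial by the Euclidean distance between the means, while this sketch scores it with the same cost used for the greedy choice. <code>greedy_match</code> and its parameters are hypothetical.</p>

```python
import random


def greedy_match(cost, trials=10, seed=0):
    """cost[s][t] = distance between source Gaussian s and target Gaussian t.
    Each trial visits the source Gaussians in a random order and greedily gives
    each one the closest still-free target; the cheapest trial wins.
    Returns match, where match[s] = t."""
    n = len(cost)
    rng = random.Random(seed)
    best, best_total = None, float("inf")
    for _ in range(trials):
        order = list(range(n))
        rng.shuffle(order)
        taken = [False] * n
        match = [None] * n
        for s in order:
            t = min((t for t in range(n) if not taken[t]), key=lambda t: cost[s][t])
            match[s] = t
            taken[t] = True
        total = sum(cost[s][match[s]] for s in range(n))
        if total < best_total:
            best, best_total = match, total
    return best
```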
<pre class="brush: plain; title: ; notranslate">
vector&lt;int&gt; Recoloring::MatchGaussians(CvEM&amp; source_model, CvEM&amp; target_model) {
		int num_g = source_model.get_nclusters();
		Mat sMu(source_model.get_means());
		Mat tMu(target_model.get_means());
		const CvMat** target_covs = target_model.get_covs();
		const CvMat** source_covs = source_model.get_covs();

		double best_dist = std::numeric_limits&lt;double&gt;::max();
		vector&lt;int&gt; best_res(num_g);
		vector&lt;int&gt; prmt(num_g); 

		for(int itr = 0; itr &lt; 10; itr++) {
			for(int i=0;i&lt;num_g;i++) prmt[i] = i;	//make a permutation
			randShuffle(Mat(prmt));

			//Greedy selection
			vector&lt;int&gt; res(num_g);
			vector&lt;bool&gt; taken(num_g);
			for(int sg = 0; sg &lt; num_g; sg++) {
				double min_dist = std::numeric_limits&lt;double&gt;::max();
				int minv = -1;
				for(int tg = 0; tg &lt; num_g; tg++) {
					if(taken[tg]) continue;

					//TODO: can save on re-calculation of pairs - calculate affinity matrix ahead
					//double d = norm(sMu(Range(prmt[sg],prmt[sg]+1),Range(0,3)),	tMu(Range(tg,tg+1),Range(0,3)));

					//symmetric kullback-leibler
					Mat diff = Mat(sMu(Range(prmt[sg],prmt[sg]+1),Range(0,3)) - tMu(Range(tg,tg+1),Range(0,3)));
					Mat d = diff * Mat(Mat(source_covs[prmt[sg]]).inv() + Mat(target_covs[tg]).inv()) * diff.t();
					Scalar tr = trace(Mat(
						Mat(Mat(source_covs[prmt[sg]])*Mat(target_covs[tg]).inv()) +
						Mat(Mat(target_covs[tg])*Mat(source_covs[prmt[sg]]).inv()) -
						Mat(Mat::eye(3,3,CV_64FC1)*2)
						));
					double kl_dist = ((double*)d.data)[0] + tr[0]; //twice the symmetric KL; the factor doesn't change the matching
					if(kl_dist&lt;min_dist) {
						min_dist = kl_dist;
						minv = tg;
					}
				}
				res[prmt[sg]] = minv;
				taken[minv] = true;
			}

			//total distance for the permutation
			double dist = 0;
			for(int i=0;i&lt;num_g;i++) {
				dist += norm(sMu(Range(prmt[i],prmt[i]+1),Range(0,3)),
							tMu(Range(res[prmt[i]],res[prmt[i]]+1),Range(0,3)));
			}
			if(dist &lt; best_dist) {
				best_dist = dist;
				best_res = res;
			}
		}

		return best_res;
	}
</pre>
<p>I used the symmetric Kullback-Leibler divergence as the distance between Gaussians, as suggested by Huang et al.</p>
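<p>For reference, the symmetric KL divergence between two Gaussians p and q in k dimensions has a closed form, since the log-determinant terms of the two directions cancel: 0.5 &#183; [tr(&#931;<sub>q</sub><sup>-1</sup>&#931;<sub>p</sub>) + tr(&#931;<sub>p</sub><sup>-1</sup>&#931;<sub>q</sub>) &#8722; 2k + d<sup>T</sup>(&#931;<sub>p</sub><sup>-1</sup> + &#931;<sub>q</sub><sup>-1</sup>)d], with d = &#956;<sub>p</sub> &#8722; &#956;<sub>q</sub>. The Python sketch below evaluates it for diagonal covariances only, a simplification that keeps it dependency-free; <code>symmetric_kl_diag</code> is a hypothetical name.</p>

```python
def symmetric_kl_diag(mu_p, var_p, mu_q, var_q):
    """KL(p||q) + KL(q||p) for Gaussians with diagonal covariances.

    With diagonal covariances, the traces and the quadratic form of the
    full-covariance formula split into independent per-dimension terms."""
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        diff = mp - mq
        total += vp / vq + vq / vp - 2.0 + diff * diff * (1.0 / vp + 1.0 / vq)
    return 0.5 * total
```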
<h2>Applying the color</h2>
<p>Now all that&#8217;s left is to use the method Shapira suggested in his work to transform a pixel&#8217;s color according to the Gaussians describing it.<br />
I&#8217;m only putting the essence of the code, the rest is in the file.</p>
<pre class="brush: plain; title: ; notranslate">
		Mat pr; Mat samp(1,3,CV_32FC1);
		for(int y=0;y&lt;target.rows;y++) {
			Vec3f* row = target_32f.ptr&lt;Vec3f&gt;(y);
			uchar* mask_row = target_mask.ptr&lt;uchar&gt;(y);
			for(int x=0;x&lt;target.cols;x++) {
				if(mask_row[x] &gt; 0) {
					//take the pixel data
					memcpy(samp.data,&amp;(row[x][0]),3*sizeof(float));

					//use the target GMM to get the probability of each Gaussian for this pixel (into pr)
					target_model.predict(samp,&amp;pr);

					Mat samp_64f; samp.convertTo(samp_64f,CV_64F);

					//move the pixel to the new Gaussians
					//from Shapira09: Xnew = Sum_i { pr(i) * ( Sigma_source_match(i) * (Sigma_target_i)^-1 * (x - mu_target_i) + mu_source_match(i) ) }
					Mat Xnew(1,3,CV_64FC1,Scalar(0));
					for(int i=0;i&lt;num_g;i++) {
						if(((float*)pr.data)[i] &lt;= 0) continue;

						//for each Gaussian, subtract this image's (target) mean and add the matched source mean,
						//then weight by the probability to get a weighted average.
						Xnew += Mat((
							//Mat(source_covs[match[i]]) *
							//Mat(target_covs[i]).inv() *
							Mat(samp_64f - tMu_64f(Range(i,i+1),Range(0,3))).t() +
							sMu_64f(Range(match[i],match[i]+1),Range(0,3)).t()
							) * (double)(((float*)pr.data)[i])).t();
					}

					//put the pixel back into place
					Mat _tmp; Xnew.convertTo(_tmp,CV_32F);
					memcpy(&amp;(row[x][0]),_tmp.data,sizeof(float)*3);
				}
			}
		}
</pre>
<p>You might notice that I skip multiplying by the covariance matrices, which Shapira&#8217;s formula includes. I found that leaving it out produces better results, but that is probably caused by a bug somewhere.</p>
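<p>Without OpenCV, the update the loop above performs (with the covariance factors commented out) boils down to a probability-weighted shift of each pixel: x_new = sum_i p_i * (x - mu_target_i + mu_source_match(i)). Here is a minimal sketch for one RGB pixel; the function name recolorPixel and the plain-array types are assumptions for illustration. When the probabilities sum to 1, this is simply x plus a weighted average of the (source - target) mean differences.</p>
<pre class="brush: plain; title: ; notranslate">
#include &lt;array&gt;
#include &lt;cstddef&gt;
#include &lt;vector&gt;

using Color = std::array&lt;double, 3&gt;;

// Probability-weighted mean shift of one pixel (the covariance-free variant).
// post[i]     : probability that the pixel belongs to target Gaussian i
// targetMu[i] : mean of target Gaussian i
// sourceMu[j] : mean of source Gaussian j
// match[i]    : index of the source Gaussian matched to target Gaussian i
Color recolorPixel(const Color&amp; x, const std::vector&lt;double&gt;&amp; post,
                   const std::vector&lt;Color&gt;&amp; targetMu,
                   const std::vector&lt;Color&gt;&amp; sourceMu,
                   const std::vector&lt;int&gt;&amp; match) {
    Color out{0.0, 0.0, 0.0};
    for (std::size_t i = 0; i &lt; post.size(); ++i) {
        if (post[i] &lt;= 0.0) continue;  // skip Gaussians with no support, as in the loop above
        const Color&amp; mt = targetMu[i];
        const Color&amp; ms = sourceMu[match[i]];
        for (int c = 0; c &lt; 3; ++c) out[c] += post[i] * (x[c] - mt[c] + ms[c]);
    }
    return out;
}
</pre>
<p>With a single Gaussian (probability 1), a pixel is just moved by the difference between the matched source mean and its own target mean.</p>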
<h2>Results</h2>
<p><a href="http://www.morethantechnical.com/wp-content/uploads/2010/06/recoloring_result1.png" rel="lightbox[673]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/06/recoloring_result1-300x83.png" alt="" title="recoloring_result1" width="300" height="83" class="aligncenter size-medium wp-image-675" /></a><br />
<a href="http://www.morethantechnical.com/wp-content/uploads/2010/06/recoloring_result.png" rel="lightbox[673]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/06/recoloring_result-300x79.png" alt="" title="recoloring_result" width="300" height="79" class="aligncenter size-medium wp-image-674" /></a></p>
<h2>Code and stuff</h2>
<p>Source code is available in the SVN repo:</p>
<pre class="brush: plain; title: ; notranslate">
svn checkout http://morethantechnical.googlecode.com/svn/trunk/GMM_Recoloring recoloring
</pre>
<p>Images from Flickr (Creative Commons):</p>
<ul>
<li>http://www.flickr.com/photos/wwworks/2956622857/sizes/s/</li>
<li>http://www.flickr.com/photos/violentz/3199292482/sizes/s/</li>
<li>http://www.flickr.com/photos/davidw/164670455/sizes/m/</li>
<li>http://www.flickr.com/photos/djania/252225693/sizes/m/</li>
</ul>
<p>Now go recolor the world!</p>
<p>Thanks for tuning in&#8230;<br />
Roy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2010/06/24/image-recoloring-using-gaussian-mixture-model-and-expectation-maximization-opencv-wcode/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Bust out your own graphcut based image segmentation with OpenCV [w/ code]</title>
		<link>http://www.morethantechnical.com/2010/05/05/bust-out-your-own-graphcut-based-image-segmentation-with-opencv-w-code/</link>
		<comments>http://www.morethantechnical.com/2010/05/05/bust-out-your-own-graphcut-based-image-segmentation-with-opencv-w-code/#comments</comments>
		<pubDate>Wed, 05 May 2010 11:27:29 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[Recommended]]></category>
		<category><![CDATA[vision]]></category>
		<category><![CDATA[Website]]></category>
		<category><![CDATA[gmm]]></category>
		<category><![CDATA[graphcut]]></category>
		<category><![CDATA[segmentation]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=634</guid>
		<description><![CDATA[This is a tutorial on using Graph-Cuts and Gaussian-Mixture-Models for image segmentation with OpenCV in a C++ environment. Been working on my master&#8217;s thesis for a while now, and the path of my work came across image segmentation. Naturally I became interested in Max-Flow Graph Cuts algorithms, being the &#8220;hottest fish in the fish-market&#8221; right now [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.morethantechnical.com/wp-content/uploads/2010/05/GMM-GC-segmentation.png" rel="lightbox[634]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/05/GMM-GC-segmentation-300x125.png" alt="" title="GMM-GC-segmentation" width="300" height="125" class="alignleft size-medium wp-image-659" /></a><em>This is a tutorial on using Graph-Cuts and Gaussian-Mixture-Models for image segmentation with OpenCV in a C++ environment.</em></p>
<p>Been working on my master&#8217;s thesis for a while now, and my work led me to image segmentation. Naturally I became interested in max-flow graph-cut algorithms, which are the &#8220;hottest fish in the fish-market&#8221; right now, if the fish market were the image segmentation scene.</p>
<p>So I went looking for a C++ implementation of graph cuts, only to find out that OpenCV already implemented it in v2.0 as part of its GrabCut implementation. But I wanted to explore a bit, so I found this <a href="http://www.csd.uwo.ca/~olga/code.html">implementation by Olga Veksler</a>, which is built upon Kolmogorov&#8217;s framework for max-flow algorithms. I was also inspired by <a href="http://www.wisdom.weizmann.ac.il/~bagon/matlab.html">Shai Bagon&#8217;s usage example</a> of this implementation for Matlab.</p>
<p>Let&#8217;s jump in&#8230;<br />
<span id="more-634"></span></p>
<h2>Bit of Theory</h2>
<p>Before we move on, let&#8217;s dig <strong>a little</strong> into the theory. We look at the picture as a set of nodes, where each pixel is a node that is connected to its neighbors by edges and has a label &#8211; this is called a <a href="http://en.wikipedia.org/wiki/Markov_random_field">Markov Random Field</a>. An MRF can be solved, i.e. a good label can be found for each node and thus a good labeling of the whole image, in a number of ways, one of which is graph cuts based on <a href="http://en.wikipedia.org/wiki/Max-flow_min-cut_theorem">maximal flow</a>. With two labels (and a smoothness term like the one used here) the minimum cut gives the global optimum; with more labels, move-making methods such as alpha-expansion give an approximate solution. After we label the graph, we expect to get a meaningful segmentation of the image. <a href="http://www.cs.cornell.edu/~rdz/Papers/SZSVKATR-ECCV06.pdf">This paper</a>, by some of the big names in the field (Veksler, Kolmogorov, Agarwala), explains it pretty thoroughly. There are a number of well-known segmentation methods that use graph cuts, such as Lazy Snapping [04], GrabCut [04] and more.</p>
<p>The math in the articles is, as usual, pretty horrific. I like to keep things simple, so I&#8217;ll try to explain GC segmentation in a simple way. We all remember min-cut/max-flow algorithms from 2nd-year CS, right? Well, segmentation using GC is not very different. The magic happens when we weight the nodes and edges in a meaningful way, which creates meaningful cuts. The weights are usually split into two terms: a data term (or cost) and a smoothness term. In simple words, the data term says &#8220;How happy is this pixel with that label&#8221;, and the smoothness term says &#8220;How easily can a label expand from this pixel to that neighbor&#8221;. So when you think about it, the easiest thing would be to use the likelihood of a pixel belonging to some label as the data term, and for the smoothness term &#8211; just use the edges in the picture!</p>
<p>So anyway, back to the code: the only thing left is to create a graph, set the weights, and max the flow. Here&#8217;s a bit of code:</p>
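<p>To make the two terms concrete, here is a minimal sketch of the energy a graph-cut optimizer tries to minimize on a 4-connected pixel grid: E(L) = sum_p D_p(L_p) + sum_(p,q) w_pq * V(L_p, L_q). It assumes row-major pixel indexing (p = y * width + x), a Potts label cost V (0 for equal labels, a constant otherwise, like the Sc matrix used further down), and per-edge weights wRight and wDown standing in for the edge-based smoothness term. The struct and function names are hypothetical and are not the gco library&#8217;s API.</p>
<pre class="brush: plain; title: ; notranslate">
#include &lt;cstddef&gt;
#include &lt;vector&gt;

// Hypothetical container for a grid MRF with integer costs.
struct GridMRF {
    int width = 0, height = 0, numLabels = 0;
    std::vector&lt;int&gt; dataCost;  // dataCost[p * numLabels + l]: cost of label l at pixel p
    std::vector&lt;int&gt; wRight;    // weight of edge (p, p + 1), used when x + 1 &lt; width
    std::vector&lt;int&gt; wDown;     // weight of edge (p, p + width), used when y + 1 &lt; height
    int pottsPenalty = 5;       // V(a, b) = pottsPenalty if a != b, else 0
};

// E(L) = sum_p D(p, L_p) + sum over neighboring pairs of w_pq * V(L_p, L_q)
long long energy(const GridMRF&amp; m, const std::vector&lt;int&gt;&amp; labels) {
    long long e = 0;
    for (int y = 0; y &lt; m.height; ++y) {
        for (int x = 0; x &lt; m.width; ++x) {
            const int p = y * m.width + x;
            e += m.dataCost[static_cast&lt;std::size_t&gt;(p) * m.numLabels + labels[p]];
            if (x + 1 &lt; m.width &amp;&amp; labels[p] != labels[p + 1])
                e += static_cast&lt;long long&gt;(m.wRight[p]) * m.pottsPenalty;
            if (y + 1 &lt; m.height &amp;&amp; labels[p] != labels[p + m.width])
                e += static_cast&lt;long long&gt;(m.wDown[p]) * m.pottsPenalty;
        }
    }
    return e;
}
</pre>
<p>For example, on a 2x1 image with data costs {0, 10} and {10, 0} and a right-edge weight of 1, the labeling {0, 1} has energy 5 while {0, 0} has energy 10, so the optimizer would prefer to cut between the two pixels.</p>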
<pre class="brush: plain; title: ; notranslate">
Mat im_small = imread(&quot;a_pic.jpg&quot;);
int num_lables = 3;

// create a grid type graph
GCoptimizationGridGraph gc(im_small.cols,im_small.rows,num_lables);
</pre>
<p>This piece of code creates a 4-connected grid graph where every pixel is a node, and each pixel can take one of 3 labels (3 parts of the image to segment).</p>
<h2>GMM to the rescue!</h2>
<p>Now for the weighting. One very &#8220;standard&#8221; way to give a data term to the pixels is by using <a href="http://en.wikipedia.org/wiki/Mixture_model">Gaussian-Mixture-Models</a> (GMM): a method that fits a few Gaussian distributions to an unknown probability distribution to estimate what it looks like. In the spirit of keeping things simple, I won&#8217;t go into details. I&#8217;ll just say that it&#8217;s a tool to get the probability of a pixel belonging to a cluster of other pixels, and it has a built-in implementation in OpenCV, which is reason enough for me to use it. In OpenCV, GMMs are named EM, which is kind of a misnomer, since EM (<a href="http://en.wikipedia.org/wiki/Mixture_model#Expectation_maximization_.28EM.29">Expectation-Maximization</a>) is the standard method for fitting a GMM and not a GMM itself.</p>
<p>Using EM in OpenCV is really very easy:</p>
<pre class="brush: plain; title: ; notranslate">
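CvEM model;
CvEMParams ps(3); //3 Gaussians (mixture components), default parameters otherwise

Mat lables;
Mat samples;
//one row per pixel, 3 float columns (B,G,R) scaled to [0,1]
im.reshape(1,im.rows*im.cols).convertTo(samples,CV_32FC1,1.0/255.0);

model.train(samples,Mat(),ps,&amp;lables);

Mat probs = model.get_probs(); //per-pixel probability of each Gaussian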
</pre>
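<p>The probabilities from get_probs() should be, for every sample, the posterior probability of each mixture component (the &#8220;responsibilities&#8221; EM computes). To show what that number is, here is a minimal sketch for a 1-D mixture, using one channel only to keep it short (OpenCV&#8217;s EM works on the full 3-channel pixel; the struct and function names are hypothetical): r_k(x) = w_k N(x; mu_k, var_k) / sum_j w_j N(x; mu_j, var_j).</p>
<pre class="brush: plain; title: ; notranslate">
#include &lt;cmath&gt;
#include &lt;cstddef&gt;
#include &lt;vector&gt;

// One component of a 1-D Gaussian mixture.
struct Gaussian1D {
    double weight;  // mixing coefficient, weights sum to 1
    double mean;
    double var;     // variance, must be &gt; 0
};

// Density of a single 1-D normal distribution at x.
double normalPdf(double x, double mean, double var) {
    const double kPi = 3.14159265358979323846;
    const double d = x - mean;
    return std::exp(-0.5 * d * d / var) / std::sqrt(2.0 * kPi * var);
}

// Posterior probability (responsibility) of each component given the sample x.
std::vector&lt;double&gt; responsibilities(const std::vector&lt;Gaussian1D&gt;&amp; gmm, double x) {
    std::vector&lt;double&gt; r(gmm.size());
    double total = 0.0;
    for (std::size_t k = 0; k &lt; gmm.size(); ++k) {
        r[k] = gmm[k].weight * normalPdf(x, gmm[k].mean, gmm[k].var);
        total += r[k];
    }
    // If every density underflows to 0, fall back to a uniform posterior.
    for (double&amp; v : r) v = (total &gt; 0.0) ? v / total : 1.0 / r.size();
    return r;
}
</pre>
<p>A sample exactly between two identical components gets 0.5 for each; a sample sitting on one component&#8217;s mean gets the larger share for that component.</p>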
<p>Here&#8217;s how it looks when training on the whole image (you can see the original image, the labeling, and the minus log probability):</p>
<div id="attachment_639" class="wp-caption alignnone" style="width: 619px"><a href="http://www.morethantechnical.com/wp-content/uploads/2010/05/em-clustering.png" rel="lightbox[634]"><img class="size-full wp-image-639" title="em-clustering" src="http://www.morethantechnical.com/wp-content/uploads/2010/05/em-clustering.png" alt="" width="609" /></a><p class="wp-caption-text">Clustering using EM in OpenCV</p></div>
<p>But this is not exactly what we want&#8230; Since we are dealing with segmentation here, we would like to segment a certain area. The purpose of the GMM is to learn what that area looks like from a small set of samples, and then predict the label for all the pixels in the image.</p>
<p>In GrabCut, every pixel is put into one of four logical &#8220;clusters&#8221;: Positively Background, Probably Background, Probably Foreground and Positively Foreground, and GMMs are learned for the background and the foreground pixels. This is a good idea and I will follow it, but again, I&#8217;m aiming not for object extraction but for k-way segmentation. In other words, I&#8217;m looking for a way to divide the image into a few areas that are each internally similar and also dissimilar to the other areas.</p>
<p>To do that I create a K-Gaussian GMM for each of the N areas (see the code, it&#8217;s a long one). I tried 2 versions: one creates a 1-Gaussian GMM for each channel (RGB, etc.) of each area (it&#8217;s called doEM1D()), and the other creates a K-Gaussian GMM over all channels for each area. The results varied:<br />
<div id="attachment_646" class="wp-caption alignnone" style="width: 612px"><a href="http://www.morethantechnical.com/wp-content/uploads/2010/05/1class-gmm.png" rel="lightbox[634]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/05/1class-gmm.png" alt="" title="Three 1-ch-1-gs GMMs" width="602" class="size-full wp-image-646" /></a><p class="wp-caption-text">Three 1 channel 1 Gaussian GMMs</p></div><br />
<div id="attachment_647" class="wp-caption alignnone" style="width: 610px"><a href="http://www.morethantechnical.com/wp-content/uploads/2010/05/3class-gmm.jpg" rel="lightbox[634]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/05/3class-gmm.jpg" alt="" title="Three 3-ch-3-gs GMMs" width="600" class="size-full wp-image-647" /></a><p class="wp-caption-text">Three 3-channels 3-Gaussians per channel GMMs</p></div></p>
<p>This provides the data term for our segmentation: each pixel can now say how comfortable it is with the label it got (we simply use the probability from the GMM, turned into a cost by taking its negative log).</p>
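<p>The simpler variant (doEM1D() in the code) models each area with a single Gaussian per channel. Here is a minimal sketch of that idea, not the repository&#8217;s code: it fits a mean and variance per channel from the pixels marked for an area, scores a pixel by the sum of the per-channel negative log-likelihoods (that is, it treats the three channels as independent), and turns the score into a non-negative integer data cost. The names, the variance floor and the cap on the cost are assumptions.</p>
<pre class="brush: plain; title: ; notranslate">
#include &lt;algorithm&gt;
#include &lt;array&gt;
#include &lt;cmath&gt;
#include &lt;vector&gt;

using Pixel = std::array&lt;double, 3&gt;;

// One Gaussian per channel for a single marked area.
struct ChannelModel {
    Pixel mean{};
    Pixel var{};
};

// Fit mean and variance of each channel from the marked samples (assumed non-empty).
ChannelModel fitArea(const std::vector&lt;Pixel&gt;&amp; samples) {
    ChannelModel m;
    for (const Pixel&amp; s : samples)
        for (int c = 0; c &lt; 3; ++c) m.mean[c] += s[c] / samples.size();
    for (const Pixel&amp; s : samples)
        for (int c = 0; c &lt; 3; ++c) {
            const double d = s[c] - m.mean[c];
            m.var[c] += d * d / samples.size();
        }
    for (int c = 0; c &lt; 3; ++c) m.var[c] += 1e-6;  // floor to avoid zero variance
    return m;
}

// -log p(pixel | area), treating the three channels as independent.
double negLogLikelihood(const ChannelModel&amp; m, const Pixel&amp; px) {
    const double kPi = 3.14159265358979323846;
    double nll = 0.0;
    for (int c = 0; c &lt; 3; ++c) {
        const double d = px[c] - m.mean[c];
        nll += 0.5 * d * d / m.var[c] + 0.5 * std::log(2.0 * kPi * m.var[c]);
    }
    return nll;
}

// Integer data cost for the graph cut: clamped to [0, 1e6], then rounded down.
int dataCost(const ChannelModel&amp; m, const Pixel&amp; px) {
    const double clamped = std::min(std::max(negLogLikelihood(m, px), 0.0), 1e6);
    return static_cast&lt;int&gt;(std::floor(clamped));
}
</pre>
<p>Each pixel then gets one such cost per area, and the graph cut prefers the area with the lowest cost unless the smoothness term says otherwise.</p>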
<h2>Play it smooth</h2>
<p>Right, moving on to the smoothness term. I mentioned before that it would be easiest to just use the edges in the image. I use the Sobel filter, which gives nice strong edges. Again we treat each pixel&#8217;s value as the likelihood of having an edge there, so we use -log to turn it into big integer costs where that likelihood drops.</p>
<p><a href="http://www.morethantechnical.com/wp-content/uploads/2010/05/kid-edges.png" rel="lightbox[634]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/05/kid-edges.png" alt="" title="Edges in image" width="533" height="204" class="alignnone size-full wp-image-650" /></a></p>
<pre class="brush: plain; title: ; notranslate">
Mat _tmp, _tmp1, dx, dy;
Mat res = Mat::zeros(gray32f.size(),CV_32FC1); //edge strength, 0 where there is no strong edge
GaussianBlur(gray32f,gray32f,Size(11,11),0.75);

Sobel(gray32f,_tmp,-1,1,0,3);	//sobel for dx
_tmp1 = abs(_tmp);  //use abs value to get also the opposite direction edges
_tmp1.copyTo(res,(_tmp1 &gt; 0.2));  //threshold the small edges...

cv::log(res,_tmp);  //cv::log maps 0 to a large negative number, so weak edges get a high cost
_tmp = -_tmp * 0.17;
_tmp.convertTo(dx,CV_32SC1);  //costs for horizontal neighbor pairs

Sobel(gray32f,_tmp,-1,0,1,3);	//sobel for dy
_tmp1 = abs(_tmp);
res.setTo(Scalar(0));
_tmp1.copyTo(res,(_tmp1 &gt; 0.2));

cv::log(res,_tmp);
_tmp = -_tmp * 0.17;
_tmp.convertTo(dy,CV_32SC1);  //costs for vertical neighbor pairs
</pre>
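<p>Stripped of OpenCV, the idea of the snippet above is: take a derivative filter, use the gradient magnitude as the edge &#8220;probability&#8221;, and turn it into a cost with -log, so that cutting across a strong edge is cheap and cutting through a flat region is expensive. A minimal sketch of the horizontal case follows; it skips the Gaussian blur, and instead of thresholding weak edges to zero it clamps the magnitude into [eps, 1] so the log stays finite. The struct, the function names and the values of scale and eps are assumptions.</p>
<pre class="brush: plain; title: ; notranslate">
#include &lt;algorithm&gt;
#include &lt;cmath&gt;
#include &lt;vector&gt;

// Grayscale image stored row-major, values roughly in [0, 1].
struct GrayImage {
    int width, height;
    std::vector&lt;float&gt; px;
    float at(int x, int y) const {  // clamp-to-edge border handling
        x = std::min(std::max(x, 0), width - 1);
        y = std::min(std::max(y, 0), height - 1);
        return px[y * width + x];
    }
};

// Horizontal 3x3 Sobel derivative (responds to vertical edges).
float sobelX(const GrayImage&amp; im, int x, int y) {
    return (im.at(x + 1, y - 1) + 2.0f * im.at(x + 1, y) + im.at(x + 1, y + 1)) -
           (im.at(x - 1, y - 1) + 2.0f * im.at(x - 1, y) + im.at(x - 1, y + 1));
}

// Smoothness weight: cheap to cut across strong edges, expensive elsewhere.
int edgeCost(float gradient, float scale = 10.0f, float eps = 1e-3f) {
    const float g = std::min(std::max(std::fabs(gradient), eps), 1.0f);
    return static_cast&lt;int&gt;(std::lround(-scale * std::log(g)));
}
</pre>
<p>With the defaults, a flat region (zero gradient) costs 69 and a step edge from 0 to 1 costs 0.</p>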
<h2>Now put everything into a bowl and mix!</h2>
<p>And the last part of the process is to put the per-pixel label probabilities and the edges into the grid graph we created earlier:</p>
<pre class="brush: plain; title: ; notranslate">
GCoptimizationGridGraph gc(im.cols,im.rows,num_lables);

//Set the pixel-label probability
int N = im.cols*im.rows;
double log_1_3 = log(1.3);
for(int i=0;i&lt;N;i++) {
   double* ppt = probs.ptr&lt;double&gt;(i);
   for(int l=0;l&lt;num_lables;l++) {
      //data cost: -log base 1.3 of the probability, rounded down and clamped at 0
      int icost = MAX(0,(int)floor(-log(ppt[l])/log_1_3));
      gc.setDataCost(i,l,icost);
   }
}

//Set the smoothness cost: a Potts matrix (0 for equal labels, 5 otherwise),
//weighted per neighboring pair by the edge-based dx/dy costs
Mat Sc = 5 * (Mat::ones(num_lables,num_lables,CV_32SC1) - Mat::eye(num_lables,num_lables,CV_32SC1));
gc.setSmoothCostVH((int*)(Sc.data),(int*)dx.data,(int*)dy.data);

lables.create(N,1,CV_8UC1);

printf(&quot;\nBefore optimization energy is %d\n&quot;,gc.compute_energy());
gc.expansion(1);
printf(&quot;\nAfter optimization energy is %d\n&quot;,gc.compute_energy());

//Get the labeling back from the graph
for ( int  i = 0; i &lt; N; i++ )
   ((uchar*)(lables.data + lables.step * i))[0] = gc.whatLabel(i);
</pre>
<p>Easy. Now the labeling should give us a nice segmentation:<br />
<div id="attachment_651" class="wp-caption alignnone" style="width: 279px"><a href="http://www.morethantechnical.com/wp-content/uploads/2010/05/3class-graphcut.png" rel="lightbox[634]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/05/3class-graphcut.png" alt="" title="3class-graphcut" width="269" height="205" class="size-full wp-image-651" /></a><p class="wp-caption-text">3 x 3-ch-3-gs GMMs labeling</p></div><br />
<div id="attachment_652" class="wp-caption alignnone" style="width: 277px"><a href="http://www.morethantechnical.com/wp-content/uploads/2010/05/1class-graphcut.png" rel="lightbox[634]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/05/1class-graphcut.png" alt="" title="1class-graphcut" width="267" height="203" class="size-full wp-image-652" /></a><p class="wp-caption-text">3 x 1-ch-1-gs GMMs labeling</p></div></p>
<p>But there&#8217;s a lot of noise in the labeling&#8230; A good heuristic is to keep only the largest connected component of each label, and also to try to take the component that is closest to the original marking.<br />
<div id="attachment_654" class="wp-caption alignnone" style="width: 610px"><a href="http://www.morethantechnical.com/wp-content/uploads/2010/05/3-way-label.png" rel="lightbox[634]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/05/3-way-label.png" alt="" title="Lables extraction " width="600" class="size-full wp-image-654" /></a><p class="wp-caption-text">Label extraction without keeping only the largest component</p></div><br />
<div id="attachment_655" class="wp-caption alignnone" style="width: 610px"><a href="http://www.morethantechnical.com/wp-content/uploads/2010/05/3-way-label-final.png" rel="lightbox[634]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/05/3-way-label-final.png" alt="" title="Labels extraction" width="600" class="size-full wp-image-655" /></a><p class="wp-caption-text">Label extraction, keeping only the largest component</p></div></p>
<pre class="brush: plain; title: ; notranslate">
vector&lt;vector&lt;Point&gt; &gt; contours;
for(int itr=0;itr&lt;2;itr++) {
	Mat mask = (_tmpLabels == itr); //get a mask of this label

	contours.clear();
	//find the outer contours in that mask
	//(findContours modifies its input, so work on a copy and keep the mask intact for setTo below)
	Mat mask_copy = mask.clone();
	cv::findContours(mask_copy,contours,CV_RETR_EXTERNAL,CV_CHAIN_APPROX_NONE);

	//compute areas
	vector&lt;double&gt; areas(contours.size());
	for(unsigned int ai=0;ai&lt;contours.size();ai++) {
		Mat _pts(contours[ai]);
		Scalar mp = mean(_pts); //contour centroid, unused for now (see the bias note below)

		areas[ai] = contourArea(Mat(contours[ai]))/* add some bias here to get components that are closer to initial marking*/;
	}

	//find largest connected component
	double max; Point maxLoc;
	minMaxLoc(Mat(areas),0,&amp;max,0,&amp;maxLoc);

	//draw back on mask
	_tmpLabels.setTo(Scalar(3),mask);	//all unassigned pixels will have value of 3, later we'll turn them to &quot;background&quot; pixels

	mask.setTo(Scalar(0)); //clear...
	drawContours(mask,contours,maxLoc.y,Scalar(255),CV_FILLED);

	//now that the mask has only the wanted component...
	_tmpLabels.setTo(Scalar(itr),mask);
}
</pre>
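<p>If you would rather not go through contours, the same &#8220;keep only the largest component&#8221; heuristic can be done with a breadth-first flood fill over the label map. A minimal sketch, assuming a row-major label array and 4-connectivity; the function name is hypothetical and the unassigned parameter plays the role of the value 3 used above:</p>
<pre class="brush: plain; title: ; notranslate">
#include &lt;cstddef&gt;
#include &lt;queue&gt;
#include &lt;vector&gt;

// Keeps only the largest 4-connected component of the given label in a
// row-major label map; all other pixels carrying that label become 'unassigned'.
void keepLargestComponent(std::vector&lt;int&gt;&amp; labels, int width, int height,
                          int label, int unassigned) {
    std::vector&lt;int&gt; comp(labels.size(), -1);  // component id per pixel, -1 = not visited
    std::vector&lt;std::size_t&gt; sizes;            // pixel count per component id
    for (std::size_t start = 0; start &lt; labels.size(); ++start) {
        if (labels[start] != label || comp[start] != -1) continue;
        const int id = static_cast&lt;int&gt;(sizes.size());
        std::size_t count = 0;
        std::queue&lt;std::size_t&gt; q;
        q.push(start);
        comp[start] = id;
        while (!q.empty()) {  // breadth-first flood fill
            const std::size_t p = q.front();
            q.pop();
            ++count;
            const int x = static_cast&lt;int&gt;(p % width), y = static_cast&lt;int&gt;(p / width);
            const int nx[4] = {x - 1, x + 1, x, x};
            const int ny[4] = {y, y, y - 1, y + 1};
            for (int k = 0; k &lt; 4; ++k) {
                if (nx[k] &lt; 0 || ny[k] &lt; 0 || nx[k] &gt;= width || ny[k] &gt;= height) continue;
                const std::size_t n = static_cast&lt;std::size_t&gt;(ny[k]) * width + nx[k];
                if (labels[n] == label &amp;&amp; comp[n] == -1) {
                    comp[n] = id;
                    q.push(n);
                }
            }
        }
        sizes.push_back(count);
    }
    if (sizes.empty()) return;  // label not present at all
    std::size_t best = 0;
    for (std::size_t i = 1; i &lt; sizes.size(); ++i)
        if (sizes[i] &gt; sizes[best]) best = i;
    for (std::size_t p = 0; p &lt; labels.size(); ++p)
        if (labels[p] == label &amp;&amp; comp[p] != static_cast&lt;int&gt;(best)) labels[p] = unassigned;
}
</pre>
<p>One difference from drawing the filled outer contour: pixels of other labels that sit inside the kept component are left untouched instead of being painted over.</p>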
<h2>Code and salutations</h2>
<p>As usual the code is available from the blog&#8217;s SVN:</p>
<pre class="brush: plain; title: ; notranslate">
svn checkout http://morethantechnical.googlecode.com/svn/trunk/GMMGraphCutSegmentation GMMGraphCutSegmentation
</pre>
<p>Hey! We&#8217;re pretty much done! Glad you (and I) made it to the end; it wasn&#8217;t easy after all&#8230; I hope you learned something about GMMs and Graph-Cuts, in OpenCV and in general. </p>
<p>BTW: the pictures are from Flickr, under a Creative Commons license.<br />
<a href="http://farm1.static.flickr.com/33/40406598_fd4e74d51c_d.jpg" rel="lightbox[634]">http://farm1.static.flickr.com/33/40406598_fd4e74d51c_d.jpg</a><br />
<a href="http://www.flickr.com/photos/willemvelthoven/56589010/sizes/m/in/pool-99557785@N00/">http://www.flickr.com/photos/willemvelthoven/56589010/sizes/m/in/pool-99557785@N00/</a></p>
<p>See ya!<br />
Roy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2010/05/05/bust-out-your-own-graphcut-based-image-segmentation-with-opencv-w-code/feed/</wfw:commentRss>
		<slash:comments>34</slash:comments>
		</item>
		<item>
		<title>Quick and Easy Head Pose Estimation with OpenCV [w/ code]</title>
		<link>http://www.morethantechnical.com/2010/03/19/quick-and-easy-head-pose-estimation-with-opencv-w-code/</link>
		<comments>http://www.morethantechnical.com/2010/03/19/quick-and-easy-head-pose-estimation-with-opencv-w-code/#comments</comments>
		<pubDate>Fri, 19 Mar 2010 16:38:49 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[3d]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[opengl]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[Recommended]]></category>
		<category><![CDATA[vision]]></category>
		<category><![CDATA[Website]]></category>
		<category><![CDATA[computer vision]]></category>
		<category><![CDATA[head pose]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=623</guid>
		<description><![CDATA[Hi Just wanted to share a small thing I did with OpenCV &#8211; Head Pose Estimation (sometimes known as Gaze Direction Estimation). Many people try to achieve this and there are a ton of papers covering it, including a recent overview of almost all known methods. I implemented a very quick &#38; dirty solution based [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.morethantechnical.com/wp-content/uploads/2010/03/j5.png" rel="lightbox[623]"><img class="alignleft size-full wp-image-624" style="border: 1px solid black; margin-right: 5px;" title="j5" src="http://www.morethantechnical.com/wp-content/uploads/2010/03/j5.png" alt="" width="350" height="175" /></a>Hi</p>
<p>Just wanted to share a small thing I did with OpenCV &#8211; Head Pose Estimation (sometimes known as Gaze Direction Estimation). Many people try to achieve this and there are a ton of papers covering it, including a recent <a href="http://people.ict.usc.edu/~gratch/CSCI534/Head%20Pose%20estimation.pdf" target="_blank">overview of almost all known methods</a>.</p>
<p>I implemented a very quick &amp; dirty solution based on OpenCV&#8217;s built-in methods that produced surprisingly good results (I expected it to fail), so I decided to share. It is based on 3D-2D point correspondences: a few points on a 3D head model are matched to the same points marked in the image, and the head pose that best fits the model points to the image points is recovered. OpenCV provides a magical method &#8211; solvePnP &#8211; that does this, given some camera calibration parameters, which I completely disregarded.</p>
<p>Here&#8217;s how it&#8217;s done</p>
<p><span id="more-623"></span></p>
<h2>Intro</h2>
<p>I wanted to use solvePnP, since I saw how easy it was to use when I was implementing PTAM. It recovers the 3D location and orientation of an object, given 3D-2D feature correspondences and an initial guess. In fact the initial guess is not required, but the results without it are dreadful.</p>
<p>So I needed to get some 3D points on a human head. I downloaded a free model of a human head from the net, and used <a href="http://meshlab.sourceforge.net/" target="_blank">MeshLab</a> to mark some points on the model:</p>
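<p>What solvePnP fits is the reprojection error: each 3D model point X is moved into camera coordinates with the rotation R and translation t, projected through the camera matrix K, and compared with the 2D point marked in the image. A minimal, OpenCV-free sketch of that measure, ignoring lens distortion (the type and function names are assumptions):</p>
<pre class="brush: plain; title: ; notranslate">
#include &lt;array&gt;
#include &lt;cmath&gt;
#include &lt;cstddef&gt;
#include &lt;vector&gt;

using Vec3d = std::array&lt;double, 3&gt;;
using Mat3d = std::array&lt;std::array&lt;double, 3&gt;, 3&gt;;
struct Point2 { double x, y; };

// Pinhole projection of model point X with pose (R, t) and camera matrix K.
Point2 project(const Mat3d&amp; K, const Mat3d&amp; R, const Vec3d&amp; t, const Vec3d&amp; X) {
    Vec3d c{};  // point in camera coordinates: c = R * X + t
    for (int i = 0; i &lt; 3; ++i)
        c[i] = R[i][0] * X[0] + R[i][1] * X[1] + R[i][2] * X[2] + t[i];
    const double u = c[0] / c[2], v = c[1] / c[2];  // perspective division
    return {K[0][0] * u + K[0][1] * v + K[0][2], K[1][1] * v + K[1][2]};
}

// Root-mean-square distance between projected model points and marked image
// points (the two vectors are assumed non-empty and in the same order).
double rmsReprojectionError(const Mat3d&amp; K, const Mat3d&amp; R, const Vec3d&amp; t,
                            const std::vector&lt;Vec3d&gt;&amp; model,
                            const std::vector&lt;Point2&gt;&amp; image) {
    double sum = 0.0;
    for (std::size_t i = 0; i &lt; model.size(); ++i) {
        const Point2 p = project(K, R, t, model[i]);
        const double dx = p.x - image[i].x, dy = p.y - image[i].y;
        sum += dx * dx + dy * dy;
    }
    return std::sqrt(sum / model.size());
}
</pre>
<p>For example, with the bogus camera matrix used in this post (focal length 20, center at 160,120), R = identity and t = (0, 0, 1), the model point (1, 0, 1) lands at pixel (170, 120). solvePnP searches for the R and t that make this error small over all marked points.</p>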
<ol>
<li>Left ear</li>
<li>Right ear</li>
<li>Left eye</li>
<li>Right eye</li>
<li>Nose tip</li>
<li>Left mouth corner</li>
<li>Right mouth corner</li>
</ol>
<p>Then I headed to the <a href="http://vis-www.cs.umass.edu/lfw/" target="_blank">LFW database</a> to get some pictures of celebrity heads. By mere accident I stumbled upon Angelina Jolie. The next step was to mark the selected feature points on Angelina&#8217;s pictures. Where the head hides an ear, I put a point at the estimated location of the ear.</p>
<h2>Time to Code</h2>
<p>First I initialize the 3D points vector, and a dummy camera matrix:</p>
<pre class="brush: plain; title: ; notranslate">
vector&lt;Point3f &gt; modelPoints;
modelPoints.push_back(Point3f(-36.9522f,39.3518f,47.1217f));    //l eye
modelPoints.push_back(Point3f(35.446f,38.4345f,47.6468f));      //r eye
modelPoints.push_back(Point3f(-0.0697709f,18.6015f,87.9695f));  //nose
modelPoints.push_back(Point3f(-27.6439f,-29.6388f,73.8551f));   //l mouth
modelPoints.push_back(Point3f(28.7793f,-29.2935f,72.7329f));    //r mouth
modelPoints.push_back(Point3f(-87.2155f,15.5829f,-45.1352f));   //l ear
modelPoints.push_back(Point3f(85.8383f,14.9023f,-46.3169f));    //r ear

op = Mat(modelPoints, true); //copy the points, so op does not depend on modelPoints staying alive
op = op / 35; //just a little normalization...
rvec = Mat(rv);
double _d[9] = {1, 0, 0,
                0,-1, 0,
                0, 0,-1}; //rotation: looking at -z axis
Rodrigues(Mat(3,3,CV_64FC1,_d),rvec); //initial guess for the rotation, as a rotation vector
tv[0]=0;tv[1]=0;tv[2]=1;
tvec = Mat(tv); //initial guess for the translation
double _cm[9] = { 20,  0, 160,
                   0, 20, 120,
                   0,  0,   1 };  //&quot;calibration matrix&quot;: principal point at the center of a 320x240 picture, focal length 20
camMatrix = Mat(3,3,CV_64FC1,_cm).clone(); //clone, since _cm is a local array
</pre>
<p>Even though the &#8220;calibration&#8221; parameters are totally bogus, they work pretty well.</p>
<p>Now, we&#8217;re all ready to start estimating some poses. So let&#8217;s use solvePnP:</p>
<pre class="brush: plain; title: ; notranslate">
vector&lt;Point2f &gt; imagePoints;

//read the 7 marked 2D points from file, in the same order as modelPoints
//(fopen_s/fscanf_s are MSVC-specific; use fopen/fscanf with other compilers)
FILE* f;
fopen_s(&amp;f,&quot;points.txt&quot;,&quot;r&quot;);
for(int i=0;i&lt;7;i++) {
     int x,y;
     fscanf_s(f,&quot;%d&quot;,&amp;x); fscanf_s(f,&quot;%d&quot;,&amp;y);
     imagePoints.push_back(Point2f((float)x,(float)y));
}
fclose(f);

//make a Mat of the vector&lt;&gt;
Mat ip(imagePoints);

//display points on image
Mat img = imread(&quot;image.png&quot;);
for(unsigned int i=0;i&lt;imagePoints.size();i++) circle(img,imagePoints[i],2,Scalar(255,0,255),CV_FILLED);

//&quot;distortion coefficients&quot;... hah!
double _dc[] = {0,0,0,0};

//here's where the magic happens
solvePnP(op,ip,camMatrix,Mat(1,4,CV_64FC1,_dc),rvec,tvec,true);

//decompose the response to something OpenGL would understand.
//translation vector is irrelevant, only rotation vector is important
Mat rotM(3,3,CV_64FC1,rot);
Rodrigues(rvec,rotM);
double* _r = rotM.ptr&lt;double&gt;();
printf(&quot;rotation mat: \n %.3f %.3f %.3f\n%.3f %.3f %.3f\n%.3f %.3f %.3f\n&quot;,
          _r[0],_r[1],_r[2],_r[3],_r[4],_r[5],_r[6],_r[7],_r[8]);
</pre>
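<p>Rodrigues() converts between a rotation vector (the rotation axis scaled by the angle, which is what solvePnP returns in rvec) and a 3x3 rotation matrix. The vector-to-matrix direction is R = I + sin(theta) [k]x + (1 - cos(theta)) [k]x^2, with theta = |r|, k = r / theta and [k]x the cross-product matrix of k. A minimal sketch of that direction only, without OpenCV (the function name is hypothetical):</p>
<pre class="brush: plain; title: ; notranslate">
#include &lt;array&gt;
#include &lt;cmath&gt;

using RotMat = std::array&lt;std::array&lt;double, 3&gt;, 3&gt;;

// Rotation vector (axis * angle, radians) to a row-major rotation matrix.
RotMat rodrigues(const std::array&lt;double, 3&gt;&amp; r) {
    const double theta = std::sqrt(r[0] * r[0] + r[1] * r[1] + r[2] * r[2]);
    RotMat R{{{1, 0, 0}, {0, 1, 0}, {0, 0, 1}}};
    if (theta &lt; 1e-12) return R;  // no rotation
    const double kx = r[0] / theta, ky = r[1] / theta, kz = r[2] / theta;
    // Cross-product (skew-symmetric) matrix of the unit axis k.
    const double K[3][3] = {{0, -kz, ky}, {kz, 0, -kx}, {-ky, kx, 0}};
    const double s = std::sin(theta), c1 = 1.0 - std::cos(theta);
    for (int i = 0; i &lt; 3; ++i)
        for (int j = 0; j &lt; 3; ++j) {
            double k2 = 0.0;  // (K * K)[i][j]
            for (int m = 0; m &lt; 3; ++m) k2 += K[i][m] * K[m][j];
            R[i][j] += s * K[i][j] + c1 * k2;
        }
    return R;
}
</pre>
<p>As a check, the rotation vector (pi, 0, 0) gives diag(1, -1, -1), the &#8220;looking at -z axis&#8221; matrix used for the initial guess above.</p>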
<p>Alright, all done on the vision side, so now I draw some 3D. As usual, I use a very simple GLUT program to display 3D in a hurry. Initialization is nothing special; the one thing worth pointing out is using glutSolidCylinder (a freeglut addition, not part of the original GLUT) and glutSolidTetrahedron to draw the axes:</p>
<pre class="brush: plain; title: ; notranslate">
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
gluLookAt(0,0,0,0,0,1,0,1,0); //cam looking at +z axis
glPushMatrix();
glTranslated(0,0,5); //go a bit back to where I want to draw the axes
glPushMatrix();

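//this is the rotation matrix I got from solvePnP, so I will rotate accordingly to align with the face
//(glMultMatrixd reads its 16 values column-major, so filling the array row by row applies the transpose of rot)
double _d[16] = { rot[0],rot[1],rot[2],0,
                  rot[3],rot[4],rot[5],0,
                  rot[6],rot[7],rot[8],0,
                  0,     0,     0,     1};
glMultMatrixd(_d);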
glRotated(180,1,0,0); //rotate around to face the camera

//----------- Draw Axes --------------
//Z = red
glPushMatrix();
glRotated(180,0,1,0);
glColor3d(1,0,0);
glutSolidCylinder(0.05,1,15,20);
glTranslated(0,0,1);
glScaled(.1,.1,.1);
glutSolidTetrahedron();
glPopMatrix();

//Y = green
glPushMatrix();
glRotated(-90,1,0,0);
glColor3d(0,1,0);
glutSolidCylinder(0.05,1,15,20);
glTranslated(0,0,1);
glScaled(.1,.1,.1);
glutSolidTetrahedron();
glPopMatrix();

//X = blue
glPushMatrix();
glRotated(-90,0,1,0);
glColor3d(0,0,1);
glutSolidCylinder(0.05,1,15,20);
glTranslated(0,0,1);
glScaled(.1,.1,.1);
glutSolidTetrahedron();
glPopMatrix();

glPopMatrix();
glPopMatrix();
//----------End axes --------------
</pre>
<p>That wasn&#8217;t too hard, huh? Awesome.</p>
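<p>One detail worth knowing when moving a pose from OpenCV to OpenGL: glMultMatrixd() reads its 16 values in column-major order, while OpenCV&#8217;s rotation matrix is row-major. If you want the full rigid transform [R | t] applied as-is, here is a minimal sketch of packing it column-major (the function name is hypothetical; any extra axis flips between the OpenCV and OpenGL camera conventions, like the 180-degree rotation above, still have to be applied on top):</p>
<pre class="brush: plain; title: ; notranslate">
#include &lt;array&gt;

// Packs a row-major 3x3 rotation R and translation t into the column-major
// 4x4 layout expected by glLoadMatrixd / glMultMatrixd.
std::array&lt;double, 16&gt; toGLMatrix(const double R[9], const double t[3]) {
    std::array&lt;double, 16&gt; m{};
    for (int row = 0; row &lt; 3; ++row) {
        for (int col = 0; col &lt; 3; ++col)
            m[col * 4 + row] = R[row * 3 + col];  // element (row, col)
        m[12 + row] = t[row];                     // translation is the 4th column
    }
    m[15] = 1.0;
    return m;
}
</pre>
<p>The result can be passed as toGLMatrix(rot, t).data() to glMultMatrixd.</p>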
<h2>So&#8230; Results</h2>
<p><object width="425" height="344"><param name="movie" value="http://www.youtube.com/v/ZDNH4BT5Do4&#038;hl=en&#038;fs=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/ZDNH4BT5Do4&#038;hl=en&#038;fs=1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="344"></embed></object></p>
<h2>Code</h2>
<p>You can grab the code from the SVN repo:</p>
<pre class="brush: plain; title: ; notranslate">
svn checkout http://morethantechnical.googlecode.com/svn/trunk/HeadPose
</pre>
<p>Enjoy!</p>
<p>Roy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2010/03/19/quick-and-easy-head-pose-estimation-with-opencv-w-code/feed/</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
		<item>
		<title>Implementing PTAM: stereo, tracking and pose estimation for AR with OpenCV [w/ code]</title>
		<link>http://www.morethantechnical.com/2010/03/06/implementing-ptam-stereo-tracking-and-pose-estimation-for-ar-with-opencv-w-code/</link>
		<comments>http://www.morethantechnical.com/2010/03/06/implementing-ptam-stereo-tracking-and-pose-estimation-for-ar-with-opencv-w-code/#comments</comments>
		<pubDate>Sat, 06 Mar 2010 16:53:11 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[opengl]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[Recommended]]></category>
		<category><![CDATA[school]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[vision]]></category>
		<category><![CDATA[Website]]></category>
		<category><![CDATA[3d]]></category>
		<category><![CDATA[augmented reality]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=606</guid>
		<description><![CDATA[Hi Been working hard at a project for school the past month, implementing one of the more interesting works I&#8217;ve seen in the AR arena: Parallel Tracking and Mapping (PTAM) [PDF]. This is a work by Georg Klein [homepage] and David Murray from Oxford University, presented at ISMAR 2007. When I first saw it on [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.morethantechnical.com/wp-content/uploads/2010/03/ptam.png" rel="lightbox[606]"><img class="alignleft size-full wp-image-617" title="ptam" src="http://www.morethantechnical.com/wp-content/uploads/2010/03/ptam.png" alt="" width="350" height="286" /></a>Hi</p>
<p>Been working hard at a project for school the past month, implementing one of the more interesting works I&#8217;ve seen in the AR arena: Parallel Tracking and Mapping (PTAM) [<a href="http://www.robots.ox.ac.uk/~gk/publications/KleinMurray2007ISMAR.pdf" target="_blank">PDF</a>]. This is a work by George Klein [<a href="http://www.robots.ox.ac.uk/~gk/" target="_blank">homepage</a>] and David Murray from Oxford university, presented in ISMAR 2007.</p>
<p>When I first saw it on YouTube [<a href="http://www.youtube.com/watch?v=pBI5HwitBX4" target="_blank">link</a>] I immediately saw the immense potential &#8211; mobile markerless augmented reality. I thought I should get to know this work more closely, so I chose to implement it as part of an advanced computer vision course given by Dr. Lior Wolf [<a href="http://www.cs.tau.ac.il/~wolf/" target="_blank">link</a>] at TAU.</p>
<p>The work is very extensive and is clearly the result of deep research in the field, so I set out to implement a few selected features: stereo initialization, tracking, and small-map maintenance. I chose not to implement relocalization and full map handling.</p>
<p>This post is a kind of tutorial for 3D reconstruction with OpenCV 2.0. I will show practical use of the functions in cvtriangulation.cpp, which are undocumented and in fact incomplete. I&#8217;ll also show how to easily combine OpenCV and OpenGL for 3D augmentations, something that is only briefly described in the docs or online.</p>
<p>Here are the steps I took and the things I learned while implementing the work.</p>
<p>Update: A nice patch by yazor fixes the video mismatching &#8211; thanks! Also, a nice application by Zentium called &#8220;iKat&#8221; is doing some kick-ass <a href="http://gizmodo.com/5489946/ikat-augmented-reality-app-works-without-real+world-prompt">mobile markerless augmented reality</a>.<br />
<span id="more-606"></span></p>
<h2>Preparations&#8230;</h2>
<p>Before going straight to coding, I had to prepare a few things.</p>
<ul>
<li>A working build of OpenCV &#8211; not trivial with the new version 2.0.</li>
<li>A calibrated camera.</li>
<li>Test data.</li>
</ul>
<p>Compiling OpenCV 2.0 proved to be a bit tricky. Even though the SourceForge project offers a binary release for Win32, I compiled the whole thing from source. It turned out the binary release doesn&#8217;t contain .lib files, and in any case has compatibility issues between MS VS 2005 and 2008, something to do with the embedded manifest [<a href="http://www.google.com/search?q=opencv+2.0+VS+2008+manifest+erro" target="_blank">google</a>]. I downloaded the freshest source from SVN and compiled it, but that didn&#8217;t solve the debug/release problem, so I was left using the release DLLs even in the debug environment.</p>
<p>Initially I thought I&#8217;d try an uncalibrated camera approach, but I soon abandoned it. I had to calibrate my cameras, which I did very easily using OpenCV&#8217;s &#8220;calibration.cpp&#8221; sample, which strangely is <strong>not built</strong> when building all examples &#8211; it has to be built manually. Everything went smoothly, and I soon had a calibration matrix (focal length, center of projection) and radial distortion coefficients.</p>
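<p>The camera matrix and distortion coefficients describe the pinhole model with lens distortion documented for OpenCV, and undistortPoints() in the stereo initialization code essentially inverts it. Below is a minimal, OpenCV-free sketch of the forward model in plain C++. It assumes the coefficient order k1, k2, p1, p2, k3 that OpenCV uses. The Intrinsics, Distortion and projectPoint names are hypothetical and not part of the original source.</p>

```cpp
#include <cassert>
#include <cmath>

// Hypothetical plain stand-ins for camera_matrix and distortion_coefficients.
struct Intrinsics { double fx, fy, cx, cy; };
struct Distortion { double k1, k2, p1, p2, k3; };  // OpenCV coefficient order
struct Pixel { double u, v; };

// Project a point given in camera coordinates (Z > 0) to pixel coordinates
// with the pinhole model plus radial (k1, k2, k3) and tangential (p1, p2)
// distortion, as described in the OpenCV calibration documentation.
Pixel projectPoint(double X, double Y, double Z,
                   const Intrinsics& K, const Distortion& d) {
    assert(Z > 0.0);
    // ideal (normalized) image coordinates
    const double x = X / Z, y = Y / Z;
    const double r2 = x * x + y * y;
    const double radial = 1.0 + d.k1 * r2 + d.k2 * r2 * r2 + d.k3 * r2 * r2 * r2;
    // distorted normalized coordinates
    const double xd = x * radial + 2.0 * d.p1 * x * y + d.p2 * (r2 + 2.0 * x * x);
    const double yd = y * radial + d.p1 * (r2 + 2.0 * y * y) + 2.0 * d.p2 * x * y;
    // focal length and center of projection turn them into pixels
    return { K.fx * xd + K.cx, K.fy * yd + K.cy };
}
```

<p>With all coefficients at zero this reduces to the plain pinhole projection. A positive k1 pushes points away from the center of projection, and this is the kind of displacement undistortPoints() removes before triangulation.</p>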
<h3>Getting Test Data</h3>
<p><object style="width: 480px; height: 295px;" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="480" height="295" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="src" value="http://www.youtube.com/v/WXsufPbEUmM&amp;hl=en_US&amp;fs=1&amp;" /><param name="align" value="left" /><embed style="width: 480px; height: 295px;" type="application/x-shockwave-flash" width="480" height="295" src="http://www.youtube.com/v/WXsufPbEUmM&amp;hl=en_US&amp;fs=1&amp;" align="left"></embed></object>For the test data I wanted a few views of a planar scene, where the first two views are separated only by a translation of ~5cm, as K&amp;M do in the PTAM article. This known translation helps when triangulating the initial features in the scene. When you have prior knowledge of where the cameras are, you can simply intersect the rays back-projected from the two views and recover the 3D position of the points, with the scale set by the assumed baseline. Keep in mind that you must also have feature correspondence: a point in image A must be matched to a point in image B.</p>
<p>To achieve this I set up a small program that uses optical flow to track some 2D features in the scene and grabs a few frames along with their feature vectors. See &#8216;capture_data.cpp&#8217;.</p>
<h2>Stereo Initialization</h2>
<p>Now that I have 2 views with feature correspondence:</p>
<p><a rel="lightbox" href="http://www.morethantechnical.com/wp-content/uploads/2010/03/frames_correl.png"><img class="alignnone size-full wp-image-607" title="frames_correl" src="http://www.morethantechnical.com/wp-content/uploads/2010/03/frames_correl.png" alt="" width="634" height="259" /></a></p>
<p>I would like to triangulate the features. As discussed earlier, this is possible because I know the rotation (none), the translation (5cm along the -x axis) and the camera calibration parameters (focal length, center of projection).</p>
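<p>When the orientation is identical and the camera moves only along x, intersecting the two rays reduces to computing depth from disparity. The sketch below covers that special case only. It assumes normalized image coordinates (undistorted, with the camera matrix already removed) and a signed baseline giving the x displacement of the second camera center. The function and struct names are hypothetical; the general case is what cvTriangulatePoints() handles. The depth comes out in the units of the assumed baseline, so the ~5cm guess directly sets the scale of the map.</p>

```cpp
#include <cassert>
#include <cmath>

struct Point3 { double X, Y, Z; };

// Triangulate one correspondence for two views with the SAME orientation,
// where the second camera center sits baselineX units along the x axis from
// the first. x1, y1 and x2 are normalized image coordinates (undistorted,
// camera matrix removed). For a point at depth Z, x1 - x2 == baselineX / Z.
// Returns false for near-zero disparity or a point behind the cameras.
bool triangulatePureTranslation(double x1, double y1, double x2,
                                double baselineX, Point3& out) {
    const double disparity = x1 - x2;
    if (std::fabs(disparity) < 1e-12) return false;   // rays (nearly) parallel
    const double Z = baselineX / disparity;
    if (Z <= 0.0) return false;                        // behind the cameras
    out = { x1 * Z, y1 * Z, Z };                       // back-project view 1's ray
    return true;
}
```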
<h3>Triangulation</h3>
<p>OpenCV has only recently added a couple of functions that implement triangulation [<a href="http://n2.nabble.com/An-implementation-of-the-Optimal-Triangulation-Method-td2295331.html" target="_blank">link</a>] as described by Hartley &amp; Zisserman [<a href="http://users.cecs.anu.edu.au/~hartley/Papers/CVPR99-tutorial/tut_4up.pdf" target="_blank">PDF</a>, page 12]. However, these functions are not formally documented, and in fact they are missing some important parts. This is how I used cvTriangulatePoints(), which is the key function:</p>
<pre class="brush: plain; title: ; notranslate">
//this function will initialize the 3D features from two views
void stereoInit() {

//first load camera intrinsic parameters
FileStorage fs(&quot;cam_work.out&quot;,CV_STORAGE_READ);
FileNode fn = fs[&quot;camera_matrix&quot;];
camera_matrix = Mat((CvMat*)fn.readObj(),true);

fn = fs[&quot;distortion_coefficients&quot;];
distortion_coefficients = Mat((CvMat*)fn.readObj(),true);

//vector&lt;Point2d&gt; points[2]; //these Point2d vectors hold the 2D features, double precision, from the 2 views

//get copy of points
_points[0] = points[0];
_points[1] = points[1];
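Mat pts1M(_points[0]), pts2M(_points[1]); //very easy in OpenCV 2.0 to convert vector&lt;&gt; to Mat. The Mat shares the vector's data, so the conversions below write the undistorted points back into _points.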

//Undistort points
Mat tmp,tmpOut;
pts1M.convertTo(tmp,CV_32FC2);  //undistort takes only floats not doubles, so convert to Point2f
undistortPoints(tmp,tmpOut,camera_matrix,distortion_coefficients);
tmpOut.convertTo(pts1M,CV_64FC2);  //go back to double precision

pts2M.convertTo(tmp,CV_32FC2);
undistortPoints(tmp,tmpOut,camera_matrix,distortion_coefficients);
tmpOut.convertTo(pts2M,CV_64FC2);

vector&lt;uchar&gt; tri_status; //this will hold the status for each point, a good point will have 1, bad - 0

//now triangulate
triangulate(_points[0],_points[1],tri_status);

}

void triangulate(vector&lt;Point2d&gt;&amp; points1, vector&lt;Point2d&gt;&amp; points2, vector&lt;uchar&gt;&amp; status) {

	//Convert points to 1-channel, 2-rows, double precision - This is important - see the code
...

	Mat ___tmp(2,pts1Mt.cols,CV_64FC1,__d);
...
	Mat ___tmp1(2,pts2Mt.cols,CV_64FC1,__d1);
...

	CvMat __points1 = ___tmp, __points2 = ___tmp1;

	//projection matrices, in normalized image coordinates (the points were undistorted without a new camera matrix)
	double P1d[12] = {	-1,0,0,0,
						0,1,0,0,
						0,0,1,0 };	//Identity, but looking into -z axis
	Mat P1m(3,4,CV_64FC1,P1d);
	CvMat P1c = P1m;	//a CvMat header sharing P1m's data; taking the address of a temporary (&amp;(CvMat)P1m) is not valid C++
	CvMat* P1 = &amp;P1c;
	double P2d[12] = {	-1,0,0,-5,
						0,1,0,0,
						0,0,1,0 };  //Identity rotation, 5cm -x translation, looking into -z axis
	Mat P2m(3,4,CV_64FC1,P2d);
	CvMat P2c = P2m;
	CvMat* P2 = &amp;P2c;

	//4xN output of homogeneous 3D points, sized for any number of points (a fixed float[1000] buffer only fits 250)
	Mat outTM(4,(int)points1.size(),CV_32FC1,Scalar(0));
	CvMat outC = outTM;
	CvMat* out = &amp;outC;

	//using cvTriangulatePoints with the created structures
	cvTriangulatePoints(P1,P2,&amp;__points1,&amp;__points2,out);

	//we should check the triangulation result by reprojecting 3D-&gt;2D and checking distance
	vector&lt;Point2d&gt; projPoints[2] = {points1,points2};

	double point3D_dat[4] = {0};
	Mat threeD(4,1,CV_64FC1,point3D_dat);	//header over point3D_dat
	Mat twoD;	//allocated by the multiplication below

	Mat P[2] = {P1m,P2m};

	int oc = out-&gt;cols, oc2 = out-&gt;cols*2, oc3 = out-&gt;cols*3;

	status = vector&lt;uchar&gt;(oc);

	//scan all points, reproject 3D-&gt;2D, and keep only good ones
	for(int i=0;i&lt;oc;i++) {
		//dehomogenize the triangulated point
		double W = out-&gt;data.fl[i+oc3];
		point3D_dat[0] = out-&gt;data.fl[i] / W;
		point3D_dat[1] = out-&gt;data.fl[i+oc] / W;
		point3D_dat[2] = out-&gt;data.fl[i+oc2] / W;
		point3D_dat[3] = 1;

		bool push = true;
		/* !!! Project this point for each camera */
		for( int currCamera = 0; currCamera &lt; 2; currCamera++ )
		{
			//reproject! using the P matrix of the current camera
			twoD = P[currCamera] * threeD;

			float x = (float)projPoints[currCamera][i].x;
			float y = (float)projPoints[currCamera][i].y;

			//dehomogenize the reprojected point
			float wr = (float)twoD.at&lt;double&gt;(2,0);
			float xr = (float)(twoD.at&lt;double&gt;(0,0)/wr);
			float yr = (float)(twoD.at&lt;double&gt;(1,0)/wr);

			float deltaX = (float)fabs(x-xr);
			float deltaY = (float)fabs(y-yr);

			//printf(&quot;error from cam %d (%.2f,%.2f): %.6f %.6f\n&quot;,currCamera,x,y,deltaX,deltaY);

			//the threshold is in normalized image units, since P contains no camera matrix
			if(deltaX &gt; 0.01 || deltaY &gt; 0.01) {
				push = false;
			}
		}
		if(push) {
			// A good 3D reconstructed point, add to known world points
			double s = 7;	//scale factor applied to the reconstructed points before storing them
			Point3d p3d(point3D_dat[0]/s,point3D_dat[1]/s,point3D_dat[2]/s);
			//printf(&quot;%.3f %.3f %.3f\n&quot;,p3d.x,p3d.y,p3d.z);
			points1Proj.push_back(p3d);
			status[i] = 1;
		} else {
			status[i] = 0;
		}
	}
}
</pre>
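<p>To make it clearer what cvTriangulatePoints() computes for each correspondence, here is a minimal, OpenCV-free sketch of linear triangulation in the spirit of Hartley &amp; Zisserman. It is followed by the same per-axis reprojection check used in triangulate() above. The sketch uses the inhomogeneous variant, which fixes W = 1 and solves a 3&#215;3 normal-equation system, so that no SVD is needed. It is therefore not necessarily the exact algorithm OpenCV runs internally. Mat34, Vec3 and the function names are hypothetical.</p>

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat34 = std::array<std::array<double, 4>, 3>;   // 3x4 projection matrix
struct Vec3 { double x, y, z; };

static double det3(const double m[3][3]) {
    return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
         - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
         + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}

// Linear triangulation of (x1,y1) <-> (x2,y2) seen by cameras P1 and P2.
// Each view adds two rows (x*P[2] - P[0], y*P[2] - P[1]) to A * [X Y Z 1]^T = 0.
// Fixing W = 1 leaves a 4x3 least-squares problem, solved through its normal
// equations with Cramer's rule.
bool triangulateLinear(const Mat34& P1, const Mat34& P2,
                       double x1, double y1, double x2, double y2, Vec3& out) {
    const Mat34* P[2] = { &P1, &P2 };
    const double obs[2][2] = { { x1, y1 }, { x2, y2 } };
    double A[4][4];
    for (int v = 0; v < 2; ++v)
        for (int c = 0; c < 4; ++c) {
            const Mat34& p = *P[v];
            A[2 * v][c]     = obs[v][0] * p[2][c] - p[0][c];
            A[2 * v + 1][c] = obs[v][1] * p[2][c] - p[1][c];
        }
    // (B^T B) X = -B^T a, with B = first three columns of A and a = last column
    double M[3][3] = {}, rhs[3] = {};
    for (int r = 0; r < 4; ++r)
        for (int i = 0; i < 3; ++i) {
            rhs[i] -= A[r][i] * A[r][3];
            for (int j = 0; j < 3; ++j) M[i][j] += A[r][i] * A[r][j];
        }
    const double d = det3(M);
    if (std::fabs(d) < 1e-12) return false;   // degenerate geometry
    double sol[3];
    for (int k = 0; k < 3; ++k) {              // Cramer's rule: column k -> rhs
        double Mk[3][3];
        for (int i = 0; i < 3; ++i)
            for (int j = 0; j < 3; ++j) Mk[i][j] = (j == k) ? rhs[i] : M[i][j];
        sol[k] = det3(Mk) / d;
    }
    out = { sol[0], sol[1], sol[2] };
    return true;
}

// Reproject X with P and return the larger of |dx| and |dy| against (x, y),
// mirroring the per-axis test in triangulate().
double reprojectionError(const Mat34& P, const Vec3& X, double x, double y) {
    double h[3];
    for (int r = 0; r < 3; ++r)
        h[r] = P[r][0] * X.x + P[r][1] * X.y + P[r][2] * X.z + P[r][3];
    assert(h[2] != 0.0);   // point must not lie on the camera plane
    return std::fmax(std::fabs(h[0] / h[2] - x), std::fabs(h[1] / h[2] - y));
}
```

<p>The sketch uses the projection matrices from the post (P1 with zero translation, P2 with -5 in the last column) and noise-free normalized observations. With those inputs, triangulateLinear() recovers the point and reprojectionError() returns roughly zero in both views, so the point passes the 0.01 threshold.</p>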
<p>OK, now that I have (hopefully) triangulated 3D features from the initial state (2 views of a planar scene with a 5cm translation on the X axis), I can move on to pose estimation.</p>
<h2>Pose Estimation</h2>
<p>In theory, if I know the 3D positions of features in the world and their respective 2D positions in the image, it should be easy to recover the position of the camera, because a rotation matrix and a translation vector define this transformation. In practice, OpenCV finds the pose of an object from 3D-2D correspondences with the solvePnP() [<a href="http://opencv.willowgarage.com/documentation/cpp/camera_calibration_and_3d_reconstruction.html#solvepnp" target="_blank">link</a>] function.</p>
<p>Since I have an initial guess of the rotation and translation &#8211; from the first 2 frames &#8211; I can &#8220;help&#8221; the function estimate the new ones.</p>
<pre class="brush: plain; title: ; notranslate">
void findExtrinsics(vector&lt;Point2d&gt;&amp; points, vector&lt;double&gt;&amp; rv, vector&lt;double&gt;&amp; tv) {
	//estimate extrinsics for these points

	Mat rvec(rv),tvec(tv);

//initial &quot;guess&quot;, in case it wasn't already supplied
	if(rv.size()!=3) {
		rv = vector&lt;double&gt;(3);
		rvec = Mat(rv);
		double _d[9] = {1,0,0,
						0,-1,0,
						0,0,-1};
		Rodrigues(Mat(3,3,CV_64FC1,_d),rvec);
	}
	if(tv.size()!=3) {
		tv = vector&lt;double&gt;(3);
		tv[0]=0;tv[1]=0;tv[2]=0;
		tvec = Mat(tv);
	}

	//create a float (CV_32FC2) copy of the points, stored in v2 through tmpOut
	vector&lt;Point2f&gt; v2(points.size());
	Mat tmpOut(v2);
	Mat _tmpOut(points);
	_tmpOut.convertTo(tmpOut,CV_32FC2);

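	//points1projMF holds the triangulated 3D map points (defined elsewhere in the full source); the last argument uses rvec/tvec as the initial guess
	solvePnP(points1projMF,tmpOut,camera_matrix,distortion_coefficients,rvec,tvec,true);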

	printf(&quot;frame extrinsic:\nrvec: %.3f %.3f %.3f\ntvec: %.3f %.3f %.3f\n&quot;,rv[0],rv[1],rv[2],tv[0],tv[1],tv[2]);

//the output of the function is a Rodrigues form of rotation, so convert to regular rot-matrix
	Mat rotM(3,3,CV_64FC1);
	Rodrigues(rvec,rotM);
	double* _r = rotM.ptr&lt;double&gt;();
	printf(&quot;rotation mat: \n %.3f %.3f %.3f\n%.3f %.3f %.3f\n%.3f %.3f %.3f\n&quot;,
		_r[0],_r[1],_r[2],_r[3],_r[4],_r[5],_r[6],_r[7],_r[8]);
}
</pre>
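<p>solvePnP() hides both the Rodrigues conversion and the cost it optimizes. This minimal plain-C++ sketch (no OpenCV) shows both pieces. The first is the Rodrigues formula, which turns a rotation vector into a rotation matrix. The second is the root-mean-square reprojection error of a pose over the 3D-2D correspondences, which is what the iterative solver tries to reduce, starting from the initial guess. Lens distortion is ignored for brevity, and all names here are hypothetical.</p>

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Mat3 = std::array<double, 9>;   // row-major 3x3
struct ObjPt { double X, Y, Z; };     // 3D map point
struct ImgPt { double u, v; };        // observed pixel

// Rodrigues formula: rotation vector r = axis * angle (radians) -> rotation
// matrix R = cos(a) I + (1 - cos(a)) k k^T + sin(a) [k]x, with k = r / |r|.
Mat3 rodrigues(const double r[3]) {
    const double a = std::sqrt(r[0] * r[0] + r[1] * r[1] + r[2] * r[2]);
    if (a < 1e-12) return { 1, 0, 0, 0, 1, 0, 0, 0, 1 };   // no rotation
    const double kx = r[0] / a, ky = r[1] / a, kz = r[2] / a;
    const double c = std::cos(a), s = std::sin(a), t = 1.0 - c;
    return { c + t * kx * kx,      t * kx * ky - s * kz, t * kx * kz + s * ky,
             t * ky * kx + s * kz, c + t * ky * ky,      t * ky * kz - s * kx,
             t * kz * kx - s * ky, t * kz * ky + s * kx, c + t * kz * kz };
}

// RMS reprojection error (pixels) of the pose (rvec, tvec) over 3D-2D
// correspondences, ignoring lens distortion.
double reprojectionRMS(const std::vector<ObjPt>& obj, const std::vector<ImgPt>& img,
                       const double rvec[3], const double tvec[3],
                       double fx, double fy, double cx, double cy) {
    assert(obj.size() == img.size() && !obj.empty());
    const Mat3 R = rodrigues(rvec);
    double sum = 0.0;
    for (std::size_t i = 0; i < obj.size(); ++i) {
        const ObjPt& p = obj[i];
        // camera coordinates: Xc = R * X + t
        const double xc = R[0] * p.X + R[1] * p.Y + R[2] * p.Z + tvec[0];
        const double yc = R[3] * p.X + R[4] * p.Y + R[5] * p.Z + tvec[1];
        const double zc = R[6] * p.X + R[7] * p.Y + R[8] * p.Z + tvec[2];
        const double du = fx * xc / zc + cx - img[i].u;
        const double dv = fy * yc / zc + cy - img[i].v;
        sum += du * du + dv * dv;
    }
    return std::sqrt(sum / static_cast<double>(obj.size()));
}
```

<p>Evaluating reprojectionRMS() with the rvec/tvec printed by findExtrinsics() is a cheap per-frame sanity check. A growing error suggests the pose or the 2D-3D pairing has drifted.</p>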
<p>After getting the extrinsic parameters of the camera, the next step is plugging in the visualization!</p>
<h2>Integrating OpenGL</h2>
<p>In principle, it should be possible to create a 3D scene that exactly matches the real-world scene, with the triangulated features aligned exactly with the world. I was not able to achieve that, but I got pretty close:<br />
<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="480" height="385" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/Q1HVjAWls_E&amp;hl=en_US&amp;fs=1&amp;" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="480" height="385" src="http://www.youtube.com/v/Q1HVjAWls_E&amp;hl=en_US&amp;fs=1&amp;" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>This is essentially what you do in augmented reality: you align the virtual camera&#8217;s position and rotation with the results you get from the vision part of the system. Pose estimation gave us a 3D rotation vector (Rodrigues form) and a 3D translation vector. The translation vector is used as-is, so only the rotation vector needs to be converted to a 3&#215;3 matrix using the Rodrigues() function.</p>
<p>This is the GLUT display() callback that draws the scene:</p>
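<p>Getting the pose into OpenGL involves two details that are easy to get wrong. First, OpenGL reads matrices in column-major order. Second, its eye coordinates have y up and look down -z, whereas OpenCV&#8217;s camera frame has y down and looks down +z. The sketch below shows one common way to handle both. It assumes rot is the row-major 3&#215;3 output of Rodrigues() and t is tvec, and the function name is hypothetical. It is offered as an alternative to the hand-tuned offset and 180&#176; rotation in display(), not as what the posted code does.</p>

```cpp
#include <array>
#include <cassert>

// Build a column-major 4x4 OpenGL modelview matrix (for glLoadMatrixd or
// glMultMatrixd) from an OpenCV-style pose. rot is a row-major 3x3 rotation
// (e.g. the output of Rodrigues()) and t the translation vector; together they
// map world points into a camera frame with x right, y down, z forward.
// OpenGL's eye frame has y up and looks down -z, so rows 1 and 2 of [R|t] are
// negated (a left-multiplication by diag(1,-1,-1)).
std::array<double, 16> cvPoseToGLModelview(const double rot[9], const double t[3]) {
    assert(rot != nullptr && t != nullptr);
    std::array<double, 16> m{};                    // all zeros
    const double flip[3] = { 1.0, -1.0, -1.0 };
    for (int row = 0; row < 3; ++row) {
        for (int col = 0; col < 3; ++col)
            m[col * 4 + row] = flip[row] * rot[row * 3 + col];  // column-major
        m[12 + row] = flip[row] * t[row];                       // 4th column
    }
    m[15] = 1.0;
    return m;
}
```

<p>A point straight in front of the OpenCV camera, for example at camera coordinates (0, 0, 5), ends up at eye z = -5, which is in front of the OpenGL camera as expected.</p>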
<pre class="brush: plain; title: ; notranslate">
void display(void)
{
	glClearColor(1.0f, 1.0f, 1.0f, 0.5f);
	glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);	// Clear Screen And Depth Buffer

	//draw the background - the frame from the camera
	glMatrixMode(GL_PROJECTION);
	glPushMatrix();
	gluOrtho2D(0.0,352.0,288.0,0.0);
	glMatrixMode(GL_MODELVIEW);
	glPushMatrix();
	glDisable(GL_DEPTH_TEST);
	glDrawPixels(352,288,GL_RGB,GL_UNSIGNED_BYTE,backPxls.data);
	glEnable(GL_DEPTH_TEST);
	glPopMatrix();
	glMatrixMode(GL_PROJECTION);
	glPopMatrix();

    const double t = glutGet(GLUT_ELAPSED_TIME) / 1000.0;
	a = t*20.0;

	glMatrixMode(GL_MODELVIEW);
	glLoadIdentity();

//use the camera position 3D vector
	curCam[0] = cam[0]; curCam[1] = cam[1]; curCam[2] = cam[2];
//there seems to be some kind of offset...
	glTranslated(-curCam[0]+0.5,-curCam[1]+0.7,-curCam[2]);

//and the 3x3 rotation matrix (note: glMultMatrixd reads this array column-major, so the row-major rot[] ends up transposed)
	double _d[16] = {	rot[0],rot[1],rot[2],0,
						rot[3],rot[4],rot[5],0,
						rot[6],rot[7],rot[8],0,
						0,	   0,	  0		,1};
	glMultMatrixd(_d);

//flip the rotation on the x-axis
	glRotated(180,1,0,0);

	//draw the 3D feature points
	glPushMatrix();
	glColor4d(1.0,0.0,0.0,1.0);
	for(unsigned int i=0;i&lt;points1Proj.size();i++) {
		glPushMatrix();
		glTranslated(points1Proj[i].x,points1Proj[i].y,points1Proj[i].z);
		glutSolidSphere(0.03,15,15);
		glPopMatrix();
	}
	glPopMatrix();

	glutSwapBuffers();

	if(!running) {
		glutLeaveMainLoop();
	}

	Sleep(25);
}
</pre>
<p>This pretty much covers my work, in a very concise way. The complete source code shows everything I have done, and it provides a better starting point for copy-and-paste into your own projects.</p>
<h2>Things not covered in this work</h2>
<p>Initially I tried to implement a crucial part of the PTAM work: pairing the 3D map with 2D features in the image. This lets them re-align the map in every frame (when the tracking is bad) so the pose estimation does not &#8220;lose grip&#8221;. In essence, they keep a visual identity for each map feature, very similar to a descriptor like SURF or SIFT, so at any point they can find where the features are in the new image and recover the camera pose from the 2D-3D correspondence. I ran into a problem using OpenCV&#8217;s SURF functionality: it seems to have a bug when computing the descriptor for user-given feature points.</p>
<p>Another thing I chose not to implement is building a full map of the surroundings. I wanted a simple working solution for a small map (essentially a single frame), and to see how it works. In the original work, K&amp;M constantly add more and more features to the map until it covers the whole surrounding room.</p>
<h2>Code and Working the Program</h2>
<p>As usual my code is available for checkout from the blog&#8217;s SVN repository:</p>
<pre class="brush: plain; title: ; notranslate">
svn checkout http://morethantechnical.googlecode.com/svn/trunk/ptam ptam
</pre>
<p>To get the stereo initialization you must press [spacebar] twice: once when the camera has stabilized and the features are stable, and again after the camera has been translated and has stabilized again.<br />
This marks the 2 keyframes that will be used for stereo initialization and triangulation.<br />
From that point on, the 3D scene starts and the track-and-estimate stage begins. Try not to move the camera violently, as the optical flow may suffer.</p>
<p>Thanks Lior for your help getting the hang of these subjects, and for the opportunity to meddle with a subject I had long wanted to explore.</p>
<p>I hope everyone will enjoy and learn from my enjoyment and learning.</p>
<p>Bye!</p>
<p>Roy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2010/03/06/implementing-ptam-stereo-tracking-and-pose-estimation-for-ar-with-opencv-w-code/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
	</channel>
</rss>

