<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>More Than Technical &#187; opengl</title>
	<atom:link href="http://www.morethantechnical.com/category/opengl/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.morethantechnical.com</link>
	<description>On software, code, the internet and more.</description>
	<lastBuildDate>Mon, 06 Feb 2012 23:48:17 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=</generator>
<atom:link rel="hub" href="http://pubsubhubbub.appspot.com"/><atom:link rel="hub" href="http://superfeedr.com/hubbub"/>		<item>
		<title>Structure from Motion and 3D reconstruction on the easy in OpenCV 2.3+ [w/ code]</title>
		<link>http://www.morethantechnical.com/2012/02/07/structure-from-motion-and-3d-reconstruction-on-the-easy-in-opencv-2-3-w-code/</link>
		<comments>http://www.morethantechnical.com/2012/02/07/structure-from-motion-and-3d-reconstruction-on-the-easy-in-opencv-2-3-w-code/#comments</comments>
		<pubDate>Mon, 06 Feb 2012 23:48:17 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[3d]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[opengl]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[Recommended]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[vision]]></category>
		<category><![CDATA[Website]]></category>
		<category><![CDATA[fundamental]]></category>
		<category><![CDATA[matrix]]></category>
		<category><![CDATA[motion]]></category>
		<category><![CDATA[reconstruction]]></category>
		<category><![CDATA[sfm]]></category>
		<category><![CDATA[structure]]></category>
		<category><![CDATA[triangulation]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=998</guid>
		<description><![CDATA[Hello This time I&#8217;ll discuss a basic implementation of a Structure from Motion method, following the steps Hartley and Zisserman show in &#8220;The Bible&#8221; book: &#8220;Multiple View Geometry&#8221;. I will show how simply their linear method can be implemented in OpenCV. I treat this as a kind of tutorial, or a toy example, of how [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-06-at-6.44.42-PM.png" rel="lightbox[998]"><img class="alignleft size-medium wp-image-1064" title="SfM toy example" src="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-06-at-6.44.42-PM-300x71.png" alt="" width="300" height="71" /></a>Hello<br />
This time I&#8217;ll discuss a basic implementation of a Structure from Motion method, following the steps Hartley and Zisserman show in &#8220;The Bible&#8221; book: &#8220;Multiple View Geometry&#8221;. I will show how simply their linear method can be implemented in OpenCV.</p>
<p>I treat this as a kind of tutorial, or a toy example, of how to perform Structure from Motion in OpenCV.</p>
<p>Let&#8217;s get down to business&#8230;<br />
<span id="more-998"></span></p>
<h2>Getting a motion map</h2>
<p>The basic thing when doing reconstruction from pairs of images is that you know the motion: how much &#8220;a pixel has moved&#8221; from one image to the other. This gives you the ability to reconstruct its distance from the camera(s). So our first goal is to try and recover that motion from a pair of images.</p>
<p>In calibrated horizontal stereo rigs this is called <em>Disparity</em>, and it refers to the horizontal motion of a pixel. OpenCV actually has some very good tools to recover horizontal disparity, as can be seen in this <a href="https://code.ros.org/svn/opencv/trunk/opencv/samples/cpp/stereo_match.cpp" target="_blank">sample</a>.</p>
<p>But in our case we don&#8217;t have a calibrated rig as we are doing monocular (one camera) depth reconstruction, or in other words: <em>Structure from motion</em>.</p>
<p>You can go about getting a motion map in many different ways, but two canonical ways are: optical flow and feature matching.<br />
Also, I will stick to what OpenCV has to offer, but obviously there is a whole lot of other work out there.</p>
<div id="attachment_1051" class="wp-caption aligncenter" style="width: 562px"><a href="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-05-at-12.52.31-AM.png" rel="lightbox[998]"><img class="size-full wp-image-1051 " title="Input Images" src="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-05-at-12.52.31-AM.png" alt="" width="552" height="209" /></a><p class="wp-caption-text">Input pair of images, rotation and translation is unknown</p></div>
<h4>Optical Flow</h4>
<p>In optical flow you basically try to &#8220;track the pixels&#8221; from image 1 to 2, usually assuming a pixel can move only within a certain <em>window</em> in which you will search. OpenCV offers some ways to do optical flow, but I will focus on the newer and nicer one: Farneback&#8217;s method for dense optical flow.<br />
The word <em>dense</em> means we look for the motion of <em>every pixel in the image</em>. This is usually costly, but Farneback&#8217;s method is linear, which makes it easy to solve, and OpenCV has a rocking implementation of it, so it basically flies.<br />
Running the function on two images will provide a motion map, however my experiments show that this map is wrong a fair bit of the time&#8230; To cope with that, I run it iteratively, leveraging the fact that this O-F method can take an initial guess.<br />
An example of using Farneback method exists in the samples directory of OpenCV&#8217;s repo: <a href="https://code.ros.org/svn/opencv/trunk/opencv/samples/cpp/fback.cpp" target="_blank">here</a>.</p>
<div id="attachment_1050" class="wp-caption aligncenter" style="width: 334px"><a href="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-05-at-12.52.04-AM.png" rel="lightbox[998]"><img class="size-full wp-image-1050" title="Screen shot 2012-02-05 at 12.52.04 AM" src="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-05-at-12.52.04-AM.png" alt="" width="324" height="266" /></a><p class="wp-caption-text">Dense O-F using Farneback</p></div>
<h4>Feature Matching</h4>
<p>The other way of getting motion is matching features between the two images.<br />
In each image we extract salient features and invariant descriptors, and then match the two sets of features.<br />
It&#8217;s very easily done in OpenCV and widely covered by <a href="https://code.ros.org/svn/opencv/trunk/opencv/samples/cpp/matcher_simple.cpp" target="_blank">examples</a> and <a href="http://opencv.itseez.com/doc/tutorials/features2d/table_of_content_features2d/table_of_content_features2d.html" target="_blank">tutorials</a>.<br />
This method however, will not provide a dense motion map. It will provide a very sparse one at best&#8230; so that depth reconstruction will also be sparse. We may talk about how to overcome that by hacking some segmentation methods, like superpixels and graph-cuts, in a different post.</p>
<div id="attachment_1049" class="wp-caption aligncenter" style="width: 653px"><a href="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-05-at-12.51.34-AM.png" rel="lightbox[998]"><img class="size-full wp-image-1049" title="Screen shot 2012-02-05 at 12.51.34 AM" src="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-05-at-12.51.34-AM.png" alt="" width="643" height="264" /></a><p class="wp-caption-text">SURF features matching, with Fundamental matrix pruning via RANSAC</p></div>
<h4>A hybrid method</h4>
<p>Another way to get motion, one that I am still working on, is a hybrid between Feature Matching and Optical Flow.<br />
Basically the idea is to perform feature matching first, and then O-F. When the motion is big, and features move quite a lot in the image, O-F sometimes fails (because pixel movement is usually confined to a search window).<br />
After we get feature pairs, we can try to recover a global movement in the image. We then use that movement as an initial guess for O-F.</p>
<div id="attachment_1052" class="wp-caption aligncenter" style="width: 333px"><a href="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-05-at-12.51.50-AM.png" rel="lightbox[998]"><img class="size-full wp-image-1052" title="Rigid transform flow" src="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-05-at-12.51.50-AM.png" alt="" width="323" height="265" /></a><p class="wp-caption-text">The rigid transform flow recovered from sparse feature matching</p></div>
<h2>Estimating Motion</h2>
<p>Once we have a motion map between the two images, it should pose no problem to recover the motion of the camera. The motion is described by the 3&#215;4 matrix P, which is composed of two elements: P = [R|t], the rotational element R and the translational element t.<br />
H&amp;Z give us a bunch of ways of recovering the P matrices for both cameras in Chapter 9 of their book, the central one being the <a href="http://en.wikipedia.org/wiki/Fundamental_matrix_(computer_vision)" target="_blank">Fundamental Matrix</a>. This special 3&#215;3 matrix encodes the epipolar constraint between the images. To put it simply: for each point x in image 1 and corresponding point x&#8217; in image 2, the following equation holds: x&#8217;<sup>T</sup>Fx = 0.<br />
How does that help us? Well, H&amp;Z also prove that if you have F, you can infer the two P matrices. And, if you have (sufficient) point matches between images, which we have, you can find F! Hurray!<br />
This is easy to see in the linear sense. F has 9 entries but is defined only up to scale (and the rank-2 constraint leaves it just 7 degrees of freedom), so if we have enough point pairs (8 or more), we can solve for F in a least squares sense. But&#8230; F is better estimated in a more robust way, and OpenCV takes care of all of this for us in the function <a href="http://opencv.itseez.com/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html?highlight=findfundamentalmat#findfundamentalmat" target="_blank">findFundamentalMat</a>. There are several methods for recovering F there, linear and non-linear.<br />
However, H&amp;Z also point to a problem with using F right away: projective ambiguity. This means that the recovered camera matrices may not be the &#8220;real&#8221; ones, but instead have gone through some 3D projective transformation. To cope with this, we will use the <a href="http://en.wikipedia.org/wiki/Essential_matrix" target="_blank">Essential Matrix</a> instead, which is sort of the same thing (it holds the epipolar constraint over points) but for calibrated cameras. Using the Essential matrix removes the projective ambiguity and provides a Metric (or Similarity) Reconstruction, which means the 3D points are true up to scaling alone, and not up to a projective transformation.</p>
<pre class="brush: plain; title: ; notranslate">
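cv::FileStorage fs;
fs.open(&quot;camera_calibration.yml&quot;,cv::FileStorage::READ);
Mat K; //3x3 camera intrinsics; needs to be CV_64F to multiply with F below
fs[&quot;camera_matrix&quot;]&gt;&gt;K;

vector&lt;uchar&gt; status; //per-match inlier mask filled in by RANSAC
Mat F = findFundamentalMat(imgpts1, imgpts2, FM_RANSAC, 0.1, 0.99, status);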
Mat E = K.t() * F * K; //according to HZ (9.12)
</pre>
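<p>To make the two formulas concrete, here is a minimal sketch in plain C++ (no OpenCV) of the epipolar residual x&#8217;<sup>T</sup>Fx and of E = K<sup>T</sup>FK. The function names <code>epipolarResidual</code> and <code>essentialFromFundamental</code> and the array-based matrix types are made up for illustration, and the sketch assumes both images were taken with the same camera K.</p>
<pre class="brush: plain; title: ; notranslate">
#include &lt;array&gt;

using Vec3 = std::array&lt;double, 3&gt;;
using Mat3 = std::array&lt;Vec3, 3&gt;;

// Residual x'^T * F * x for homogeneous points x (image 1) and xp (image 2).
// For a correct F and a correct match this is close to zero.
double epipolarResidual(const Mat3&amp; F, const Vec3&amp; x, const Vec3&amp; xp) {
	double r = 0.0;
	for (int i = 0; i &lt; 3; i++)
		for (int j = 0; j &lt; 3; j++)
			r += xp[i] * F[i][j] * x[j];
	return r;
}

// E = K^T * F * K (HZ 9.12), assuming the same intrinsics K for both images.
Mat3 essentialFromFundamental(const Mat3&amp; F, const Mat3&amp; K) {
	Mat3 FK{}, E{}; // value-initialized to all zeros
	for (int i = 0; i &lt; 3; i++)
		for (int j = 0; j &lt; 3; j++)
			for (int k = 0; k &lt; 3; k++)
				FK[i][j] += F[i][k] * K[k][j];
	for (int i = 0; i &lt; 3; i++)
		for (int j = 0; j &lt; 3; j++)
			for (int k = 0; k &lt; 3; k++)
				E[i][j] += K[k][i] * FK[k][j]; // K[k][i] is (K^T)[i][k]
	return E;
}
</pre>
<p>Feeding a few matched point pairs through <code>epipolarResidual</code> is a cheap sanity check: if the inliers do not come out near zero, F is probably not worth decomposing.</p>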
<p>Now let&#8217;s assume one camera is P = [I|0], meaning it hasn&#8217;t moved or rotated. Getting the second camera matrix, P&#8217; = [R|t], is done as follows:</p>
<pre class="brush: plain; title: ; notranslate">
SVD svd(E);
Matx33d W(0,-1,0,	//HZ 9.13
	  1,0,0,
	  0,0,1);
Matx33d Winv(0,1,0,
	 -1,0,0,
	 0,0,1);
Mat_&lt;double&gt; R = svd.u * Mat(W) * svd.vt; //HZ 9.19
Mat_&lt;double&gt; t = svd.u.col(2); //u3
P1 = Matx34d(R(0,0),	R(0,1),	R(0,2),	t(0),
		 R(1,0),	R(1,1),	R(1,2),	t(1),
		 R(2,0),	R(2,1),	R(2,2), t(2));
</pre>
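<p>One thing the snippet above glosses over: H&amp;Z (Result 9.19) note that this decomposition has four possible solutions, since R can be UWV<sup>T</sup> or UW<sup>T</sup>V<sup>T</sup> and t can be +u3 or -u3. Here is a minimal sketch in plain C++ that enumerates the four candidates from U and V<sup>T</sup>. The <code>Pose</code> struct and the function names are hypothetical. Picking the right candidate is usually done by triangulating a point and checking that it lies in front of both cameras, which is not shown here.</p>
<pre class="brush: plain; title: ; notranslate">
#include &lt;array&gt;
#include &lt;vector&gt;

using Vec3 = std::array&lt;double, 3&gt;;
using Mat3 = std::array&lt;Vec3, 3&gt;;

struct Pose { Mat3 R; Vec3 t; };

Mat3 mul(const Mat3&amp; a, const Mat3&amp; b) {
	Mat3 c{}; // all zeros
	for (int i = 0; i &lt; 3; i++)
		for (int j = 0; j &lt; 3; j++)
			for (int k = 0; k &lt; 3; k++)
				c[i][j] += a[i][k] * b[k][j];
	return c;
}

// The four (R, t) candidates for P' = [R|t], given U and V^T from the SVD of E.
std::vector&lt;Pose&gt; candidatePoses(const Mat3&amp; U, const Mat3&amp; Vt) {
	const Mat3 W  = {{{0.0, -1.0, 0.0}, {1.0, 0.0, 0.0}, {0.0, 0.0, 1.0}}};
	const Mat3 Wt = {{{0.0, 1.0, 0.0}, {-1.0, 0.0, 0.0}, {0.0, 0.0, 1.0}}};
	const Mat3 R1 = mul(mul(U, W), Vt);
	const Mat3 R2 = mul(mul(U, Wt), Vt);
	const Vec3 u3  = {{ U[0][2],  U[1][2],  U[2][2]}}; // last column of U
	const Vec3 nu3 = {{-U[0][2], -U[1][2], -U[2][2]}};
	std::vector&lt;Pose&gt; poses;
	poses.push_back(Pose{R1, u3});
	poses.push_back(Pose{R1, nu3});
	poses.push_back(Pose{R2, u3});
	poses.push_back(Pose{R2, nu3});
	return poses;
}
</pre>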
<p>Looks good, now let&#8217;s move on to reconstruction.</p>
<h2>Reconstruction via Triangulation</h2>
<p>Once we have two camera matrices, P and P&#8217;, we can recover the 3D structure of the scene. This is easy to see if we think about it in terms of ray intersection. We know the two camera centers in space (one at 0,0,0 and one at -R<sup>T</sup>t), and we know where a point falls on the image plane of image 1 and on the image plane of image 2. If we simply shoot a ray from one camera center through the respective point and another ray from the other camera, the intersection of the two rays must be the real location of the object in space.<br />
In real life, none of that works. The rays usually will not intersect (so H&amp;Z refer to the mid-point algorithm, which they dismiss as a bad choice), and ray intersection in general is inferior to other triangulation methods.<br />
H&amp;Z go on to describe their &#8220;optimal&#8221; triangulation method, which optimizes the solution based on the error from reprojection of the points back to the image plane.<br />
I have implemented the linear triangulation methods they present, and wrote a post about it not long ago: <a title="http://www.morethantechnical.com/2012/01/04/simple-triangulation-with-opencv-from-harley-zisserman-w-code/" href="http://www.morethantechnical.com/2012/01/04/simple-triangulation-with-opencv-from-harley-zisserman-w-code/" target="_blank">Here</a>.<br />
I also added the Iterative Least Squares method that Hartley presented in his article &#8220;<a href="http://users.cecs.anu.edu.au/~hartley/Papers/triangulation/triangulation.pdf" target="_blank">Triangulation</a>&#8221;, which is said to perform very well and very fast.</p>
<div id="attachment_1056" class="wp-caption aligncenter" style="width: 333px"><a href="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-05-at-1.22.59-AM.png" rel="lightbox[998]"><img class="size-full wp-image-1056" title="depth map" src="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-05-at-1.22.59-AM.png" alt="" width="323" height="263" /></a><p class="wp-caption-text">&quot;Depth Map&quot;</p></div>
<div id="attachment_1057" class="wp-caption aligncenter" style="width: 576px"><a href="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-05-at-1.23.36-AM.png" rel="lightbox[998]"><img class="size-full wp-image-1057" title="reconstruction" src="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-05-at-1.23.36-AM.png" alt="" width="566" height="349" /></a><p class="wp-caption-text">3D reconstruction</p></div>
<p>A word of caution: many times the reconstruction will fail because the Fundamental matrix came out wrong. The results will just look awful, and nothing like a true reconstruction. To cope with this, you may want to insert a check that makes sure the two P matrices are not completely bogus (you could check for a reasonable rotation, for example). If the P matrices derived from the F matrix are strange, you can discard this F matrix and compute a new one.</p>
<div id="attachment_1063" class="wp-caption aligncenter" style="width: 330px"><a href="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-06-at-6.42.02-PM.png" rel="lightbox[998]"><img class="size-full wp-image-1063" title="Bad reconstruction" src="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-06-at-6.42.02-PM.png" alt="" width="320" height="263" /></a><p class="wp-caption-text">Example of when things go bad...</p></div>
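<p>As one possible form of such a check, here is a minimal sketch in plain C++ that tests whether the recovered R is a proper rotation: orthonormal rows and determinant +1. A determinant of -1 means a reflection, which can come out of the SVD step. The function names and the tolerance are assumptions for illustration, and passing this test does not guarantee a good reconstruction; it only rejects the obviously broken cases.</p>
<pre class="brush: plain; title: ; notranslate">
#include &lt;array&gt;
#include &lt;cmath&gt;

using Mat3 = std::array&lt;std::array&lt;double, 3&gt;, 3&gt;;

double determinant3(const Mat3&amp; m) {
	return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
	     - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
	     + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}

// True if R has determinant +1 and R * R^T is the identity, within tol.
bool isProperRotation(const Mat3&amp; R, double tol = 1e-6) {
	if (std::fabs(determinant3(R) - 1.0) &gt; tol) return false;
	for (int i = 0; i &lt; 3; i++)
		for (int j = 0; j &lt; 3; j++) {
			double dot = 0.0; // dot product of rows i and j
			for (int k = 0; k &lt; 3; k++) dot += R[i][k] * R[j][k];
			if (std::fabs(dot - (i == j ? 1.0 : 0.0)) &gt; tol) return false;
		}
	return true;
}
</pre>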
<h2>Toolbox and Framework</h2>
<p>I created a small toolbox of the various methods I spoke about in this post, and created a very simple UI. It basically allows you to load two images and then try the different methods on them and get the results.<br />
It&#8217;s using FLTK3 for the GUI, and PCL (VTK backend) for visualization of the resulting 3D point cloud.<br />
It also includes a few classes with a simple API that lets you get the feature matches, motion map, camera matrices from the motion, and finally the 3D point cloud.</p>
<div id="attachment_1055" class="wp-caption aligncenter" style="width: 525px"><a href="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-05-at-1.10.47-AM.png" rel="lightbox[998]"><img class=" wp-image-1055  " title="SfM GUI" src="http://www.morethantechnical.com/wp-content/uploads/2012/02/Screen-shot-2012-02-05-at-1.10.47-AM.png" alt="" width="515" height="156" /></a><p class="wp-caption-text">FLTK GUI</p></div>
<h2>Code &amp; Where to go next</h2>
<p>The code, as usual, is up for grabs at github:</p>
<pre><a title="Github repo" href="https://github.com/royshil/SfM-Toy-Library" target="_blank">https://github.com/royshil/SfM-Toy-Library</a></pre>
<p>Now that you have a firm grasp of SfM <img src='http://www.morethantechnical.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  you can go on to visit the following projects, which implement a much more robust solution:</p>
<p><a title="http://phototour.cs.washington.edu/bundler/" href="http://phototour.cs.washington.edu/bundler/" target="_blank">http://phototour.cs.washington.edu/bundler/</a><br />
<a title="http://code.google.com/p/libmv/" href="http://code.google.com/p/libmv/" target="_blank">http://code.google.com/p/libmv/</a><br />
<a title="http://www.cs.washington.edu/homes/ccwu/vsfm/" href="http://www.cs.washington.edu/homes/ccwu/vsfm/" target="_blank">http://www.cs.washington.edu/homes/ccwu/vsfm/</a></p>
<p>And Wikipedia points to some interesting libraries and code as well: <a href="http://en.wikipedia.org/wiki/Structure_from_motion" target="_blank">http://en.wikipedia.org/wiki/Structure_from_motion</a></p>
<p>Enjoy!</p>
<p>Roy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2012/02/07/structure-from-motion-and-3d-reconstruction-on-the-easy-in-opencv-2-3-w-code/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Spherical harmonics face relighting using OpenCV, OpenGL [w/ code]</title>
		<link>http://www.morethantechnical.com/2011/12/20/spherical-harmonics-face-relighting-using-opencv-opengl-w-code/</link>
		<comments>http://www.morethantechnical.com/2011/12/20/spherical-harmonics-face-relighting-using-opencv-opengl-w-code/#comments</comments>
		<pubDate>Tue, 20 Dec 2011 00:59:34 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[3d]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[gui]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[opengl]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[school]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[vision]]></category>
		<category><![CDATA[glsl]]></category>
		<category><![CDATA[harmonics]]></category>
		<category><![CDATA[math]]></category>
		<category><![CDATA[recoloring]]></category>
		<category><![CDATA[relighting]]></category>
		<category><![CDATA[shaders]]></category>
		<category><![CDATA[spherical]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=948</guid>
		<description><![CDATA[Implementing a face image relighting algorithm using spherical harmonics, based on a paper written by Wang et al (2007).]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.morethantechnical.com/wp-content/uploads/2011/12/Screen-shot-2011-12-19-at-8.13.27-PM.png" rel="lightbox[948]"><img src="http://www.morethantechnical.com/wp-content/uploads/2011/12/Screen-shot-2011-12-19-at-8.13.27-PM-300x130.png" alt="" title="Spherical harmonics face relighting" width="300" height="130" class="alignleft size-medium wp-image-1015" /></a>Hi!<br />
I&#8217;ve been working on implementing a face image relighting algorithm using spherical harmonics, one of the most elegant methods I&#8217;ve seen lately.<br />
I start by aligning a face model with OpenGL to automatically get the canonical face normals, which brushed up my knowledge of GLSL. Then I continue to estimating the real face&#8217;s &#8220;spharmonics&#8221;, and relighting.</p>
<p>Let&#8217;s start!<br />
<span id="more-948"></span></p>
<h2>Some mathematical background</h2>
<p>Don&#8217;t worry, it won&#8217;t hurt. Much.</p>
<p>So, Spherical Harmonics were invented to numerically express a whole bunch of things in physics like gravity and magnetic fields. But they also became very useful for computer graphics, as they are perfect for modelling light falling on a spherical body.</p>
<h3>But what ARE those mysterious spherical harmonics? </h3>
<p>The way I see it, they are a series of &#8220;modes&#8221; or &#8220;eigenvectors&#8221; or &#8220;orthogonal components&#8221; of a basis that spans functions on the surface of a sphere.<br />
To put it simply, they describe a function on the surface of a sphere in increasingly finer-grained portions. Much like in a Fourier decomposition of a function, there is the basis and there are coefficients that, when multiplied with the basis, recover the function.</p>
<h3>How is that good for graphics? </h3>
<p>People have used spherical harmonics mostly to model lighting of spherical objects. When you know the coefficients that describe the lighting, you can change them to <i>Re-light</i> an object, or <i>De-light</i>, or transfer the lighting conditions of one scene to another. Very useful!</p>
<p>Back in 2001, some good researchers, Basri and Jacobs, formulated the first 9 harmonics as a function of the surface normal. On this page Basri references all his work on the subject: <a href="http://www.wisdom.weizmann.ac.il/~ronen/index_files/harmonic.html" target="_blank">http://www.wisdom.weizmann.ac.il/~ronen/index_files/harmonic.html</a> </p>
<p>But I like to reference a work that&#8217;s easier to process than Basri&#8217;s, that is the work of Wang et al from 2007. These guys made the steps to use spherical harmonics easier to follow: <a href="http://research.microsoft.com/en-us/um/people/zliu/cvpr2007.pdf" title="http://research.microsoft.com/en-us/um/people/zliu/cvpr2007.pdf" target="_blank">http://research.microsoft.com/en-us/um/people/zliu/cvpr2007.pdf</a>.<br />
But their algorithm is quite advanced, as it solves not only for the harmonics&#8217; coefficients but also for the normals of the object in the image. They use some fancy optimization of an energy function over a graph, that I&#8217;m not going to discuss.<br />
But they did make the process of finding the spherical harmonics&#8217; coefficients very clear.</p>
<h4>The bottom line</h4>
<p>We should solve for a vector of 9 coefficients that describes the &#8220;lighting of the object&#8221; (a face in our case).<br />
Each coefficient will tell us how strong or weak that specific harmonic is, or in other words how lit that certain area of the object is.</p>
<p>Wang and Basri show a very simple method of using simultaneous linear equations to solve for the lighting coefficients. It depends only on knowing the normal of the object&#8217;s surface at each pixel in the image.</p>
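<p>To give a feel for what &#8220;harmonics as a function of the normal&#8221; means, here is a minimal sketch in plain C++ that evaluates the first 9 real spherical harmonics for a unit normal (nx, ny, nz). It uses the standard real spherical harmonic constants; the function name and the ordering of the 9 values are assumptions for illustration, and the papers (and the actual project code) may order or scale them differently, for example by folding in the Lambertian attenuation factors.</p>
<pre class="brush: plain; title: ; notranslate">
#include &lt;array&gt;
#include &lt;cmath&gt;

// First 9 real spherical harmonics (bands 0, 1, 2) at the unit normal (nx, ny, nz).
std::array&lt;double, 9&gt; sphericalHarmonicsForNormal(double nx, double ny, double nz) {
	const double pi = std::acos(-1.0);
	const double c0 = 0.5  * std::sqrt(1.0  / pi); // band 0, constant term
	const double c1 = 0.5  * std::sqrt(3.0  / pi); // band 1, linear in the normal
	const double c2 = 0.5  * std::sqrt(15.0 / pi); // band 2, xy / yz / xz terms
	const double c3 = 0.25 * std::sqrt(5.0  / pi); // band 2, 3z^2 - 1 term
	const double c4 = 0.25 * std::sqrt(15.0 / pi); // band 2, x^2 - y^2 term
	return {{
		c0,
		c1 * ny, c1 * nz, c1 * nx,
		c2 * nx * ny, c2 * ny * nz,
		c3 * (3.0 * nz * nz - 1.0),
		c2 * nx * nz,
		c4 * (nx * nx - ny * ny)
	}};
}
</pre>
<p>Each pixel on the face contributes one such 9-vector, and the lighting of the whole face is then described by just 9 coefficients that weight these values.</p>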
<h2>Getting the normals of a canonical face</h2>
<p>So to get the normals, I thought the best way is to use a canonical model of a face (some kind of an average face), instead of trying to recover the normals from the image pixels.<br />
To that end, I used Rhino3D to model (very roughly) a shape that resembles a human face, starting from an elongated sphere.<br />
Now all that&#8217;s left is to align the model with the face to relight, and that will supply the normals.<br />
<a href="http://www.morethantechnical.com/wp-content/uploads/2011/12/snapshot00.png" rel="lightbox[948]"><img src="http://www.morethantechnical.com/wp-content/uploads/2011/12/snapshot00-300x224.png" alt="" title="rough model of a human face" width="300" height="224" class="alignleft size-medium wp-image-1011" /></a><br />
Cool. Then I built a small app that allows the user to move the model around until it&#8217;s aligned with the face image. I used <a href="http://www.fltk.org/" target="_blank">FLTK 3.0</a> to do it since they have a simple interface with OpenGL, they are cross platform, and lightweight.<br />
So I set up a scene where I have the image as the background, and the model is floating above it, half transparent so the user can find the right spot. I added functions for rotating the model, and extra stuff like turning the model opaque.</p>
<p style="text-align: center">
<iframe width="480" height="360" src="http://www.youtube.com/embed/wIwAX2UM64E" frameborder="0" allowfullscreen></iframe>
</p>
<p>To get the normal map I used a very simple GLSL shader that simply colors the pixel with the value of the normal: nX,nY,nZ -&gt; R,G,B.<br />
This way the image OpenGL renders is simply the normal map of the face model. I just grab it using glReadPixels.</p>
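<p>Since normal components live in [-1,1] and color channels in [0,255], some mapping between the two is needed when reading the rendered image back. Here is a minimal sketch in plain C++ of one common convention (scale by 0.5 and offset by 0.5). This is an assumption for illustration, as are the function names; the actual shader may pack the values differently.</p>
<pre class="brush: plain; title: ; notranslate">
#include &lt;array&gt;
#include &lt;cmath&gt;
#include &lt;cstdint&gt;

// Pack a unit normal into 8-bit RGB: [-1,1] maps to [0,255] per channel.
std::array&lt;std::uint8_t, 3&gt; encodeNormal(double nx, double ny, double nz) {
	auto enc = [](double v) {
		double c = (v * 0.5 + 0.5) * 255.0;
		if (c &lt; 0.0) c = 0.0;
		if (c &gt; 255.0) c = 255.0;
		return static_cast&lt;std::uint8_t&gt;(std::lround(c));
	};
	return {{enc(nx), enc(ny), enc(nz)}};
}

// Unpack 8-bit RGB back into a normal and re-normalize it,
// because quantization to 8 bits slightly changes its length.
std::array&lt;double, 3&gt; decodeNormal(const std::array&lt;std::uint8_t, 3&gt;&amp; rgb) {
	std::array&lt;double, 3&gt; n{};
	double len = 0.0;
	for (int i = 0; i &lt; 3; i++) {
		n[i] = rgb[i] / 255.0 * 2.0 - 1.0;
		len += n[i] * n[i];
	}
	len = std::sqrt(len);
	if (len &gt; 0.0)
		for (int i = 0; i &lt; 3; i++) n[i] /= len;
	return n;
}
</pre>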
<h2>Estimating spherical harmonics</h2>
<p>So, after the model is aligned, we can assume we have the normals ready for us for each pixel in the image, and the intensity in each pixel is also known.<br />
The first step that Wang suggests, without knowledge of the real face albedo (the real color of every pixel without any lighting effect), is to get an approximation of the 9-vector of lighting coefficients by setting a constant albedo. Easy enough, we can set the albedo to the average color in the face.<br />
Then we can simply build a huge set of linear equations (one per face pixel in the image), and solve an overdetermined system to get the 9 coefficients.</p>
<pre class="brush: plain; title: ; notranslate">
		Scalar albedo_constant = mean(face_img_hsv, smallFaceMask);

		//setup linear equation system, lighting coefficients (l) is unknown
		//I = p00 * Ht * l
		float p00 = (float)albedo_constant[2] / 255.0f;

		cout &lt;&lt; &quot;Build Ht(&quot;&lt;&lt;n&lt;&lt;&quot;,9)...&quot;;
		cout &lt;&lt; &quot;Build I(&quot;&lt;&lt;n&lt;&lt;&quot;,1)...&quot;;
		//build Ht and I
		Mat_&lt;float&gt; Ht(n,9);
		Mat_&lt;float&gt; I(n,1);
		int pos = 0;
		vector&lt;Mat_&lt;uchar&gt; &gt; face_img_chnls; split(face_img_hsv, face_img_chnls);
		for (int i=0; i&lt;normalMapFlat.rows; i++) {
			if (smallFaceMask(i) == 0) { //is this pixel on the face?
				continue;
			}
			Ht.row(pos) = p00 * calculateSphericalHarmonicsForNormal(normalMapFlat(i));
			I(pos,0) = face_img_chnls[2](i) / 255.0f; //get V from HSV of pixel [0,1]
			pos ++;
		}
		cout &lt;&lt; &quot;DONE&quot;  &lt;&lt; endl;

		cout &lt;&lt; &quot;Solve&quot; &lt;&lt;endl;
		solve(Ht, I, l, DECOMP_SVD);

		cout &lt;&lt; &quot;initial lighting coeffs: &quot;;
		for (int i=0; i&lt;l.rows; i++) {
			cout&lt;&lt;l.at&lt;float&gt;(i)&lt;&lt;&quot;,&quot;;
		}
</pre>
<p>Booyah! lighting coefficients.</p>
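<p>If you are curious what solving an overdetermined system boils down to, here is a minimal sketch in plain C++. Note that it swaps in a different technique from the code above: instead of OpenCV&#8217;s SVD-based <code>solve</code>, it forms the normal equations (A<sup>T</sup>A)x = A<sup>T</sup>b and solves them with Gaussian elimination, which is simpler to write down but less numerically robust. The function name and the singularity threshold are assumptions for illustration.</p>
<pre class="brush: plain; title: ; notranslate">
#include &lt;cmath&gt;
#include &lt;cstddef&gt;
#include &lt;utility&gt;
#include &lt;vector&gt;

// Least-squares solution of A x = b (A given row by row) via the normal equations.
// Returns an empty vector if the input is malformed or A^T A is numerically singular.
std::vector&lt;double&gt; solveLeastSquares(const std::vector&lt;std::vector&lt;double&gt; &gt;&amp; A,
                                      const std::vector&lt;double&gt;&amp; b) {
	const std::size_t m = A.size();
	const std::size_t n = m ? A[0].size() : 0;
	if (n == 0 || b.size() != m) return {};
	// Augmented matrix [A^T A | A^T b], accumulated one equation at a time
	std::vector&lt;std::vector&lt;double&gt; &gt; M(n, std::vector&lt;double&gt;(n + 1, 0.0));
	for (std::size_t r = 0; r &lt; m; r++)
		for (std::size_t i = 0; i &lt; n; i++) {
			for (std::size_t j = 0; j &lt; n; j++) M[i][j] += A[r][i] * A[r][j];
			M[i][n] += A[r][i] * b[r];
		}
	// Gaussian elimination with partial pivoting
	for (std::size_t c = 0; c &lt; n; c++) {
		std::size_t p = c;
		for (std::size_t r = c + 1; r &lt; n; r++)
			if (std::fabs(M[r][c]) &gt; std::fabs(M[p][c])) p = r;
		if (std::fabs(M[p][c]) &lt; 1e-12) return {};
		std::swap(M[c], M[p]);
		for (std::size_t r = c + 1; r &lt; n; r++) {
			const double f = M[r][c] / M[c][c];
			for (std::size_t j = c; j &lt;= n; j++) M[r][j] -= f * M[c][j];
		}
	}
	// Back substitution
	std::vector&lt;double&gt; x(n, 0.0);
	for (std::size_t i = n; i-- &gt; 0;) {
		double s = M[i][n];
		for (std::size_t j = i + 1; j &lt; n; j++) s -= M[i][j] * x[j];
		x[i] = s / M[i][i];
	}
	return x;
}
</pre>
<p>In the relighting case each row of A would be the 9 harmonic values of one face pixel (scaled by the constant albedo) and b the pixel intensities, so x comes out as the 9 lighting coefficients.</p>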
<p>But this is only the first step. Now we can get an approximation of the albedo as well, using the coefficients:</p>
<pre class="brush: plain; title: ; notranslate">
		Mat_&lt;Vec3b&gt; face_img_v3b = face_img;

		#pragma omp parallel for schedule(dynamic)
		for (int y=0; y&lt;face_img.rows; y++) {
			for (int x=0; x&lt;face_img.cols; x++) {
				if (face_mask(y,x) == 0) {
					albedo(y,x) = 0;
					continue;
				}
				Mat sph = calculateSphericalHarmonicsForNormal(normalMap(y,x));
				Mat_&lt;float&gt; sph_l = sph * l;
				float fsph_l = sph_l(0);

				for (int cn = 0; cn&lt;3; cn++) {
					float fimg = face_img_v3b(y,x)[cn] / 255.0f;
					albedo(y,x)[cn] = (fimg / fsph_l);
				}
			}
		}
</pre>
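<p>The per-pixel step in the loop above follows from the lighting model: intensity = albedo &#215; shading, so albedo = intensity / shading, where the shading is the dot product of the pixel&#8217;s 9 harmonic values with the 9 lighting coefficients. Here is a minimal sketch in plain C++. The guard against tiny or negative shading and the clamp to [0,1] are assumptions added here and are not in the loop above; the function names are hypothetical.</p>
<pre class="brush: plain; title: ; notranslate">
#include &lt;array&gt;

// Shading predicted by the lighting model: dot product of the 9 harmonic
// values h at a pixel with the 9 lighting coefficients l.
double shading(const std::array&lt;double, 9&gt;&amp; h, const std::array&lt;double, 9&gt;&amp; l) {
	double s = 0.0;
	for (int i = 0; i &lt; 9; i++) s += h[i] * l[i];
	return s;
}

// albedo = intensity / shading, clamped to [0,1].
// Returns 0 where the shading is too small to divide by safely.
double albedoFromIntensity(double intensity, double shade, double minShade = 1e-3) {
	if (shade &lt; minShade) return 0.0;
	const double a = intensity / shade;
	if (a &lt; 0.0) return 0.0;
	if (a &gt; 1.0) return 1.0;
	return a;
}
</pre>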
<p>Done.<br />
Now that we have an initial albedo, Wang suggests we compute the coefficients again to get a better approximation, and then the albedo again.<br />
I, however, ran into some problems trying to do the second iteration, and the results always came out too dark&#8230; But even with the first iteration you can see a very nice change.<br />
Look at the video from before: you can see that the right side of the face, which is over-lit, was darkened and the left side was lit up.</p>
<h2>Code</h2>
<p>The code for spherical harmonics analysis of images is part of a bigger project I have been working on for some time. I also spoke of it in a <a href="http://www.morethantechnical.com/2011/12/01/identity-transfer-in-photographs/" target="_blank">previous post</a>.<br />
Anyway it&#8217;s up in GitHub: <a href="https://github.com/royshil/HeadReplacement/tree/master/HeadReplacement" target="_blank">https://github.com/royshil/HeadReplacement/tree/master/HeadReplacement</a><br />
You&#8217;re looking for 4 files:</p>
<ul>
<li>SpharmonicsUI.cpp
<li>SpharmonicsUI.h
<li>spherical_harmonics_analysis.cpp
<li>spherical_harmonics_analysis.h
</ul>
<p>You can use the project&#8217;s CMakeLists.txt to compile, but here&#8217;s a cut-down CMakeLists.txt that should take you there in one piece (fingers crossed):</p>
<pre class="brush: plain; title: ; notranslate">
find_package(OpenCV REQUIRED)
find_package(OpenGL REQUIRED)
find_package(OpenMP REQUIRED)

######## Find and add GLEE ########
file(GLOB_RECURSE GLEE_PATH &quot;${CMAKE_SOURCE_DIR}/GLee.c&quot;)
if(GLEE_PATH STREQUAL GLEE_PATH-NOTFOUND   OR   GLEE_PATH STREQUAL &quot;&quot;)
	message(STATUS &quot;GLEE was not found&quot;)
else()
	list(LENGTH GLEE_PATH GLEE_PATH_LEN)
	if(GLEE_PATH_LEN GREATER 1)
		list(GET GLEE_PATH 1 GLEE_PATH)
	endif()
	file(RELATIVE_PATH GLEE_PATH ${CMAKE_SOURCE_DIR} ${GLEE_PATH})
	get_filename_component(GLEE_PATH ${GLEE_PATH} REALPATH)
	get_filename_component(GLEE_PATH ${GLEE_PATH} PATH)
	message(STATUS &quot;Found GLEE at ${GLEE_PATH}&quot;)
	add_library(GLEE ${GLEE_PATH}/GLee.c)
endif()

############ Find FLTK ############
if(NOT DEFINED FLTK_PATH)
	file(GLOB_RECURSE FLTK_PATH &quot;${CMAKE_SOURCE_DIR}/Widget.h&quot;)
	if(FLTK_PATH STREQUAL FLTK_PATH-NOTFOUND   OR   FLTK_PATH STREQUAL &quot;&quot;)
		message(STATUS &quot;FLTK was not found !!!!!&quot;)
	else()
		list(LENGTH FLTK_PATH FLTK_PATH_LEN)
		if(FLTK_PATH_LEN GREATER 1)
			list(GET FLTK_PATH 1 FLTK_PATH)
		endif()
		file(RELATIVE_PATH FLTK_PATH ${CMAKE_SOURCE_DIR} ${FLTK_PATH})
		get_filename_component(FLTK_PATH ${FLTK_PATH} REALPATH)
		get_filename_component(FLTK_PATH ${FLTK_PATH} PATH)
		message(STATUS &quot;Found FLTK at ${FLTK_PATH}&quot;)
	endif()
else()
	get_filename_component(FLTK_PATH ${FLTK_PATH} REALPATH)
	message(STATUS &quot;FLTK path set to ${FLTK_PATH}&quot;)
endif()
set(FLTK_INCLUDE_DIR ${FLTK_PATH}/include)
set(FLTK_LIB_DIR ${FLTK_PATH}/lib)

######## Relighting #######
include_directories(${FLTK_INCLUDE_DIR})
include_directories(${OPENGL_INCLUDE_DIR})
include_directories(${GLEE_PATH})
add_library(VirtualSurgeon_Relighting
	../HeadReplacement/glm.cpp
	../HeadReplacement/spherical_harmonics_analysis.cpp
	../HeadReplacement/LaplacianBlending.cpp
	../HeadReplacement/SpharmonicsUI.cpp
	../HeadReplacement/OGL_OCV_common.cpp
	)
</pre>
<p>Note that I had to resort to some very dark magic to recover the location of FLTK and GLEE&#8230; But it&#8217;s a jungle out there.</p>
<p>The source of the photograph is: <a href="http://www.flickr.com/photos/roel1943/309048020/" target="_blank">http://www.flickr.com/photos/roel1943/309048020/</a><br />
It is released under Creative Commons 2.0 ShareAlike-Attribution. So all the results here are also CC-2.0-SA-A&#8230; <img src='http://www.morethantechnical.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Enjoy,<br />
Roy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2011/12/20/spherical-harmonics-face-relighting-using-opencv-opengl-w-code/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Kinect browser plugin with FireBreath [w/ code]</title>
		<link>http://www.morethantechnical.com/2011/12/02/a-kinect-browser-plugin-with-firebreath-w-code/</link>
		<comments>http://www.morethantechnical.com/2011/12/02/a-kinect-browser-plugin-with-firebreath-w-code/#comments</comments>
		<pubDate>Fri, 02 Dec 2011 14:17:58 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[opengl]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[browser]]></category>
		<category><![CDATA[kinect]]></category>
		<category><![CDATA[plugin]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=996</guid>
		<description><![CDATA[Hi, Just reporting on a small achievement, part of a big project: Creating a browser plugin to display the Kinect depth map on screen. The integration was fairly easy, which leads me to think that both FireBreath and OpenNI/Nite are pretty neat, robust frameworks. So let&#8217;s see how it&#8217;s done. From a template [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.morethantechnical.com/wp-content/uploads/2011/12/Screen-shot-2011-12-02-at-9.12.03-AM.png" rel="lightbox[996]"><img src="http://www.morethantechnical.com/wp-content/uploads/2011/12/Screen-shot-2011-12-02-at-9.12.03-AM-150x150.png" alt="" title="Screen shot 2011-12-02 at 9.12.03 AM" width="150" height="150" class="alignleft size-thumbnail wp-image-1006" /></a>Hi,<br />
Just reporting on a small achievement, part of a big project: Creating a browser plugin to display the Kinect depth map on screen.<br />
The integration was fairly easy, which leads me to think that both FireBreath and OpenNI/Nite are pretty neat, robust frameworks.<br />
So let&#8217;s see how it&#8217;s done.<br />
<span id="more-996"></span></p>
<h2>From a template FireBreath plugin to an OpenGL plugin</h2>
<p>FireBreath is kind of an amazing project. Its aim is to let you write a single source that builds plugins for all browsers and all operating systems. A daunting feat in my book. But building a MacOS Safari/Firefox plugin using their framework proved very simple&#8230;<br />
So I started here: <a href="http://www.firebreath.org/display/documentation/Mac+Video+Tutorial">http://www.firebreath.org/display/documentation/Mac+Video+Tutorial</a><br />
It&#8217;s a video tutorial on how to create a plugin from a template, build it, install it and run it. Follow their instructions and you&#8217;ll have your plugin ready in 10 minutes.<br />
The next step is to make our plugin display an OpenGL scene, which is what OpenNI/NITE use to display their depth map. This was also easy, borrowing code from the <a href="http://www.firebreath.org/display/documentation/OpenGL+Plugin">FireBreath OpenGL example</a>.<br />
However, I ended up with a smaller source since I threw away most of the stuff&#8230;</p>
<pre class="brush: plain; title: ; notranslate">
class tutorialpluginMac : public tutorialplugin {
public:
    tutorialpluginMac();
	~tutorialpluginMac();

    BEGIN_PLUGIN_EVENT_MAP()
	EVENTTYPE_CASE(FB::AttachedEvent, onWindowAttached, FB::PluginWindowMac)
	EVENTTYPE_CASE(FB::DetachedEvent, onWindowDetached, FB::PluginWindowMac)
	PLUGIN_EVENT_MAP_CASCADE(tutorialplugin)
    END_PLUGIN_EVENT_MAP()

    virtual bool onWindowAttached(FB::AttachedEvent *evt, FB::PluginWindowMac*);
    virtual bool onWindowDetached(FB::DetachedEvent *evt, FB::PluginWindowMac*);
protected:

private:
    void* m_layer;

};

void glutDisplay (void); //this is implemented in the NITE code

@interface MyCAOpenGLLayer : CAOpenGLLayer {
    GLfloat m_angle;
}
@end

@implementation MyCAOpenGLLayer

- (id) init {
    if ((self = [super init])) {
        m_angle = 0;
    }
    return self;
}

- (void)drawInCGLContext:(CGLContextObj)ctx pixelFormat:(CGLPixelFormatObj)pf forLayerTime:(CFTimeInterval)t displayTime:(const CVTimeStamp *)ts {
    //m_angle += 1;
    GLsizei width = CGRectGetWidth([self bounds]), height = CGRectGetHeight([self bounds]);
    GLfloat halfWidth = width / 2, halfHeight = height / 2;

    glViewport(0, 0, width, height);

	glutDisplay(); //let NITE draw its stuff

    [super drawInCGLContext:ctx pixelFormat:pf forLayerTime:t displayTime:ts];
}

@end

tutorialpluginMac::tutorialpluginMac() : m_layer(NULL) {}

tutorialpluginMac::~tutorialpluginMac()
{
    if (m_layer) {
        [(CALayer*)m_layer removeFromSuperlayer];
        [(CALayer*)m_layer release];
        m_layer = NULL;
    }
}

bool tutorialpluginMac::onWindowAttached(FB::AttachedEvent* evt, FB::PluginWindowMac* wnd)
{
	cout &lt;&lt; &quot;tutorialpluginMac::onWindowAttached&quot; &lt;&lt; endl;
    if (FB::PluginWindowMac::DrawingModelCoreAnimation == wnd-&gt;getDrawingModel() ||
		FB::PluginWindowMac::DrawingModelInvalidatingCoreAnimation == wnd-&gt;getDrawingModel())
	{
        cout &lt;&lt; &quot; Setup CAOpenGL drawing. &quot;&lt;&lt;endl;
        MyCAOpenGLLayer* layer = [MyCAOpenGLLayer new];
        layer.asynchronous = (FB::PluginWindowMac::DrawingModelInvalidatingCoreAnimation == wnd-&gt;getDrawingModel()) ? NO : YES;
        layer.autoresizingMask = kCALayerWidthSizable | kCALayerHeightSizable;
        layer.needsDisplayOnBoundsChange = YES;
        m_layer = layer;
        if (FB::PluginWindowMac::DrawingModelInvalidatingCoreAnimation == wnd-&gt;getDrawingModel())
            wnd-&gt;StartAutoInvalidate(1.0/30.0);
        [(CALayer*) wnd-&gt;getDrawingPrimitive() addSublayer:layer];
    }
    return tutorialplugin::onWindowAttached(evt,wnd);
}

bool tutorialpluginMac::onWindowDetached(FB::DetachedEvent* evt, FB::PluginWindowMac* wnd)
{
    return tutorialplugin::onWindowDetached(evt,wnd);
}
</pre>
<p>(You guys will have to fill in the gaps&#8230; includes, etc.)</p>
<p>This goes in a new file, a new subclass of the generic plugin, only for Mac. For Windows, you should subclass again and create the OpenGL context using the Win32 API or equivalent.</p>
<p>CMakeLists.txt files are also affected. Check out the repo.</p>
<h2>NITE OpenGL rendering</h2>
<p>Now that the plugin will just draw whatever NITE is drawing, half the battle is done. So for the drawing code I took the simple NiPointViewer example from the NITE library (get it <a href="http://www.openni.org/">here</a>).<br />
But since we need no window management from OpenNI, again we can make everything simpler. I took the code exactly as it is, and changed just a small bit.<br />
I added<br />
<code><br />
#undef USE_GLUT<br />
#undef USE_GLES<br />
</code>, which pretty much makes the code compile down to something very lean (without window management etc.).<br />
And I rescued the glOrtho call in glutDisplay():<br />
<code><br />
//#ifdef USE_GLUT<br />
	glOrtho(0, mode.nXRes, mode.nYRes, 0, -1.0, 1.0);<br />
#if defined(USE_GLES)<br />
	glOrthof(0, mode.nXRes, mode.nYRes, 0, -1.0, 1.0);<br />
#endif<br />
</code></p>
<p>But the rest is pretty much identical.</p>
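<p>For reference, here is what that glOrtho call amounts to. With left=0, right=nXRes, bottom=nYRes and top=0, it maps depth-map pixel coordinates (origin at the top-left, y pointing down) to OpenGL normalized device coordinates. A minimal C++ sketch of the same x/y mapping on the CPU; <code>Ndc</code> and <code>orthoPixelToNdc</code> are hypothetical names used for illustration only and are not part of NITE or OpenGL:</p>

```cpp
// Sketch: the x/y mapping performed by glOrtho(0, xRes, yRes, 0, -1, 1).
// Ndc and orthoPixelToNdc are illustrative names, not NITE or OpenGL API.
struct Ndc {
    double x;
    double y;
};

Ndc orthoPixelToNdc(double px, double py, double xRes, double yRes) {
    Ndc out;
    out.x = 2.0 * px / xRes - 1.0;   // left edge -> -1, right edge -> +1
    out.y = 1.0 - 2.0 * py / yRes;   // top edge -> +1, bottom edge -> -1 (y is flipped)
    return out;
}
```

<p>So, assuming a 640x480 depth map, pixel (0, 0) lands in the top-left corner of the viewport and pixel (320, 240) in its center.</p>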
<p>One more thing: we should start the Kinect driver and OpenNI stack from somewhere in the plugin loading steps. In the main.cpp file from NITE I renamed the main() function to kinect_main().<br />
I call it here, in the generic plugin (not the Mac subclass, because it should be called on all OSs):</p>
<pre class="brush: plain; title: ; notranslate">
bool tutorialplugin::onWindowAttached(FB::AttachedEvent *evt, FB::PluginWindow *)
{
    // The window is attached; act appropriately
	kinect_main(0, 0);
	cout &lt;&lt; &quot;tutorialplugin::onWindowAttached&quot; &lt;&lt; endl;
    return true;
}
</pre>
<p>It will now fire when a window is attached to the plugin. The OpenGL calls start running once the OGL context is up and rendering in a loop.</p>
<h2>Source and stuff</h2>
<p>Get the source for the Kinect-FireBreath plugin at GitHub: <a href="https://github.com/royshil/KinectPlugin">https://github.com/royshil/KinectPlugin</a></p>
<p>This is how it looks:<br />
<a style="display:block;" href="http://www.morethantechnical.com/wp-content/uploads/2011/12/Screen-shot-2011-12-02-at-9.12.03-AM.png" rel="lightbox[996]"><img src="http://www.morethantechnical.com/wp-content/uploads/2011/12/Screen-shot-2011-12-02-at-9.12.03-AM.png" alt="" title="Screen shot 2011-12-02 at 9.12.03 AM" width="341" height="462" class="alignleft size-full wp-image-1006" /></a></p>
<p>Cool.<br />
Roy</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2011/12/02/a-kinect-browser-plugin-with-firebreath-w-code/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Identity Transfer in Photographs</title>
		<link>http://www.morethantechnical.com/2011/12/01/identity-transfer-in-photographs/</link>
		<comments>http://www.morethantechnical.com/2011/12/01/identity-transfer-in-photographs/#comments</comments>
		<pubDate>Thu, 01 Dec 2011 05:29:58 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[opengl]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[school]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[vision]]></category>
		<category><![CDATA[head]]></category>
		<category><![CDATA[identity]]></category>
		<category><![CDATA[images]]></category>
		<category><![CDATA[photographs]]></category>
		<category><![CDATA[replacement]]></category>
		<category><![CDATA[survey]]></category>
		<category><![CDATA[transfer]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=1000</guid>
		<description><![CDATA[Hi! I would like to present something I have been working on recently, a work that immensely affected what I wrote in the blog in the past two years&#8230; To use it: Go on this page, Watch the short instruction video, download the application (MacOSX-Intel-x64 Win32) and make yourself a model! It takes just a [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.morethantechnical.com/wp-content/uploads/2011/12/male_model.jpg" rel="lightbox[1000]"><img src="http://www.morethantechnical.com/wp-content/uploads/2011/12/male_model-150x150.jpg" alt="" title="male_model" width="150" height="150" class="alignleft size-thumbnail wp-image-1001" /></a>Hi!</p>
<p>I would like to present something I have been working on recently, a work that immensely affected what I wrote in the blog in the past two years&#8230;</p>
<p>To use it:<br />
Go on this <a href="http://palimpost.xvm.mit.edu/HeadReplacement/default.html">page</a>,<br />
Watch the short <a href="http://youtu.be/YhHb3FAqaUk">instruction video</a>,<br />
download the application (<a href="http://palimpost.xvm.mit.edu/HeadReplacement/bin/HeadReplacement.dmg">MacOSX-Intel-x64</a> <a href="http://palimpost.xvm.mit.edu/HeadReplacement/bin/HeadReplacement_win32.zip">Win32</a>)<br />
and make yourself a model!<br />
It takes just a couple of minutes and it&#8217;s very simple&#8230;</p>
<p>This work is an academic research project. Please, please take the time to fill out the <a href="https://docs.google.com/spreadsheet/viewform?formkey=dGNBX0ljZXRVXzdtbjBQZ0dULTQwelE6MQ">survey</a>! It is very short.<br />
The results of the <a href="https://docs.google.com/spreadsheet/viewform?formkey=dGNBX0ljZXRVXzdtbjBQZ0dULTQwelE6MQ">survey</a> (the survey alone, no photos of your work) will possibly be published in an academic paper.</p>
<p>Note: No information is sent anywhere in any way outside of your machine (you may even unplug the network). All results are saved locally on your computer, and no inputs are recorded or transmitted. The application contains no malware. The source is available here.</p>
<p>Note II: All stock photos of models used in the application are released under Creative Commons By-NC-SA 2.0 license. Creator: http://www.flickr.com/photos/kk/. If you wish to distribute your results, they should also be released under a CC-By-NC-SA 2.0 license.</p>
<p>Thank you!<br />
Roy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2011/12/01/identity-transfer-in-photographs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A GLSL shader showing the normal map [w/ code]</title>
		<link>http://www.morethantechnical.com/2011/10/30/a-glsl-shader-showing-the-normal-map-w-code/</link>
		<comments>http://www.morethantechnical.com/2011/10/30/a-glsl-shader-showing-the-normal-map-w-code/#comments</comments>
		<pubDate>Sun, 30 Oct 2011 15:16:55 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[3d]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[opengl]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[Solutions]]></category>
		<category><![CDATA[fragment]]></category>
		<category><![CDATA[glsl]]></category>
		<category><![CDATA[normal]]></category>
		<category><![CDATA[shader]]></category>
		<category><![CDATA[vertex]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=958</guid>
		<description><![CDATA[A very simple thing, although I couldn&#8217;t find anywhere on Google to copy-paste it from, so here it is: Vertex shader Fragment shader A technique to load the shaders that will save you a lot of headaches I based it on this example from NeHe. It does periodical error checking so you can see if [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.morethantechnical.com/wp-content/uploads/2011/10/Screen-shot-2011-10-30-at-11.13.48-AM.png" rel="lightbox[958]"><img src="http://www.morethantechnical.com/wp-content/uploads/2011/10/Screen-shot-2011-10-30-at-11.13.48-AM-150x150.png" alt="" title="Screen shot 2011-10-30 at 11.13.48 AM" width="150" height="150" class="alignleft size-thumbnail wp-image-963" /></a><a href="http://www.morethantechnical.com/wp-content/uploads/2011/10/Screen-shot-2011-10-30-at-11.13.54-AM.png" rel="lightbox[958]"><img src="http://www.morethantechnical.com/wp-content/uploads/2011/10/Screen-shot-2011-10-30-at-11.13.54-AM-150x150.png" alt="" title="Screen shot 2011-10-30 at 11.13.54 AM" width="150" height="150" class="alignleft size-thumbnail wp-image-964" /></a><br />
A very simple thing, although I couldn&#8217;t find anywhere on Google to copy-paste it from, so here it is:<br />
<span id="more-958"></span></p>
<h3>Vertex shader</h3>
<pre class="brush: plain; title: ; notranslate">
varying vec3 normal;

void main()
{
    gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
    normal = gl_NormalMatrix * gl_Normal;
}
</pre>
<h3>Fragment shader</h3>
<pre class="brush: plain; title: ; notranslate">
varying vec3 normal;

void main()
{
    vec3 normal_normal = normalize(normal);
	gl_FragColor = vec4(normal_normal, 1.0);
}
</pre>
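<p>One thing to keep in mind when reading the picture: the components of a unit normal lie in [-1, 1], while the color channels of a regular 8-bit framebuffer are clamped to [0, 1], so with the fragment shader above any negative component ends up as 0 in that channel. A common alternative, which is not what the shader above does, is to remap each component with n * 0.5 + 0.5. Here is a minimal C++ sketch of both mappings on the CPU; <code>Vec3</code>, <code>clampedColor</code> and <code>remappedColor</code> are hypothetical names:</p>

```cpp
#include <algorithm>
#include <array>

typedef std::array<float, 3> Vec3;

// What the framebuffer effectively keeps of vec4(normal, 1.0): each channel clamped to [0, 1].
Vec3 clampedColor(const Vec3& n) {
    Vec3 c;
    for (int i = 0; i < 3; ++i)
        c[i] = std::min(1.0f, std::max(0.0f, n[i]));
    return c;
}

// Alternative: shift [-1, 1] into [0, 1] so negative directions stay visible.
Vec3 remappedColor(const Vec3& n) {
    Vec3 c;
    for (int i = 0; i < 3; ++i)
        c[i] = n[i] * 0.5f + 0.5f;
    return c;
}
```

<p>For the normal (0, -1, 1) the first mapping gives (0, 0, 1) and the second gives (0.5, 0, 1). In the fragment shader the second variant would read <code>gl_FragColor = vec4(normal_normal * 0.5 + 0.5, 1.0);</code>.</p>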
<h3>A technique to load the shaders that will save you a lot of headaches</h3>
<pre class="brush: plain; title: ; notranslate">
GLhandleARB my_program;

//Error-checking function: prints the info log of a shader or program object, if there is one
void checkARBError(GLhandleARB obj) {
	char infolog[1024] = {0}; GLsizei _written = 0;
	glGetInfoLogARB(obj, 1024, &amp;_written, infolog);
	if(_written&gt;0) {
		cerr &lt;&lt; infolog &lt;&lt; endl;
	}
}	

bool notIsAscii(int i) { return !isascii(i); }

void init_shaders() {
	const GLubyte* lang_ver = glGetString(GL_SHADING_LANGUAGE_VERSION);
	cout &lt;&lt;&quot;shading language version: &quot;&lt;&lt;lang_ver&lt;&lt;endl;

	const char * my_fragment_shader_source;
	const char * my_vertex_shader_source;

        //Reading shaders from files
	ifstream ifs(&quot;vshader.txt&quot;);
	ostringstream ss; ss &lt;&lt; ifs.rdbuf();
	ifstream ifs1(&quot;fshader.txt&quot;);
	ostringstream ss1; ss1 &lt;&lt; ifs1.rdbuf();
	ifs.close(); ifs1.close();

        //Cleaning up the strings...
	string _vertex = ss.str(); _vertex.erase(remove_if(_vertex.begin(), _vertex.end(), notIsAscii), _vertex.end());
	string _frag = ss1.str(); _frag.erase(remove_if(_frag.begin(), _frag.end(), notIsAscii), _frag.end());

	// Get Vertex And Fragment Shader Sources
	my_fragment_shader_source = _frag.c_str();
	my_vertex_shader_source = _vertex.c_str();

        //DEBUG - can remove
	cout &lt;&lt; &quot;vertex shader:&quot;&lt;&lt;endl&lt;&lt;my_vertex_shader_source&lt;&lt;endl;
	cout &lt;&lt; &quot;fragment shader:&quot;&lt;&lt;endl&lt;&lt;my_fragment_shader_source&lt;&lt;endl;

	GLhandleARB my_vertex_shader;
	GLhandleARB my_fragment_shader;

	// Create Shader And Program Objects
	my_program = glCreateProgramObjectARB();
	my_vertex_shader = glCreateShaderObjectARB(GL_VERTEX_SHADER_ARB);
	my_fragment_shader = glCreateShaderObjectARB(GL_FRAGMENT_SHADER_ARB);

	// Load Shader Sources
	glShaderSourceARB(my_vertex_shader, 1, &amp;my_vertex_shader_source, NULL);
	checkARBError(my_vertex_shader);
	glShaderSourceARB(my_fragment_shader, 1, &amp;my_fragment_shader_source, NULL);
	checkARBError(my_fragment_shader);

	// Compile The Shaders
	glCompileShaderARB(my_vertex_shader);
	checkARBError(my_vertex_shader);
	glCompileShaderARB(my_fragment_shader);
	checkARBError(my_fragment_shader);

	// Attach The Shader Objects To The Program Object
	glAttachObjectARB(my_program, my_vertex_shader);
	glAttachObjectARB(my_program, my_fragment_shader);
	checkARBError(my_program);

	// Link The Program Object
	glLinkProgramARB(my_program);
	checkARBError(my_program);

}
</pre>
<p>I based it on <a href="http://nehe.gamedev.net/article/glsl_an_introduction/25007/" target="_blank">this example</a> from NeHe.</p>
<p>It checks for errors after every step so you can see if something is wrong, plus it makes sure the vertex shader and fragment shader are stripped of all non-ASCII characters.<br />
This way the compilation will not give you cryptic errors such as &#8220;ERROR: 0:1: &#8216;<&#8216; : syntax error syntax error&#8221;&#8230;</p>
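<p>The stripping step can be tried on its own, without any OpenGL around it. A minimal sketch, assuming the culprit is something like a UTF-8 byte-order mark that a text editor put at the start of the shader file; <code>stripNonAscii</code> and <code>isNotAscii</code> are hypothetical helper names, and the sketch tests the byte value directly instead of calling <code>isascii</code>:</p>

```cpp
#include <algorithm>
#include <string>

// True for any byte outside the 7-bit ASCII range.
static bool isNotAscii(char c) {
    return static_cast<unsigned char>(c) > 127;
}

// Returns a copy of src with every non-ASCII byte removed (erase-remove idiom).
std::string stripNonAscii(std::string src) {
    src.erase(std::remove_if(src.begin(), src.end(), isNotAscii), src.end());
    return src;
}
```

<p>A shader source that starts with the three BOM bytes 0xEF 0xBB 0xBF comes back without them, and a source that is already plain ASCII comes back unchanged.</p>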
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2011/10/30/a-glsl-shader-showing-the-normal-map-w-code/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>A motion parallax screen using Kinect [w/ code]</title>
		<link>http://www.morethantechnical.com/2011/06/05/a-motion-parallax-screen-using-kinect-w-code/</link>
		<comments>http://www.morethantechnical.com/2011/06/05/a-motion-parallax-screen-using-kinect-w-code/#comments</comments>
		<pubDate>Sun, 05 Jun 2011 04:54:38 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[3d]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[opengl]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[vision]]></category>
		<category><![CDATA[head tracking]]></category>
		<category><![CDATA[kinect]]></category>
		<category><![CDATA[motion parallax]]></category>
		<category><![CDATA[projection]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=863</guid>
		<description><![CDATA[How to create a motion-parallax screen using Kinect head tracking. Code in C++, using OpenGL and OpenNI's skeleton model.]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve seen some examples of people who build <a href="http://en.wikipedia.org/wiki/Motion_parallax">motion parallax</a> capable screens using Kinect, but as usual &#8211; they don&#8217;t share the code. Too bad.<br />
Well this is your chance to see how it&#8217;s done, and it&#8217;s fairly simple as well.<br />
<span id="more-863"></span><br />
Let&#8217;s start by getting the user&#8217;s head position. This is done using <a href="http://www.openni.org/">OpenNI</a>&#8217;s library, which provides a skeleton model and hence the head. I used the NiUserTracker sample code as a basis, and stripped out everything that is not needed.</p>
<p>The only things I was interested in were the head position and hand positions, so I created a struct to hold these, plus some things OpenNI needs to get the positions. I did this so it could be run in a different thread, and this struct can be the shared memory:</p>
<pre class="brush: plain; title: ; notranslate">
struct openni_stuff {
	xn::DepthGenerator* dg;
	xn::UserGenerator* ug;
	xn::Context* ctx;
	XnSkeletonJointPosition* Head;
	XnSkeletonJointPosition* rh;
	XnSkeletonJointPosition* lh;
};
</pre>
<p>All these must be populated in main() before starting the thread:</p>
<pre class="brush: plain; title: ; notranslate">
xn::Context g_Context;
xn::DepthGenerator g_DepthGenerator;
xn::UserGenerator g_UserGenerator;
XnSkeletonJointPosition Head;
XnSkeletonJointPosition lHand;
XnSkeletonJointPosition rHand;

int main(..) {
..
g_Context.Init();
g_Context.FindExistingNode(XN_NODE_TYPE_DEPTH, g_DepthGenerator);
g_Context.FindExistingNode(XN_NODE_TYPE_USER, g_UserGenerator);
..
g_UserGenerator.Create(g_Context);
..
g_Context.StartGeneratingAll();
..
DWORD threadid;
struct openni_stuff s;
s.ctx = &amp;g_Context;
s.dg = &amp;g_DepthGenerator;
s.ug = &amp;g_UserGenerator;
s.Head = &amp;Head;
s.rh = &amp;rHand;
s.lh = &amp;lHand;
CreateThread(
            NULL,                   // default security attributes
            0,                      // use default stack size
			MyThreadFunction,       // thread function name
            (LPVOID)(&amp;s),          // argument to thread function
            0,                      // use default creation flags
            &amp;threadid);   // returns the thread identifier

glutMainLoop();
</pre>
<p>This code is heavily abbreviated; there are more things to do in order for it to work, and you can see them in the code repo.<br />
But basically the new thread is the one getting the information off the OpenNI framework, and it keeps the head position and hand position vectors updated. </p>
<pre class="brush: plain; title: ; notranslate">
DWORD WINAPI MyThreadFunction( LPVOID lpParam ) {
       //Unpack the struct, don't care for shallow copy since it's all pointers anyway
	struct openni_stuff s = *((struct openni_stuff*)lpParam);
	for(;;) {
		getOpenNIData(s);
		Sleep(30);
	}
	return 0;
}

void getOpenNIData (struct openni_stuff s)
{
	xn::SceneMetaData sceneMD;
	xn::DepthMetaData depthMD;
	s.dg-&gt;GetMetaData(depthMD);

	if (!g_bPause)
	{
		// Read next available data
		s.ctx-&gt;WaitAndUpdateAll();
	}

	// Process the data
	s.dg-&gt;GetMetaData(depthMD);
	rHand.position.X = 0.0f;
	s.ug-&gt;GetUserPixels(0, sceneMD);
	DrawDepthMap(depthMD, sceneMD, *s.Head, *s.rh, *s.lh);
}
</pre>
<p>I thought this would give a performance boost, as the WaitAndUpdateAll() call usually takes a little while, but it didn&#8217;t matter much&#8230;</p>
<p>The OpenGL (GLUT) code runs on the main thread, and just looks at these updated vectors for the current position.</p>
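<p>The code above shares the joint positions between the two threads without any locking. If you want the render thread to always see a consistent position, one option is to guard the shared data with a mutex. A minimal sketch in standard C++11 (not the Win32 API used above); <code>Vec3f</code> and <code>SharedJoints</code> are hypothetical names, and the real code stores XnSkeletonJointPosition structs instead:</p>

```cpp
#include <mutex>

// Stand-in for XnSkeletonJointPosition (hypothetical, for illustration only).
struct Vec3f {
    float x, y, z;
};

// Joint position shared between the tracking thread and the render thread.
class SharedJoints {
public:
    SharedJoints() : head_{0.0f, 0.0f, 0.0f} {}

    // Called from the tracking thread after each update.
    void setHead(const Vec3f& p) {
        std::lock_guard<std::mutex> lock(mutex_);
        head_ = p;
    }

    // Called from the render thread; returns a consistent copy.
    Vec3f head() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return head_;
    }

private:
    mutable std::mutex mutex_;
    Vec3f head_;
};
```

<p>In this arrangement the tracking thread would call <code>setHead()</code> after <code>WaitAndUpdateAll()</code>, and the GLUT display function would call <code>head()</code> once per frame.</p>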
<h2>Off-Axis projection</h2>
<p>The concept of off-axis projection is very important for this project. This <a href="http://csc.lsu.edu/~kooima/pdfs/gen-perspective.pdf">very good article explains everything about generalized perspective projections</a>, and it also includes C code! I recommend reading it. But basically, off-axis projection is when the viewing direction is not perpendicular to the projection surface, and the eye does not need to be centered in relation to it. It&#8217;s what goes on in our human binocular vision: each eye looks at the same point, but the eyes are not perpendicular to the virtual projection surface (they are angled to it), and they both have an offset from the center. Just read that little paper&#8230;</p>
<p>Anyway, cutting to the chase, we need to project the rendered objects in the scene onto the projection table, assuming the user is not looking at it perpendicularly (like they would with a normal screen). Thanks to the code in the aforementioned article &#8211; this is a breeze.</p>
<pre class="brush: plain; title: ; notranslate">
void subtract(float u[3], float v[3], float n[3]) {
	u[0] = v[0] - n[0];
	u[1] = v[1] - n[1];
	u[2] = v[2] - n[2];
}

void projection( float *pa,
				float *pb,
				float *pc,
				float *pe, float n, float f)
{
	float va[3], vb[3], vc[3];
	float vr[3], vu[3], vn[3];
	float l, r, b, t, d, M[16];

	// Compute an orthonormal basis for the screen.
	subtract(vr, pb, pa);
	subtract(vu, pc, pa);

	glmNormalize(vr);
	glmNormalize(vu);
	glmCross(vr, vu, vn);
	glmNormalize(vn);

	// Compute the screen corner vectors.
	subtract(va, pa, pe);
	subtract(vb, pb, pe);
	subtract(vc, pc, pe);

	// Find the distance from the eye to screen plane.
	d = -glmDot(va, vn);

	// Find the extent of the perpendicular projection.
	l = glmDot(vr, va) * n / d;
	r = glmDot(vr, vb) * n / d;
	b = glmDot(vu, va) * n / d;
	t = glmDot(vu, vc) * n / d;
	// Load the perpendicular projection.
	glMatrixMode(GL_PROJECTION);
	glPushMatrix();
	glLoadIdentity();
	glFrustum(l, r, b, t, n, f);
	// Rotate the projection to be non-perpendicular.
	memset(M, 0, 16 * sizeof (float));
	M[0] = vr[0]; M[4] = vr[1]; M[ 8] = vr[2];
	M[1] = vu[0]; M[5] = vu[1]; M[ 9] = vu[2];
	M[2] = vn[0]; M[6] = vn[1]; M[10] = vn[2];
	M[15] = 1.0f;
	glMultMatrixf(M);
	// Move the apex of the frustum to the origin.
	glTranslatef(-pe[0], -pe[1], -pe[2]);
	glMatrixMode(GL_MODELVIEW);
	glPushMatrix();
}
</pre>
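<p>To get a feel for the numbers, the frustum extents can be computed without any OpenGL. The following is a minimal sketch of just that part of the function above, in plain C++ with doubles; <code>Extents</code>, <code>frustumExtents</code> and the small vector helpers are hypothetical names, and the screen corners and eye positions mentioned below are made up for illustration:</p>

```cpp
#include <cmath>

struct Extents {
    double l, r, b, t;
};

static double dot3(const double u[3], const double v[3]) {
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2];
}

static void sub3(double out[3], const double u[3], const double v[3]) {
    for (int i = 0; i < 3; ++i) out[i] = u[i] - v[i];
}

static void normalize3(double v[3]) {
    const double len = std::sqrt(dot3(v, v));
    for (int i = 0; i < 3; ++i) v[i] /= len;
}

// pa, pb, pc: lower-left, lower-right and upper-left screen corners; pe: eye; n: near plane.
Extents frustumExtents(const double pa[3], const double pb[3],
                       const double pc[3], const double pe[3], double n) {
    double vr[3], vu[3], vn[3], va[3], vb[3], vc[3];

    // Orthonormal basis of the screen: right, up, normal (right x up).
    sub3(vr, pb, pa); normalize3(vr);
    sub3(vu, pc, pa); normalize3(vu);
    vn[0] = vr[1] * vu[2] - vr[2] * vu[1];
    vn[1] = vr[2] * vu[0] - vr[0] * vu[2];
    vn[2] = vr[0] * vu[1] - vr[1] * vu[0];
    normalize3(vn);

    // Vectors from the eye to the screen corners.
    sub3(va, pa, pe);
    sub3(vb, pb, pe);
    sub3(vc, pc, pe);

    // Distance from the eye to the screen plane, then the extents on the near plane.
    const double d = -dot3(va, vn);
    Extents e;
    e.l = dot3(vr, va) * n / d;
    e.r = dot3(vr, vb) * n / d;
    e.b = dot3(vu, va) * n / d;
    e.t = dot3(vu, vc) * n / d;
    return e;
}
```

<p>For a 2x2 screen in the z=0 plane with corners (-1,-1,0), (1,-1,0), (-1,1,0), an eye centered at (0,0,2) and n=1, this gives the symmetric extents l=-0.5, r=0.5, b=-0.5, t=0.5. Move the eye one unit to the right, to (1,0,2), and the frustum becomes asymmetric: l=-1, r=0, with b and t unchanged.</p>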
<p>I am using <a href="http://www.xmission.com/~nate/tutors.html">glm.h &#038; glm.c from Nate Robins</a> to do some basic linear algebra. I just didn&#8217;t feel like re-writing the code, and I&#8217;m already using it to load Wavefront OBJ models. The only missing function is <code>subtract</code>, which is included above.</p>
<p>Loading the OBJ models is super easy with glm.h:</p>
<pre class="brush: plain; title: ; notranslate">
	   objmodel_ptr = glmReadOBJ(&quot;../bunny1.obj&quot;);
	   if (!objmodel_ptr)
		   exit(1); //could not load the model

	   glmUnitize(objmodel_ptr);
	   glmFacetNormals(objmodel_ptr);
	   glmVertexNormals(objmodel_ptr, 90.0);
</pre>
<p>Now that we can create off-axis views (this can be reused for other projects, such as projects with VR glasses!), I draw the scene after applying this projection:</p>
<pre class="brush: plain; title: ; notranslate">
GLfloat eye[4] = {0,200,1050,0}; //position of eye
double kinectHeight = 300;  //the Kinect is by the table, at a certain height (measured)
GLdouble tlv[3] = {-530, -kinectHeight, 90},   //top-left point of table in Kinect coordinates (millimeters)
		trv[3] = {530, -kinectHeight, 90}, //top-right
		brv[3] = {530, -kinectHeight, 955}, //bottom-right
		blv[3] = {-530, -kinectHeight, 955}; //bottom-left
GLdouble obj[3] = {-200, tlv[1], 522.5}; //the virtual object's real-world position (mm)

static void display(GLenum mode)
{
       //set the eye position
	if(Head.position.X != 0.0f || Head.position.Y != 0.0f || Head.position.Z != 0.0f)
	{
		eye[0] = Head.position.X;
		eye[1] = Head.position.Y;
		eye[2] = Head.position.Z;
	}

	glClear(GL_DEPTH_BUFFER_BIT | GL_COLOR_BUFFER_BIT);
	offAxisView();
}

void offAxisView() {
	projection(blvf, brvf, tlvf, eye, 1.0f, 10000.0f);

	glLightfv(GL_LIGHT0, GL_POSITION, lightp);
	drawScene();

	glPopMatrix();
	glMatrixMode(GL_PROJECTION);
	glPopMatrix();
	glMatrixMode(GL_MODELVIEW);
}

void drawScene() {
	//Just draw an object..
	glPushMatrix();
	glTranslated(obj[0]-10,obj[1]+80,obj[2]); //translating to accommodate for obj size
	glColor4f(1.0, 0.0, 0.0, 1.0);
	glScaled(80,80,80);
	glmDraw(objmodel_ptr,GLM_SMOOTH);
	glPopMatrix();
}
</pre>
<p>You can see that I measured the position of the table with respect to the Kinect sensor&#8217;s center, which we assume is the origin, and these measurements are used for the off-axis projection w.r.t. the eye.</p>
<p>That&#8217;s pretty much it&#8230; the program runs, you have to stand in the silly &#8220;Psi&#8221; position for the OpenNI framework to calibrate, and then the graphics will be rendered according to your head position.</p>
<p>To create your own setup, just put in the right position of the table with respect to the Kinect sensor in real-world coordinates (mm).</p>
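<p>A minimal sketch of that setup step, assuming the table is a horizontal rectangle centered in front of the sensor, as in the measurements above. <code>TableCorners</code>, <code>setPoint</code> and <code>makeTable</code> are hypothetical names. The float arrays passed to <code>projection()</code> above (<code>blvf</code>, <code>brvf</code>, <code>tlvf</code>) are not shown in this post, so the sketch simply assumes they are float versions of the same corners:</p>

```cpp
// Table corners in Kinect coordinates (millimeters), as float arrays.
struct TableCorners {
    float tl[3], tr[3], br[3], bl[3];
};

static void setPoint(float p[3], float x, float y, float z) {
    p[0] = x; p[1] = y; p[2] = z;
}

// halfWidth: half the table width; kinectHeight: sensor height above the table;
// nearZ / farZ: distance of the near and far table edges from the sensor.
TableCorners makeTable(float halfWidth, float kinectHeight, float nearZ, float farZ) {
    TableCorners t;
    setPoint(t.tl, -halfWidth, -kinectHeight, nearZ);
    setPoint(t.tr,  halfWidth, -kinectHeight, nearZ);
    setPoint(t.br,  halfWidth, -kinectHeight, farZ);
    setPoint(t.bl, -halfWidth, -kinectHeight, farZ);
    return t;
}
```

<p>Calling <code>makeTable(530, 300, 90, 955)</code> reproduces the corners listed above, and the projection call would then look like <code>projection(t.bl, t.br, t.tl, eye, 1.0f, 10000.0f)</code>.</p>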
<h2>Code</h2>
<p>Can be downloaded from SVN as usual:<br />
<code>svn co https://morethantechnical.googlecode.com/svn/trunk/kinect_motion_parallax/main.cpp</code></p>
<h2>Video</h2>
<p><iframe width="560" height="349" src="http://www.youtube.com/embed/qK4VNo9bI2U" frameborder="0" allowfullscreen></iframe></p>
<p>Enjoy<br />
Roy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2011/06/05/a-motion-parallax-screen-using-kinect-w-code/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>20-lines AR in OpenCV [w/code]</title>
		<link>http://www.morethantechnical.com/2010/11/10/20-lines-ar-in-opencv-wcode/</link>
		<comments>http://www.morethantechnical.com/2010/11/10/20-lines-ar-in-opencv-wcode/#comments</comments>
		<pubDate>Wed, 10 Nov 2010 15:00:30 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[3d]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[opengl]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[vision]]></category>
		<category><![CDATA[augmented reality]]></category>
		<category><![CDATA[computer vision]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=732</guid>
		<description><![CDATA[Hi, Just wanted to share a bit of code using OpenCV&#8217;s camera extrinsic parameters recovery, camera position and rotation &#8211; solvePnP (or its C counterpart cvFindExtrinsicCameraParams2). I wanted to get a simple planar object surface recovery for augmented reality, but without using any of the AR libraries, rather dig into some OpenCV and OpenGL code. [...]]]></description>
			<content:encoded><![CDATA[<p>Hi,</p>
<p>Just wanted to share a bit of code using OpenCV&#8217;s camera extrinsic parameters recovery, camera position and rotation &#8211; solvePnP (or its C counterpart cvFindExtrinsicCameraParams2). I wanted to get a simple planar object surface recovery for augmented reality, but without using any of the AR libraries, digging into some OpenCV and OpenGL code instead.<br />
This can serve as a primer, or tutorial on how to use OpenCV with OpenGL for AR.</p>
<p>The program is just a straightforward optical flow based tracking, fed manually with four points which are the planar object&#8217;s corners, and solving camera-pose every frame. Plain vanilla AR.</p>
<p>Well the whole cpp file is ~350 lines, but there will only be 20 or fewer <strong>interesting</strong> lines&#8230; Actually much fewer. Let&#8217;s see what&#8217;s up<br />
<span id="more-732"></span><br />
I wanna run you through the code really quickly and not go into much detail, to keep things simple. So first of all, we should have two separate threads: Vision and Graphics. The vision thread will track and solve, and the graphics thread will display. </p>
<h2>Initialize</h2>
<pre class="brush: plain; title: ; notranslate">
int main(int argc, char** argv) {
	initGL(argc,argv);
	initOCV(NULL);

	pthread_t tId;
	pthread_attr_t tAttr;
	pthread_attr_init(&amp;tAttr);
	pthread_create(&amp;tId, &amp;tAttr, startOCV, NULL);

	startGL(NULL);
}
</pre>
<p>The initGL, initOCV functions just initialize stuff that can&#8217;t be initialized statically, like GLUT window definitions, some starting values for the cam-pose estimation and other boring stuff. </p>
<p>GLUT runs on the main thread; putting it on its own thread seems to make it unhappy and stop working.</p>
<h2>Tracking</h2>
<p>I&#8217;m using the simplest form of optical flow in OpenCV (LK Pyramid), and the code is equally minimal.</p>
<pre class="brush: plain; title: ; notranslate">
void* startOCV(void* arg) {
	while (1) {
		cvtColor(img, prev, CV_BGR2GRAY);

		//get frame off camera
		cap &gt;&gt; frame;
		if(frame.data == NULL) break;

		frame.copyTo(img);

		cvtColor(img, next, CV_BGR2GRAY);

		//calc optical flow
		calcOpticalFlowPyrLK(prev, next, points1, points2, status, err, Size(30,30));
		cvtPtoKpts(imgPointsOnPlane, points2);

		//switch points vectors (next becomes previous)
		points1.clear();
		points1 = points2;

		//calculate camera pose
		getPlanarSurface(points1);

		//refresh 3D scene
		glutPostWindowRedisplay(glutwin);

		//show tracked points on scene
		drawKeypoints(next, imgPointsOnPlane, img_to_show, Scalar(255));
		imshow(&quot;main2&quot;, img_to_show);
		int c = waitKey(30);
		if (c == ' ') {
			waitKey(0);
		}
	}
	return NULL;
}
</pre>
<p>To use OpenCV&#8217;s &#8216;drawKeypoints&#8217;, which makes drawing key points much easier, we must use <code>vector&lt;KeyPoint&gt;</code>. So I created these 2 very simple converter funcs: cvtKeyPtoP and cvtPtoKpts.</p>
<p>You think &#8216;getPlanarSurface&#8217; is complicated? Think again! 3 lines:</p>
<pre class="brush: plain; title: ; notranslate">
void getPlanarSurface(vector&lt;Point2f&gt;&amp; imgP) {
	Rodrigues(rotM,rvec);

	solvePnP(objPM, Mat(imgP), camera_matrix, distortion_coefficients, rvec, tvec, true);

	Rodrigues(rvec,rotM);
}
</pre>
<p><strong>Booya</strong>! Vision stuff is done.</p>
<h2>3D Graphics</h2>
<p>A little 3D never hurt any AR system&#8230; But drawing it is very simple still:</p>
<pre class="brush: plain; title: ; notranslate">
void display(void)
{
	glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

	//Make sure we have a background image buffer
	if(img_to_show.data != NULL) {
		Mat tmp; 

		//Switch to Ortho for drawing background
		glMatrixMode(GL_PROJECTION);
		glPushMatrix();
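		//NOTE: gluOrtho2D takes (left, right, bottom, top). With left == right, as here,
		//OpenGL rejects the call with GL_INVALID_VALUE and the projection stays unchanged.
		gluOrtho2D(0.0, 0.0, 640.0, 480.0);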

		glMatrixMode(GL_MODELVIEW);

		//Textures can only have power-of-two dimensions, so closest to 640x480 is 1024x512
		tmp = Mat(Size(1024,512),CV_8UC3);
		//However we are going to use only a portion, so create an ROI
		Mat ttmp = tmp(Range(0,img_to_show.rows),Range(0,img_to_show.cols));

		//Some frames could be 8bit grayscale, so make sure on the output we always get 24bit RGB.
		if(img_to_show.step == img_to_show.cols)
			cvtColor(img_to_show, ttmp, CV_GRAY2RGB);
		else if(img_to_show.step == img_to_show.cols * 3)
			cvtColor(img_to_show, ttmp, CV_BGR2RGB);
		flip(ttmp,ttmp,0);

		glEnable(GL_TEXTURE_2D);
		glTexImage2D(GL_TEXTURE_2D, 0, 3, 1024, 512, 0, GL_RGB, GL_UNSIGNED_BYTE, tmp.data);

		//Finally, draw the texture using a simple quad with texture coords in corners.
		glPushMatrix();
		glTranslated(-320.0, -240.0, -500.0);//why these parameters?!
		glBegin(GL_QUADS);
		glTexCoord2i(0, 0); glVertex2i(0, 0);
		glTexCoord2i(1, 0); glVertex2i(640, 0);
		glTexCoord2i(1, 1); glVertex2i(640, 480);
		glTexCoord2i(0, 1); glVertex2i(0, 480);
		glEnd();
		glPopMatrix();

		glMatrixMode(GL_PROJECTION);
		glPopMatrix();
		glMatrixMode(GL_MODELVIEW);
	}

	glPushMatrix();
	double m[16] = {	_d[0],-_d[3],-_d[6],0,
						_d[1],-_d[4],-_d[7],0,
						_d[2],-_d[5],-_d[8],0,
						tv[0],-tv[1],-tv[2],1	};

	//Rotate and translate according to result from solvePnP
	glLoadMatrixd(m);

	//Draw a basic cube
	glDisable(GL_TEXTURE_2D);
	glColor3ub(255, 0, 0); //unsigned bytes: glColor3b takes signed bytes, where 255 does not fit
	glutSolidCube(1);
	glPopMatrix();

	glutSwapBuffers();
}
</pre>
<p>Not so horrific, huh? Most of it is drawing the background texture, and that&#8217;s only to avoid using glDrawPixels&#8230; The only interesting thing is loading the rotation and translation matrix.<br />
Notice that tv[0] (the x-axis component of the translation) doesn&#8217;t have a minus sign, while the y and z components do. That&#8217;s because OpenCV&#8217;s camera looks down the +z axis with +y pointing down, while OpenGL&#8217;s camera looks down the -z axis with +y pointing up, so a 180-degree rotation around the x axis is needed. The same goes for _d[0], _d[1] and _d[2].<br />
In initGL I set OpenGL up to look &#8220;normally&#8221; down the -z axis, where +x goes right and +y goes up.</p>
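<p>The sign flips in the matrix above can be written out as a small helper. This is only a sketch in plain C++ with no OpenCV or OpenGL dependency; the function name poseToGLModelView and the use of std::array are assumptions made for illustration, and R is assumed to be the row-major 3x3 rotation you get from Rodrigues().</p>

```cpp
#include <array>

// Build a column-major OpenGL modelview matrix from an OpenCV-style pose.
// R: row-major 3x3 rotation, t: translation (hypothetical inputs, e.g. from
// solvePnP followed by Rodrigues). The y and z rows are negated, which is
// the 180-degree rotation about the x axis described in the text.
std::array<double, 16> poseToGLModelView(const std::array<double, 9>& R,
                                         const std::array<double, 3>& t) {
    const double sign[3] = {1.0, -1.0, -1.0}; // keep x, flip y and z
    std::array<double, 16> m{};               // zero-initialized
    for (int row = 0; row < 3; ++row) {
        for (int col = 0; col < 3; ++col) {
            // OpenGL stores matrices column after column,
            // so element (row, col) lives at index col*4 + row.
            m[col * 4 + row] = sign[row] * R[row * 3 + col];
        }
        m[12 + row] = sign[row] * t[row];     // fourth column holds the translation
    }
    m[15] = 1.0;
    return m;
}
```

<p>The returned array has the same layout as the hand-written m[16] in display(), so its data() pointer could be passed to glLoadMatrixd in the same way.</p>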
<h2>Proof time</h2>
<p>Not that you need it.. <img src='http://www.morethantechnical.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  But here&#8217;s a video of it working.<br />
<object width="425" height="344"><param name="movie" value="http://www.youtube.com/v/OxBa_5HvZyI?hl=en&#038;fs=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/OxBa_5HvZyI?hl=en&#038;fs=1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="344"></embed></object></p>
<p>BTW: If anyone can solve the problem of the slight misalignment of the 3D and image &#8211; let me know.</p>
<h2>Code and Salutations</h2>
<p>Code can be downloaded from the blog&#8217;s SVN:</p>
<pre class="brush: plain; title: ; notranslate">svn checkout http://morethantechnical.googlecode.com/svn/trunk/OpenCVAR morethantechnical-OpenCVAR</pre>
<p>Now let your imagination run wild!</p>
<p>Farewell,<br />
Roy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2010/11/10/20-lines-ar-in-opencv-wcode/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Quick and Easy Head Pose Estimation with OpenCV [w/ code]</title>
		<link>http://www.morethantechnical.com/2010/03/19/quick-and-easy-head-pose-estimation-with-opencv-w-code/</link>
		<comments>http://www.morethantechnical.com/2010/03/19/quick-and-easy-head-pose-estimation-with-opencv-w-code/#comments</comments>
		<pubDate>Fri, 19 Mar 2010 16:38:49 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[3d]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[opengl]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[Recommended]]></category>
		<category><![CDATA[vision]]></category>
		<category><![CDATA[Website]]></category>
		<category><![CDATA[computer vision]]></category>
		<category><![CDATA[head pose]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=623</guid>
		<description><![CDATA[Hi Just wanted to share a small thing I did with OpenCV &#8211; Head Pose Estimation (sometimes known as Gaze Direction Estimation). Many people try to achieve this and there are a ton of papers covering it, including a recent overview of almost all known methods. I implemented a very quick &#38; dirty solution based [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.morethantechnical.com/wp-content/uploads/2010/03/j5.png" rel="lightbox[623]"><img class="alignleft size-full wp-image-624" style="border: 1px solid black; margin-right: 5px;" title="j5" src="http://www.morethantechnical.com/wp-content/uploads/2010/03/j5.png" alt="" width="350" height="175" /></a>Hi</p>
<p>Just wanted to share a small thing I did with OpenCV &#8211; Head Pose Estimation (sometimes known as Gaze Direction Estimation). Many people try to achieve this and there are a ton of papers covering it, including a recent <a href="http://people.ict.usc.edu/~gratch/CSCI534/Head%20Pose%20estimation.pdf" target="_blank">overview of almost all known methods</a>.</p>
<p>I implemented a very quick &amp; dirty solution based on OpenCV&#8217;s internal methods that produced surprising results (I expected it to fail), so I decided to share. It is based on 3D-2D point correspondence and then fitting of the points to the 3D model. OpenCV provides a magical method &#8211; solvePnP &#8211; that does this, given some calibration parameters that I completely disregarded.</p>
<p>Here&#8217;s how it&#8217;s done</p>
<p><span id="more-623"></span></p>
<h2>Intro</h2>
<p>I wanted to use solvePnP, since I saw how easy it was to use when I was implementing PTAM. It&#8217;s supposed to recover the 3D location and orientation of an object, given a 3D-2D feature correspondence and an initial guess. In fact the initial guess is not required, but the results when not using the guess are dreadful.</p>
<p>So I needed to get some 3D points on a human head. I downloaded a free model of a human head from the net, and used <a href="http://meshlab.sourceforge.net/" target="_blank">MeshLab</a> to mark some points on the model:</p>
<ol>
<li>Left ear</li>
<li>Right ear</li>
<li>Left eye</li>
<li>Right eye</li>
<li>Nose tip</li>
<li>Left mouth corner</li>
<li>Right mouth corner</li>
</ol>
<p>Then I headed to the <a href="http://vis-www.cs.umass.edu/lfw/" target="_blank">LFW database</a> to get some pictures of celebrity heads. By mere accident I stumbled upon Angelina Jolie. The next step was to mark some points on Angelina&#8217;s pictures, according to the selected features. In places where the head hides an ear, I put a point in the estimated location of the ear.</p>
<h2>Time to Code</h2>
<p>First I initialize the 3D points vector, and a dummy camera matrix:</p>
<pre class="brush: plain; title: ; notranslate">
vector&lt;Point3f &gt; modelPoints;
modelPoints.push_back(Point3f(-36.9522f,39.3518f,47.1217f));    //l eye
modelPoints.push_back(Point3f(35.446f,38.4345f,47.6468f));              //r eye
modelPoints.push_back(Point3f(-0.0697709f,18.6015f,87.9695f)); //nose
modelPoints.push_back(Point3f(-27.6439f,-29.6388f,73.8551f));   //l mouth
modelPoints.push_back(Point3f(28.7793f,-29.2935f,72.7329f));    //r mouth
modelPoints.push_back(Point3f(-87.2155f,15.5829f,-45.1352f));   //l ear
modelPoints.push_back(Point3f(85.8383f,14.9023f,-46.3169f));    //r ear

op = Mat(modelPoints);
op = op / 35; //just a little normalization...
rvec = Mat(rv);
double _d[9] = {1,0,0,
          0,-1,0,
         0,0,-1}; //rotation: looking at -z axis
Rodrigues(Mat(3,3,CV_64FC1,_d),rvec);
tv[0]=0;tv[1]=0;tv[2]=1;
tvec = Mat(tv);
double _cm[9] = { 20, 0, 160,
           0, 20, 120,
             0,  0,   1 };  //&quot;calibration matrix&quot;: center point at center of picture with 20 focal length.
camMatrix = Mat(3,3,CV_64FC1,_cm);
</pre>
<p>Even though the &#8220;calibration&#8221; parameters are totally bogus, they work pretty well.</p>
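<p>To see what the numbers in that matrix mean, here is a minimal pinhole projection in plain C++. It is only a sketch: the names Pixel and projectPinhole are made up for illustration, lens distortion is ignored, and a principal point of (160, 120) is read as the center of a 320x240 picture, which is an assumption.</p>

```cpp
#include <array>
#include <cassert>

struct Pixel { double u, v; };

// Project a 3D point given in camera coordinates (Z > 0, in front of the camera)
// with a row-major 3x3 camera matrix K = [fx 0 cx; 0 fy cy; 0 0 1].
// Hypothetical helper; no lens distortion is applied.
Pixel projectPinhole(const std::array<double, 9>& K, double X, double Y, double Z) {
    assert(Z > 0.0);
    const double fx = K[0], cx = K[2];
    const double fy = K[4], cy = K[5];
    // Perspective divide, scale by the focal length, shift to the principal point.
    return { fx * X / Z + cx, fy * Y / Z + cy };
}
```

<p>A point on the optical axis lands exactly on (cx, cy); with a focal length as small as 20, points off the axis move only a few pixels, so this matrix describes a very wide-angle camera.</p>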
<p>Now, we&#8217;re all ready to start estimating some poses. So let&#8217;s use solvePnP:</p>
<pre class="brush: plain; title: ; notranslate">
vector&lt;Point2f &gt; imagePoints;

//read 2D points from file...
FILE* f;
fopen_s(&amp;f,&quot;points.txt&quot;,&quot;r&quot;);
for(int i=0;i&lt;7;i++) {
     int x,y;
     fscanf_s(f,&quot;%d&quot;,&amp;x); fscanf_s(f,&quot;%d&quot;,&amp;y);
     imagePoints.push_back(Point2f((float)x,(float)y));
}
fclose(f);

//make a Mat of the vector&lt;&gt;
Mat ip(imagePoints);

//display points on image
Mat img = imread(&quot;image.png&quot;);
for(unsigned int i=0;i&lt;imagePoints.size();i++) circle(img,imagePoints[i],2,Scalar(255,0,255),CV_FILLED);

//&quot;distortion coefficients&quot;... hah!
double _dc[] = {0,0,0,0};

//here's where the magic happens
solvePnP(op,ip,camMatrix,Mat(1,4,CV_64FC1,_dc),rvec,tvec,true);

//decompose the response to something OpenGL would understand.
//translation vector is irrelevant, only rotation vector is important
Mat rotM(3,3,CV_64FC1,rot);
Rodrigues(rvec,rotM);
double* _r = rotM.ptr&lt;double&gt;();
printf(&quot;rotation mat: \n %.3f %.3f %.3f\n%.3f %.3f %.3f\n%.3f %.3f %.3f\n&quot;,
          _r[0],_r[1],_r[2],_r[3],_r[4],_r[5],_r[6],_r[7],_r[8]);
</pre>
<p>Alright, all done on the vision side, so let&#8217;s draw some 3D. As usual, I use a very simple GLUT program to display 3D in a hurry. Initialization is nothing special; the one thing worth pointing out is using glutSolidCylinder and glutSolidTetrahedron to draw the axes:</p>
<pre class="brush: plain; title: ; notranslate">
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
gluLookAt(0,0,0,0,0,1,0,1,0); //cam looking at +z axis
glPushMatrix();
glTranslated(0,0,5); //go a bit back to where I want to draw the axes
glPushMatrix();

//this is the rotation matrix I got from solvePnP, so I will rotate accordingly to align with the face
double _d[16] = {       rot[0],rot[1],rot[2],0,
                rot[3],rot[4],rot[5],0,
                rot[6],rot[7],rot[8],0,
                0,         0,     0             ,1};
glMultMatrixd(_d);
glRotated(180,1,0,0); //rotate around to face the camera

//----------- Draw Axes --------------
//Z = red
glPushMatrix();
glRotated(180,0,1,0);
glColor3d(1,0,0);
glutSolidCylinder(0.05,1,15,20);
glTranslated(0,0,1);
glScaled(.1,.1,.1);
glutSolidTetrahedron();
glPopMatrix();

//Y = green
glPushMatrix();
glRotated(-90,1,0,0);
glColor3d(0,1,0);
glutSolidCylinder(0.05,1,15,20);
glTranslated(0,0,1);
glScaled(.1,.1,.1);
glutSolidTetrahedron();
glPopMatrix();

//X = blue
glPushMatrix();
glRotated(-90,0,1,0);
glColor3d(0,0,1);
glutSolidCylinder(0.05,1,15,20);
glTranslated(0,0,1);
glScaled(.1,.1,.1);
glutSolidTetrahedron();
glPopMatrix();

glPopMatrix();
glPopMatrix();
//----------End axes --------------
</pre>
<p>That wasn&#8217;t too hard, huh? Awesome.</p>
<h2>So&#8230;. Results</h2>
<p><object width="425" height="344"><param name="movie" value="http://www.youtube.com/v/ZDNH4BT5Do4&#038;hl=en&#038;fs=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/ZDNH4BT5Do4&#038;hl=en&#038;fs=1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="344"></embed></object></p>
<h2>Code</h2>
<p>You can grab the code from the SVN repo:</p>
<pre class="brush: plain; title: ; notranslate">

svn checkout http://morethantechnical.googlecode.com/svn/trunk/HeadPose
</pre>
<p>Enjoy!</p>
<p>Roy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2010/03/19/quick-and-easy-head-pose-estimation-with-opencv-w-code/feed/</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
		<item>
		<title>Implementing PTAM: stereo, tracking and pose estimation for AR with OpenCV [w/ code]</title>
		<link>http://www.morethantechnical.com/2010/03/06/implementing-ptam-stereo-tracking-and-pose-estimation-for-ar-with-opencv-w-code/</link>
		<comments>http://www.morethantechnical.com/2010/03/06/implementing-ptam-stereo-tracking-and-pose-estimation-for-ar-with-opencv-w-code/#comments</comments>
		<pubDate>Sat, 06 Mar 2010 16:53:11 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[opengl]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[Recommended]]></category>
		<category><![CDATA[school]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[vision]]></category>
		<category><![CDATA[Website]]></category>
		<category><![CDATA[3d]]></category>
		<category><![CDATA[augmented reality]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=606</guid>
		<description><![CDATA[Hi Been working hard at a project for school the past month, implementing one of the more interesting works I&#8217;ve seen in the AR arena: Parallel Tracking and Mapping (PTAM) [PDF]. This is a work by Georg Klein [homepage] and David Murray from Oxford University, presented at ISMAR 2007. When I first saw it on [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.morethantechnical.com/wp-content/uploads/2010/03/ptam.png" rel="lightbox[606]"><img class="alignleft size-full wp-image-617" title="ptam" src="http://www.morethantechnical.com/wp-content/uploads/2010/03/ptam.png" alt="" width="350" height="286" /></a>Hi</p>
<p>Been working hard at a project for school the past month, implementing one of the more interesting works I&#8217;ve seen in the AR arena: Parallel Tracking and Mapping (PTAM) [<a href="http://www.robots.ox.ac.uk/~gk/publications/KleinMurray2007ISMAR.pdf" target="_blank">PDF</a>]. This is a work by Georg Klein [<a href="http://www.robots.ox.ac.uk/~gk/" target="_blank">homepage</a>] and David Murray from Oxford University, presented at ISMAR 2007.</p>
<p>When I first saw it on youtube [<a href="http://www.youtube.com/watch?v=pBI5HwitBX4" target="_blank">link</a>] I immediately saw the immense potential &#8211; mobile markerless augmented reality. I thought I should get to know this work a bit more closely, so I chose to implement it as part of an advanced computer vision course, given by Dr. Lior Wolf [<a href="http://www.cs.tau.ac.il/~wolf/" target="_blank">link</a>] at TAU.</p>
<p>The work is very extensive, and clearly is a result of deep research in the field, so I set out to achieve a few selected features: Stereo initialization, Tracking, and small map upkeep. I chose not to implement relocalization and full map handling.</p>
<p>This post is kind of a tutorial for 3D reconstruction with OpenCV 2.0. I will show practical use of the functions in cvtriangulation.cpp, which are not documented and in fact incomplete. Furthermore I&#8217;ll show how to easily combine OpenCV and OpenGL for 3D augmentations, a thing which is only briefly described in the docs or online.</p>
<p>Here are the steps I took and things I learned in the process of implementing the work.</p>
<p>Update: A nice patch by yazor fixes the video mismatching &#8211; thanks! Also, a nice application by Zentium called &#8220;iKat&#8221; is doing some kick-ass <a href="http://gizmodo.com/5489946/ikat-augmented-reality-app-works-without-real+world-prompt">mobile markerless augmented reality</a>.<br />
<span id="more-606"></span></p>
<h2>Preparations&#8230;</h2>
<p>Before going straight to coding, I had to prepare a few things.</p>
<ul>
<li>A working compilation of OpenCV &#8211; not trivial with the new version 2.0.</li>
<li>A calibrated camera.</li>
<li>Test data</li>
</ul>
<p>Compiling OpenCV 2.0 proved to be a bit tricky. Even though the sourceforge project offers a binary release for Win32, I compiled the whole thing from source. It turned out the binary release doesn&#8217;t contain .lib files, and anyway has compatibility issues between MS VS 2005 and 2008 &#8211; something about the embedded manifest [<a href="http://www.google.com/search?q=opencv+2.0+VS+2008+manifest+erro" target="_blank">google</a>]. I downloaded the freshest source from SVN, and compiled it, but it didn&#8217;t solve the debug-release problem, so I was left with using the release dlls even for the debug environment.</p>
<p>Initially I thought I&#8217;d try an uncalibrated camera approach, but soon abandoned it. I had to calibrate my cameras, which I did very easily using OpenCV&#8217;s &#8220;calibration.cpp&#8221;, which strangely is <strong>not built</strong> when building all examples &#8211; it has to be built manually. But everything went smoothly, and I soon got a calibration matrix (focal length, center of projection) and radial distortion coefficients.</p>
<h3>Getting Test Data</h3>
<p><object style="width: 480px; height: 295px;" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="480" height="295" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="src" value="http://www.youtube.com/v/WXsufPbEUmM&amp;hl=en_US&amp;fs=1&amp;" /><param name="align" value="left" /><embed style="width: 480px; height: 295px;" type="application/x-shockwave-flash" width="480" height="295" src="http://www.youtube.com/v/WXsufPbEUmM&amp;hl=en_US&amp;fs=1&amp;" align="left"></embed></object>For the test data I wanted to get a few views of a planar scene, where the first two views are separated only by a translation of ~5cm, as K&amp;M do in the PTAM article. This known translation is helpful when trying to triangulate the initial features in the scene. When you know where the cameras are, you can simply intersect the rays that pass through corresponding points in the two views and recover the 3D position of the points &#8211; up to a scale. Keep in mind you must also have feature correspondence: each point in image A must be matched to a point in image B.</p>
<p>To achieve this I set up a small program that uses Optical Flow to track some 2D features in the scene, and grab a few screens + feature vectors. See &#8216;capture_data.cpp&#8217;.</p>
<h2>Stereo Initialization</h2>
<p>Now that I have 2 views with feature correspondence:</p>
<p><a rel="lightbox" href="http://www.morethantechnical.com/wp-content/uploads/2010/03/frames_correl.png"><img class="alignnone size-full wp-image-607" title="frames_correl" src="http://www.morethantechnical.com/wp-content/uploads/2010/03/frames_correl.png" alt="" width="634" height="259" /></a></p>
<p>I would like to triangulate the features. This is possible, as I discussed earlier, since I know the rotation (none), translation (5cm on -x axis) and camera calibration parameters (focal length, center of projection).</p>
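<p>For this special case (no rotation, a known translation along x) triangulation reduces to the classic disparity formula, which can be sketched in a few lines of plain C++. The names Point3 and triangulateTranslationX are hypothetical, the second camera center is assumed to sit at (baseline, 0, 0) in the first camera&#8217;s frame, and the inputs are assumed to be normalized image coordinates, which is what undistortPoints returns when no new projection matrix is passed. This is not what cvTriangulatePoints does internally, just the geometry behind it for this setup.</p>

```cpp
#include <cmath>

struct Point3 { double x, y, z; };

// Triangulate one correspondence between two views that differ only by a
// translation of `baseline` along the x axis (identity rotation).
// (x1, y1) is the point in view 1 and x2 its x coordinate in view 2, both in
// normalized image coordinates. Returns false when the disparity is too small
// to give a usable depth (the point is effectively at infinity).
bool triangulateTranslationX(double x1, double y1, double x2,
                             double baseline, Point3& out) {
    const double disparity = x1 - x2;       // equals baseline / Z
    if (std::fabs(disparity) < 1e-9) return false;
    const double Z = baseline / disparity;  // depth, in the units of the baseline
    out.x = x1 * Z;                         // undo the perspective divide
    out.y = y1 * Z;
    out.z = Z;
    return true;
}
```

<p>The depth comes out in whatever unit the baseline is given in, so a baseline of 5 yields centimeters here, and a sign error in the baseline shows up as points with negative depth.</p>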
<h3>Triangulation</h3>
<p>For triangulation, OpenCV has only recently added a couple of functions [<a href="http://n2.nabble.com/An-implementation-of-the-Optimal-Triangulation-Method-td2295331.html" target="_blank">link</a>] that implement the method shown by Hartley &amp; Zisserman [<a href="http://users.cecs.anu.edu.au/~hartley/Papers/CVPR99-tutorial/tut_4up.pdf" target="_blank">PDF</a>, page 12]. However, these functions are not formally documented, and in fact they are missing some important parts. This is how I used cvTriangulatePoints(), which is the key function:</p>
<pre class="brush: plain; title: ; notranslate">
//this function will initialize the 3D features from two views
void stereoInit() {

//first load camera intrinsic parameters
FileStorage fs(&quot;cam_work.out&quot;,CV_STORAGE_READ);
FileNode fn = fs[&quot;camera_matrix&quot;];
camera_matrix = Mat((CvMat*)fn.readObj(),true);

fn = fs[&quot;distortion_coefficients&quot;];
distortion_coefficients = Mat((CvMat*)fn.readObj(),true);

//vector&lt;Point2d&gt; points[2]; //these Point2d vectors hold the 2D features, double precision, from the 2 views

//get copy of points
_points[0] = points[0];
_points[1] = points[1];
Mat pts1M(_points[0]), pts2M(_points[1]); //very easy in OpenCV 2.0 to convert vector&lt;&gt; to Mat.

//Undistort points
Mat tmp,tmpOut;
pts1M.convertTo(tmp,CV_32FC2);  //undistort takes only floats not doubles, so convert to Point2f
undistortPoints(tmp,tmpOut,camera_matrix,distortion_coefficients);
tmpOut.convertTo(pts1M,CV_64FC2);  //go back to double precision

pts2M.convertTo(tmp,CV_32FC2);
undistortPoints(tmp,tmpOut,camera_matrix,distortion_coefficients);
tmpOut.convertTo(pts2M,CV_64FC2);

vector&lt;uchar&gt; tri_status; //this will hold the status for each point, a good point will have 1, bad - 0

//now triangulate
triangulate(_points[0],_points[1],tri_status);

}

void triangulate(vector&lt;Point2d&gt;&amp; points1, vector&lt;Point2d&gt;&amp; points2, vector&lt;uchar&gt;&amp; status) {

	//Convert points to 1-channel, 2-rows, double precision - This is important - see the code
...

	Mat ___tmp(2,pts1Mt.cols,CV_64FC1,__d);
...
	Mat ___tmp1(2,pts2Mt.cols,CV_64FC1,__d1);
...

	CvMat __points1 = ___tmp, __points2 = ___tmp1;

	//projection matrices
	double P1d[12] = {	-1,0,0,0,
						0,1,0,0,
						0,0,1,0 };	//Identity, but looking into -z axis
	Mat P1m(3,4,CV_64FC1,P1d);
	CvMat* P1 = &amp;(CvMat)P1m;
	double P2d[12] = {	-1,0,0,-5,
						0,1,0,0,
						0,0,1,0 };  //Identity rotation, 5cm -x translation, looking into -z axis
	Mat P2m(3,4,CV_64FC1,P2d);
	CvMat* P2 = &amp;(CvMat)P2m;

	float _d[1000] = {0.0f};
	Mat outTM(4,points1.size(),CV_32FC1,_d);
	CvMat* out = &amp;(CvMat)outTM;

//using cvTriangulatePoints with the created structures
	cvTriangulatePoints(P1,P2,&amp;__points1,&amp;__points2,out);

//we should check the triangulation result by reprojecting 3D-&gt;2D and checking distance
	vector&lt;Point2d&gt; projPoints[2] = {points1,points2};

	double point2D_dat[3] = {0};
	double point3D_dat[4] = {0};
	Mat twoD(3,1,CV_64FC1,point2D_dat);
	Mat threeD(4,1,CV_64FC1,point3D_dat);

	Mat P[2] = {Mat(P1),Mat(P2)};

	int oc = out-&gt;cols, oc2 = out-&gt;cols*2, oc3 = out-&gt;cols*3;

	status = vector&lt;uchar&gt;(oc);

	//scan all points, reproject 3D-&gt;2D, and keep only good ones
	for(int i=0;i&lt;oc;i++) {
		double W = out-&gt;data.fl[i+oc3];
        point3D_dat[0] = out-&gt;data.fl[i] / W;
        point3D_dat[1] = out-&gt;data.fl[i+oc] / W;
        point3D_dat[2] = out-&gt;data.fl[i+oc2] / W;
        point3D_dat[3] = 1;

        bool push = true;
        /* !!! Project this point for each camera */
        for( int currCamera = 0; currCamera &lt; 2; currCamera++ )
        {
            //reproject! using the P matrix of the current camera
			twoD = P[currCamera] * threeD;

            float x,y;
            float xr,yr,wr;
 	x = (float)projPoints[currCamera][i].x;
	y = (float)projPoints[currCamera][i].y;

            wr = (float)point2D_dat[2];
            xr = (float)(point2D_dat[0]/wr);
            yr = (float)(point2D_dat[1]/wr);

            float deltaX,deltaY;
            deltaX = (float)fabs(x-xr);
            deltaY = (float)fabs(y-yr);

			//printf(&quot;error from cam %d (%.2f,%.2f): %.6f %.6f\n&quot;,currCamera,x,y,deltaX,deltaY);

			if(deltaX &gt; 0.01 || deltaY &gt; 0.01) {
				push = false;
			}
        }
		if(push) {
			// A good 3D reconstructed point, add to known world points

			double s = 7;
			Point3d p3d(point3D_dat[0]/s,point3D_dat[1]/s,point3D_dat[2]/s);
			//printf(&quot;%.3f %.3f %.3f\n&quot;,p3d.x,p3d.y,p3d.z);
			points1Proj.push_back(p3d);
			status[i] = 1;
		} else {
			status[i] = 0;
		}

	}
}
</pre>
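<p>The reprojection test at the end of triangulate() can be isolated into a small, dependency-free sketch. The function name reprojectsWithin and the std::array types are assumptions for illustration; P is a row-major 3x4 projection matrix like P1d and P2d above, and the tolerance plays the role of the 0.01 threshold.</p>

```cpp
#include <array>
#include <cmath>

// Project a homogeneous 3D point X = (x, y, z, w) with a row-major 3x4
// projection matrix P and report whether the result lies within `tol` of the
// observed 2D point on both axes. Hypothetical helper mirroring the check
// inside triangulate().
bool reprojectsWithin(const std::array<double, 12>& P,
                      const std::array<double, 4>& X,
                      double obsX, double obsY, double tol) {
    double p[3] = {0.0, 0.0, 0.0};
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 4; ++c)
            p[r] += P[r * 4 + c] * X[c];            // p = P * X
    if (std::fabs(p[2]) < 1e-12) return false;      // projects to infinity
    const double xr = p[0] / p[2];                  // back to inhomogeneous 2D
    const double yr = p[1] / p[2];
    return std::fabs(obsX - xr) <= tol && std::fabs(obsY - yr) <= tol;
}
```

<p>A point is kept only if this holds for both cameras, each with its own P and its own observed 2D point.</p>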
<p>OK, now that I have (hopefully) triangulated 3D features from the initial state: 2 views of a planar scene with 5cm translation on the X axis &#8211; I can move on to pose estimation.</p>
<h2>Pose Estimation</h2>
<p>Theoretically, if I know the 3D position of features in the world and their respective 2D position in the image, it should be easy to recover the position of the camera, because a rotation matrix and a translation vector define this transformation. Practically in OpenCV, finding the pose of an object from 3D-2D correspondences is done with the solvePnP() [<a href="http://opencv.willowgarage.com/documentation/cpp/camera_calibration_and_3d_reconstruction.html#solvepnp" target="_blank">link</a>] function.</p>
<p>Since I have an initial guess of the rotation and translation &#8211; from the first 2 frames &#8211; I can &#8220;help&#8221; the function estimate the new ones.</p>
<pre class="brush: plain; title: ; notranslate">
void findExtrinsics(vector&lt;Point2d&gt;&amp; points, vector&lt;double&gt;&amp; rv, vector&lt;double&gt;&amp; tv) {
	//estimate extrinsics for these points

	Mat rvec(rv),tvec(tv);

//initial &quot;guess&quot;, in case it wasn't already supplied
	if(rv.size()!=3) {
		rv = vector&lt;double&gt;(3);
		rvec = Mat(rv);
		double _d[9] = {1,0,0,
						0,-1,0,
						0,0,-1};
		Rodrigues(Mat(3,3,CV_64FC1,_d),rvec);
	}
	if(tv.size()!=3) {
		tv = vector&lt;double&gt;(3);
		tv[0]=0;tv[1]=0;tv[2]=0;
		tvec = Mat(tv);
	}

	//create a float rep  of points
	vector&lt;Point2f&gt; v2(points.size());
	Mat tmpOut(v2);
	Mat _tmpOut(points);
	_tmpOut.convertTo(tmpOut,CV_32FC2);

	solvePnP(points1projMF,tmpOut,camera_matrix,distortion_coefficients,rvec,tvec,true);

	printf(&quot;frame extrinsic:\nrvec: %.3f %.3f %.3f\ntvec: %.3f %.3f %.3f\n&quot;,rv[0],rv[1],rv[2],tv[0],tv[1],tv[2]);

//the output of the function is a Rodrigues form of rotation, so convert to regular rot-matrix
	Mat rotM(3,3,CV_64FC1); ///,_r);
	Rodrigues(rvec,rotM);
	double* _r = rotM.ptr&lt;double&gt;();
	printf(&quot;rotation mat: \n %.3f %.3f %.3f\n%.3f %.3f %.3f\n%.3f %.3f %.3f\n&quot;,
		_r[0],_r[1],_r[2],_r[3],_r[4],_r[5],_r[6],_r[7],_r[8]);
}
</pre>
<p>After getting the extrinsic parameters of the camera, the next step is plugging in the visualization!</p>
<h2>Integrating OpenGL</h2>
<p>Generally, it should be possible to create a 3D scene that matches exactly the true world scene, where the triangulated features appear in the scene aligned exactly with the world. I was not able to achieve that, but I got pretty close:<br />
<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="480" height="385" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/Q1HVjAWls_E&amp;hl=en_US&amp;fs=1&amp;" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="480" height="385" src="http://www.youtube.com/v/Q1HVjAWls_E&amp;hl=en_US&amp;fs=1&amp;" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>It&#8217;s basically what you do in augmented reality, you align the virtual camera&#8217;s position and rotation with the results you get from the vision part of the system. In the pose estimation we ended with a 3D rotation vector (Rodrigues form) and 3D translation vector which is used as-is, so only the rotation vector should be converted to 3&#215;3 matrix using the Rodrigues() function.</p>
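<p>For reference, the conversion that Rodrigues() performs from a rotation vector to a 3x3 matrix can be sketched without OpenCV. This is a plain implementation of the standard Rodrigues formula, not OpenCV&#8217;s code; the function name rotationVectorToMatrix is made up, and the output is row-major like the rot[] array used in the drawing code.</p>

```cpp
#include <array>
#include <cmath>

// Convert a rotation vector (unit axis times angle in radians) into a row-major
// 3x3 rotation matrix with Rodrigues' formula:
//   R = I + sin(theta) * K + (1 - cos(theta)) * K^2,
// where K is the cross-product matrix of the unit axis.
std::array<double, 9> rotationVectorToMatrix(const std::array<double, 3>& r) {
    const double theta = std::sqrt(r[0] * r[0] + r[1] * r[1] + r[2] * r[2]);
    if (theta < 1e-12) {
        // No rotation: return the identity.
        std::array<double, 9> I = {{1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0}};
        return I;
    }
    const double x = r[0] / theta, y = r[1] / theta, z = r[2] / theta; // unit axis
    const double c = std::cos(theta), s = std::sin(theta), v = 1.0 - c;
    std::array<double, 9> R = {{
        c + x * x * v,     x * y * v - z * s, x * z * v + y * s,
        y * x * v + z * s, c + y * y * v,     y * z * v - x * s,
        z * x * v - y * s, z * y * v + x * s, c + z * z * v
    }};
    return R;
}
```

<p>The length of the vector is the rotation angle and its direction is the rotation axis, which is why solvePnP can return a rotation in just three numbers.</p>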
<p>This is the OpenGL glut display() function that draws the scene:</p>
<pre class="brush: plain; title: ; notranslate">
void display(void)
{
	glClearColor(1.0f, 1.0f, 1.0f, 0.5f);
	glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);	// Clear Screen And Depth Buffer

	//draw the background - the frame from the camera
	glMatrixMode(GL_PROJECTION);
	glPushMatrix();
	gluOrtho2D(0.0,352.0,288.0,0.0);
	glMatrixMode(GL_MODELVIEW);
	glPushMatrix();
	glDisable(GL_DEPTH_TEST);
	glDrawPixels(352,288,GL_RGB,GL_UNSIGNED_BYTE,backPxls.data);
	glEnable(GL_DEPTH_TEST);
	glPopMatrix();
	glMatrixMode(GL_PROJECTION);
	glPopMatrix();

	const double t = glutGet(GLUT_ELAPSED_TIME) / 1000.0;
	a = t*20.0;

	glMatrixMode(GL_MODELVIEW);
	glLoadIdentity();

	//use the camera position 3D vector
	curCam[0] = cam[0]; curCam[1] = cam[1]; curCam[2] = cam[2];
	//there seems to be some kind of offset...
	glTranslated(-curCam[0]+0.5,-curCam[1]+0.7,-curCam[2]);

	//and the 3x3 rotation matrix
	double _d[16] = {	rot[0],rot[1],rot[2],0,
						rot[3],rot[4],rot[5],0,
						rot[6],rot[7],rot[8],0,
						0,     0,     0,     1};
	glMultMatrixd(_d);

	//flip the rotation on the x-axis
	glRotated(180,1,0,0);

	//draw the 3D feature points
	glPushMatrix();
	glColor4d(1.0,0.0,0.0,1.0);
	for(unsigned int i=0;i&lt;points1Proj.size();i++) {
		glPushMatrix();
		glTranslated(points1Proj[i].x,points1Proj[i].y,points1Proj[i].z);
		glutSolidSphere(0.03,15,15);
		glPopMatrix();
	}
	glPopMatrix();

	glutSwapBuffers();

	if(!running) {
		glutLeaveMainLoop();
	}

	Sleep(25);
}
</pre>
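<p>One detail worth knowing when feeding a rotation to glMultMatrixd(): OpenGL reads the 16 values column by column, while OpenCV stores matrices row by row. Listing a row-major matrix row by row therefore hands OpenGL its transpose (for a rotation, the inverse), and whether that is what you want depends on your conventions. The sketch below shows one way to pack a pose explicitly. poseToGLMatrix() is a hypothetical helper, not part of the original code, and it assumes R is a row-major 3&#215;3 rotation and t is the translation that goes in the fourth column.</p>
<pre class="brush: plain; title: ; notranslate">
// Hypothetical helper (not from the original code): packs a row-major 3x3
// rotation R and a translation t into the 16-element column-major array
// that glLoadMatrixd() and glMultMatrixd() expect.
void poseToGLMatrix(const double R[9], const double t[3], double m[16])
{
	for (int row = 0; row &lt; 3; row++) {
		for (int col = 0; col &lt; 3; col++) {
			// element (row,col) lives at index col*4 + row in column-major order
			m[col*4 + row] = R[row*3 + col];
		}
		m[12 + row] = t[row];	// the translation is the fourth column
		m[row*4 + 3] = 0.0;		// bottom row: 0 0 0 ...
	}
	m[15] = 1.0;				// ... 1
}
</pre>
<p>The resulting array can be passed to glLoadMatrixd(m) or glMultMatrixd(m) while the modelview matrix is selected.</p>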
<p>This pretty much covers my work, in a very concise way. The complete source code shows everything I have done, and provides better copy-and-paste ground for your own projects.</p>
<h2>Things not covered in this work</h2>
<p>Initially I tried to implement a very crucial part of the PTAM work &#8211; pairing the 3D map with 2D features in the image. This allows them to re-align the map in every frame (when the tracking is bad) so the pose estimation does not &#8220;lose its grip&#8221;. In essence, they keep a visual identity for each map feature, very similar to a descriptor like SURF or SIFT, so at any point they can find where the features are in the new image and recover the camera pose from the 2D-3D correspondence. I ran into a problem using OpenCV&#8217;s SURF functionality: it seems to have a bug when computing descriptors for user-given feature points.</p>
<p>Another thing I chose not to implement is creating a full map of the surroundings. I wanted to achieve a simple working solution for a small map (essentially a single frame), and see how it works. In the original work by K&amp;M they constantly add more and more features to the map until it covers the whole surrounding room.</p>
<h2>Code and Working the Program</h2>
<p>As usual, my code is available for checkout from the blog&#8217;s SVN repo:</p>
<pre class="brush: plain; title: ; notranslate">
svn checkout http://morethantechnical.googlecode.com/svn/trunk/ptam ptam
</pre>
<p>To get the stereo initialization you must press [spacebar] twice: once when the camera has stabilized and the features are stable, and again when the camera has translated and stabilized once more.<br />
This marks the 2 keyframes that will be used for stereo init and triangulation.<br />
From that point on, the 3D scene will start and the track-and-estimate stage begins. Try not to move the camera violently as the optical flow may suffer.</p>
<p>Thanks, Lior, for your help getting the hang of these subjects, and for the opportunity to meddle with a subject I have long wanted to explore.</p>
<p>I hope everyone will enjoy and learn from my enjoyment and learning.</p>
<p>Bye!</p>
<p>Roy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2010/03/06/implementing-ptam-stereo-tracking-and-pose-estimation-for-ar-with-opencv-w-code/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>GeekCon 2009: RunVas &#8211; Our project [w/ video, img]</title>
		<link>http://www.morethantechnical.com/2009/10/13/geekcon-2009-runvas-our-project-w-video-img/</link>
		<comments>http://www.morethantechnical.com/2009/10/13/geekcon-2009-runvas-our-project-w-video-img/#comments</comments>
		<pubDate>Tue, 13 Oct 2009 09:14:49 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[3d]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[opengl]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[computer vision]]></category>
		<category><![CDATA[geekcon]]></category>
		<category><![CDATA[geekcon 2009]]></category>
		<category><![CDATA[geekcon09]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[jogl]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=464</guid>
		<description><![CDATA[Hi everyone In the last weekend I attended GeekCon 2009, a tech-conference, with a friend and colleague Arnon (not Arnon from the blog, who recently had a birthday &#8211; Happy B-Day Arnon!). Each team that attended had to create a project they can complete in 2-days of the conference. Our project is called &#8220;RunVas&#8221;, and [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.morethantechnical.com/wp-content/uploads/2009/10/runvas.PNG" rel="lightbox[464]"><img src="http://www.morethantechnical.com/wp-content/uploads/2009/10/runvas-300x251.PNG" alt="runvas" title="runvas" width="300" height="251" class="alignleft size-medium wp-image-466" /></a>Hi everyone</p>
<p>Last weekend I attended <a href="http://www.geekcon.org/home">GeekCon 2009</a>, a tech-conference, with a friend and colleague Arnon (not Arnon from the blog, who recently had a birthday &#8211; Happy B-Day Arnon!). Each team that attended had to create a project they could complete within the 2 days of the conference. Our project is called &#8220;RunVas&#8221;, and the basic idea was to let people run around and paint by doing so. We wanted to combine computer vision with a little artistic angle.</p>
<p>Here are some more details:<br />
<span id="more-464"></span></p>
<h2>GeekCon you say?</h2>
<p>First of all, a few words about GeekCon itself. The conference is a &#8220;non-conference&#8221; or &#8220;un-conference&#8221;: a conference focused not on the business side of innovation and technology, but on the fun and creative side. The motto is something like: &#8220;geek out as hard as you possibly can in 2 days, and get it out of your system for the rest of the year&#8221;.</p>
<p>So teams from all corners of technology (electrical engineering, computer science, metalwork and woodwork, etc.) register and state their project of choice. The management decides whether the project can actually be delivered in 2 days, and whether it is actually a &#8220;GeekCon project&#8221;. By &#8220;GeekCon project&#8221; they mean something that demonstrates a nice concept/idea in a cool way, and is <strong>utterly useless</strong> in real life. This is the official stance.</p>
<h2>Our project</h2>
<p>We were accepted with our project, RunVas. It is a simple idea, based around the latest fashion of getting people out of the house, away from the computer, and onto the lawns running. We also wanted to combine technical and artistic points of view. So we set out to create a system that tracks objects in a video scene and sends the results to a drawing engine. The drawing is presented on a virtual &#8220;canvas&#8221; that the runners can view as they run, hence the name &#8220;RunVas&#8221;. We weren&#8217;t able to achieve all of that, but we had a good go at it, and delivered something nice.</p>
<h3>Implementation</h3>
<p>The CV part, object tracking, was programmed by Arnon using the archaic <a href="http://en.wikipedia.org/wiki/Macromedia_Director">Macromedia Director</a> (I don&#8217;t know which version, but an old one anyway). The drawing part was created by myself, using the groundwork I had done for the <a href="http://www.morethantechnical.com/2009/07/27/advanced-issues-in-3d-game-building-with-jogl-openglswt-w-code-video/">3D graphics game</a> I programmed for school using SWT/JOGL. Personally, I was amazed by how quickly I was able to pick up the framework from that project and re-use it for another, completely different, project. I guess that if you write stuff in a good solid structure you can build anything on top of it.</p>
<h2>Media</h2>
<p>So without further ado, here&#8217;s a short video:<br />
<object width="425" height="344"><param name="movie" value="http://www.youtube.com/v/zD-kUlarcyY&#038;hl=en&#038;fs=1&#038;"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/zD-kUlarcyY&#038;hl=en&#038;fs=1&#038;" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="344"></embed></object></p>
<p>And my flickr stream with photos I uploaded in real time from the conference:<br />
<object width="400" height="300"><param name="flashvars" value="offsite=true&#038;lang=en-us&#038;page_show_url=%2Fphotos%2F30599876%40N02%2Ftags%2Fgeekcon09%2Fshow%2F&#038;page_show_back_url=%2Fphotos%2F30599876%40N02%2Ftags%2Fgeekcon09%2F&#038;user_id=30599876@N02&#038;tags=geekcon09&#038;jump_to=&#038;start_index="></param><param name="movie" value="http://www.flickr.com/apps/slideshow/show.swf?v=71649"></param><param name="allowFullScreen" value="true"></param><embed type="application/x-shockwave-flash" src="http://www.flickr.com/apps/slideshow/show.swf?v=71649" allowFullScreen="true" flashvars="offsite=true&#038;lang=en-us&#038;page_show_url=%2Fphotos%2F30599876%40N02%2Ftags%2Fgeekcon09%2Fshow%2F&#038;page_show_back_url=%2Fphotos%2F30599876%40N02%2Ftags%2Fgeekcon09%2F&#038;user_id=30599876@N02&#038;tags=geekcon09&#038;jump_to=&#038;start_index=" width="400" height="300"></embed></object></p>
<h2>Code</h2>
<p>The code for the canvas drawing proggy is available in the <a href="http://code.google.com/p/morethantechnical/source/checkout">SVN repo</a>.</p>
<p>Thanks!<br />
Roy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2009/10/13/geekcon-2009-runvas-our-project-w-video-img/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

