<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>More Than Technical &#187; opencv</title>
	<atom:link href="http://www.morethantechnical.com/category/opencv/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.morethantechnical.com</link>
	<description>On software, code, the internet and more.</description>
	<lastBuildDate>Sun, 05 Feb 2012 07:04:13 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=</generator>
<atom:link rel="hub" href="http://pubsubhubbub.appspot.com"/><atom:link rel="hub" href="http://superfeedr.com/hubbub"/>		<item>
		<title>Simple triangulation with OpenCV from Hartley &amp; Zisserman [w/ code]</title>
		<link>http://www.morethantechnical.com/2012/01/04/simple-triangulation-with-opencv-from-harley-zisserman-w-code/</link>
		<comments>http://www.morethantechnical.com/2012/01/04/simple-triangulation-with-opencv-from-harley-zisserman-w-code/#comments</comments>
		<pubDate>Wed, 04 Jan 2012 01:07:11 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[3d]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[school]]></category>
		<category><![CDATA[vision]]></category>
		<category><![CDATA[pcl]]></category>
		<category><![CDATA[reconstruction]]></category>
		<category><![CDATA[triangulation]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=1023</guid>
		<description><![CDATA[Easily using OpenCV 2.3+ to triangulate points from known camera matrices and point sets.]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.morethantechnical.com/wp-content/uploads/2012/01/screenshot-1325526702.png" rel="lightbox[1023]"><img src="http://www.morethantechnical.com/wp-content/uploads/2012/01/screenshot-1325526702-150x150.png" alt="" title="Triangulated" width="150" height="150" class="alignleft size-thumbnail wp-image-1031" /></a>Hi<br />
I sense that a lot of people are looking for a simple triangulation method with OpenCV, given two images and matching features.<br />
While OpenCV contains the function cvTriangulatePoints in the triangulation.cpp file, it is not documented, and it uses the arcane C API.<br />
Luckily, in their excellent book &#8220;Multiple View Geometry&#8221; (often considered &#8220;The Bible&#8221; of 3D reconstruction), Hartley and Zisserman describe a simple method for linear triangulation. The method was discussed earlier in Hartley and Sturm&#8217;s article &#8220;<a href="http://users.cecs.anu.edu.au/~hartley/Papers/triangulation/triangulation.pdf" target="_blank">Triangulation</a>&#8221;.<br />
I implemented it using the new OpenCV 2.3+ C++ API, which makes it very easy, and here it is.</p>
<p><span id="more-1023"></span></p>
<p>The thing about triangulation is that you need to know the extrinsic parameters of your cameras &#8211; the difference in location and rotation between them.<br />
To get the camera matrices&#8230; that&#8217;s a different story that I&#8217;m going to cover shortly in a tutorial (already in writing) about Structure from Motion.</p>
<p>But let&#8217;s assume that we already have the extrinsic matrices. In most cases where you know what the motion is (you took the pictures yourself), you can simply write the matrices explicitly.</p>
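<p>As a minimal sketch of what writing the matrices explicitly can look like, assuming the convention X_cam = R*X_world + t and normalized image coordinates (K already removed): the first camera is P = [I|0] and the second is P1 = [R|t]. To keep it compilable without OpenCV, plain std::array types stand in for Matx34d, and the helper names (makeCameraMatrix, rotationAboutY, project) are hypothetical:</p>

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat33 = std::array<std::array<double, 3>, 3>;
using Mat34 = std::array<std::array<double, 4>, 3>;
using Vec3 = std::array<double, 3>;

// Builds P = [R | t], assuming the convention X_cam = R * X_world + t.
Mat34 makeCameraMatrix(const Mat33& R, const Vec3& t) {
    Mat34 P{};
    for (int r = 0; r < 3; ++r) {
        for (int c = 0; c < 3; ++c) P[r][c] = R[r][c];
        P[r][3] = t[r];
    }
    return P;
}

// Rotation matrix for an angle theta (radians) about the y (vertical) axis.
Mat33 rotationAboutY(double theta) {
    const double c = std::cos(theta), s = std::sin(theta);
    return Mat33{{{c, 0.0, s}, {0.0, 1.0, 0.0}, {-s, 0.0, c}}};
}

// Projects a world point with P; returns normalized image coordinates (x/w, y/w).
std::array<double, 2> project(const Mat34& P, const Vec3& X) {
    double h[3];
    for (int r = 0; r < 3; ++r)
        h[r] = P[r][0] * X[0] + P[r][1] * X[1] + P[r][2] * X[2] + P[r][3];
    assert(std::fabs(h[2]) > 1e-12 && "point lies on the camera's principal plane");
    return {h[0] / h[2], h[1] / h[2]};
}
```

<p>For example, a second camera moved one unit to the right of the first, with no rotation, has its center at C = (1,0,0) and therefore t = -R*C = (-1,0,0); a point at (0,0,5) then projects to x = -0.2 in that view.</p>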
<h2>Linear Triangulation</h2>
<pre class="brush: plain; title: ; notranslate">
#include &lt;opencv2/opencv.hpp&gt;
using namespace cv;
using namespace std;

/**
 From &quot;Triangulation&quot;, Hartley, R.I. and Sturm, P., Computer Vision and Image Understanding, 1997
 */
Mat_&lt;double&gt; LinearLSTriangulation(Point3d u,		//homogeneous image point (u,v,1)
				   Matx34d P,		//camera 1 matrix
				   Point3d u1,		//homogeneous image point in 2nd camera
				   Matx34d P1		//camera 2 matrix
								   )
{
	//build matrix A for homogeneous equation system Ax = 0
	//assume X = (x,y,z,1), for Linear-LS method
	//which turns it into an AX = B system, where A is 4x3, X is 3x1 and B is 4x1
	Matx43d A(u.x*P(2,0)-P(0,0),	u.x*P(2,1)-P(0,1),		u.x*P(2,2)-P(0,2),
		  u.y*P(2,0)-P(1,0),	u.y*P(2,1)-P(1,1),		u.y*P(2,2)-P(1,2),
		  u1.x*P1(2,0)-P1(0,0), u1.x*P1(2,1)-P1(0,1),	u1.x*P1(2,2)-P1(0,2),
		  u1.y*P1(2,0)-P1(1,0), u1.y*P1(2,1)-P1(1,1),	u1.y*P1(2,2)-P1(1,2)
			  );
	Mat_&lt;double&gt; B = (Mat_&lt;double&gt;(4,1) &lt;&lt;	-(u.x*P(2,3)	-P(0,3)),
					  -(u.y*P(2,3)	-P(1,3)),
					  -(u1.x*P1(2,3)	-P1(0,3)),
					  -(u1.y*P1(2,3)	-P1(1,3)));

	Mat_&lt;double&gt; X;
	solve(A,B,X,DECOMP_SVD); //least-squares solution of the overdetermined system

	return X;
}
</pre>
<p>This method relies on a very simple principle: every 2D point in image plane coordinates is a projection of the real 3D point. Each view gives two linear equations in the unknown 3D position, so with two views you get four equations in three unknowns, an overdetermined linear system that is solved in the least-squares sense.</p>
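<p>Concretely, for a camera with rows P<sub>1</sub>, P<sub>2</sub>, P<sub>3</sub>, the projection x = PX gives u(P<sub>3</sub>X) - P<sub>1</sub>X = 0 and v(P<sub>3</sub>X) - P<sub>2</sub>X = 0, which are exactly the rows of A and B in the code above once X = (x,y,z,1) is substituted. Below is a plain C++ sketch of the same Linear-LS method, written without OpenCV so it can be checked in isolation; it is an illustration, not a drop-in replacement. Instead of solve(..., DECOMP_SVD) it solves the 3&#215;3 normal equations with Cramer's rule, which is adequate for well-conditioned input but numerically weaker than SVD. The function name linearLSTriangulate is hypothetical:</p>

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat33 = std::array<std::array<double, 3>, 3>;
using Mat34 = std::array<std::array<double, 4>, 3>;
using Vec3 = std::array<double, 3>;

static double det3(const Mat33& m) {
    return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
         - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
         + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}

// Linear-LS triangulation: with X = (x, y, z, 1), each view contributes the rows
// (u * P_3 - P_1) X = 0 and (v * P_3 - P_2) X = 0 (P_i = i-th row of P),
// i.e. a 4x3 system A x = b. (u, v), (u1, v1) are normalized image coordinates.
Vec3 linearLSTriangulate(double u, double v, const Mat34& P,
                         double u1, double v1, const Mat34& P1) {
    const double img[4] = {u, v, u1, v1};
    double A[4][3], b[4];
    for (int r = 0; r < 4; ++r) {
        const Mat34& M = (r < 2) ? P : P1;  // rows 0-1: first view, rows 2-3: second
        const int row = r % 2;              // pairs u with P_1 and v with P_2
        for (int c = 0; c < 3; ++c) A[r][c] = img[r] * M[2][c] - M[row][c];
        b[r] = -(img[r] * M[2][3] - M[row][3]);
    }
    // Least squares via the normal equations (A^T A) x = A^T b.
    Mat33 N{};
    Vec3 g{};
    for (int i = 0; i < 3; ++i)
        for (int r = 0; r < 4; ++r) {
            for (int j = 0; j < 3; ++j) N[i][j] += A[r][i] * A[r][j];
            g[i] += A[r][i] * b[r];
        }
    const double d = det3(N);
    assert(std::fabs(d) > 1e-15 && "degenerate geometry, e.g. zero baseline");
    Vec3 X{};
    for (int k = 0; k < 3; ++k) {  // Cramer's rule: replace column k with g
        Mat33 Nk = N;
        for (int i = 0; i < 3; ++i) Nk[i][k] = g[i];
        X[k] = det3(Nk) / d;
    }
    return X;
}
```

<p>With noise-free correspondences the recovered point matches the ground truth up to rounding; with real matches the four equations are inconsistent and the result is their least-squares compromise.</p>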
<p>See how simple it is to define a Matx43d from scratch and use it in solve(..)?<br />
I tried doing something fancier with Mat.row(i) and Mat.col(i), to stick closer to Hartley&#8217;s description of the A matrix, but it just didn&#8217;t work.</p>
<h2>Using it</h2>
<p>Using this method is easy:</p>
<pre class="brush: plain; title: ; notranslate">
//Triangulate points
void TriangulatePoints(const vector&lt;Point2f&gt;&amp; pt_set1,
					   const vector&lt;Point2f&gt;&amp; pt_set2,
					   const Mat&amp; Kinv,		//inverse intrinsics, CV_64F
					   const Matx34d&amp; P,
					   const Matx34d&amp; P1,
					   vector&lt;Point3d&gt;&amp; pointcloud,
					   vector&lt;Point2f&gt;&amp; correspImg1Pt)
{
#ifdef __SFM__DEBUG__
	vector&lt;double&gt; depths;
#endif

	pointcloud.clear();
	correspImg1Pt.clear();

	cout &lt;&lt; &quot;Triangulating...&quot;;
	double t = getTickCount();
	int pts_size = (int)pt_set1.size();
#pragma omp parallel for
	for (int i=0; i&lt;pts_size; i++) {
		//normalize the image points: u = Kinv * (x,y,1)
		Point2f kp = pt_set1[i];
		Point3d u(kp.x,kp.y,1.0);
		Mat_&lt;double&gt; um = Kinv * Mat_&lt;double&gt;(u);
		u = Point3d(um(0),um(1),um(2));
		Point2f kp1 = pt_set2[i];
		Point3d u1(kp1.x,kp1.y,1.0);
		Mat_&lt;double&gt; um1 = Kinv * Mat_&lt;double&gt;(u1);
		u1 = Point3d(um1(0),um1(1),um1(2));

		Mat_&lt;double&gt; X = IterativeLinearLSTriangulation(u,P,u1,P1);

//		if(X(2) &gt; 6 || X(2) &lt; 0) continue;

#pragma omp critical
		{
			pointcloud.push_back(Point3d(X(0),X(1),X(2)));
			correspImg1Pt.push_back(pt_set1[i]);
#ifdef __SFM__DEBUG__
			depths.push_back(X(2));
#endif
		}
	}
	t = ((double)getTickCount() - t)/getTickFrequency();
	cout &lt;&lt; &quot;Done. (&quot; &lt;&lt; t &lt;&lt; &quot;s)&quot; &lt;&lt; endl;

	//show &quot;range image&quot;
#ifdef __SFM__DEBUG__
	{
		double minVal,maxVal;
		minMaxLoc(depths, &amp;minVal, &amp;maxVal);
		Mat tmp(240,320,CV_8UC3,Scalar(0,0,0)); //cvtColor(img_1_orig, tmp, CV_BGR2HSV);
		for (unsigned int i=0; i&lt;pointcloud.size(); i++) {
			double _d = MAX(MIN((pointcloud[i].z-minVal)/(maxVal-minVal),1.0),0.0);
			circle(tmp, correspImg1Pt[i], 1, Scalar(255 * (1.0-(_d)),255,255), CV_FILLED);
		}
		cvtColor(tmp, tmp, CV_HSV2BGR);
		imshow(&quot;out&quot;, tmp);
		waitKey(0);
		destroyWindow(&quot;out&quot;);
	}
#endif
}
</pre>
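<p>Note that you must have the camera matrix K (the 3&#215;3 matrix of intrinsic parameters), or rather its inverse, denoted here as Kinv. Since it is multiplied with Mat_&lt;double&gt; vectors, Kinv should be of type CV_64F (e.g. obtained with K.inv() from a double-precision K).</p>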
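<p>For a pinhole camera with zero skew, K = [f<sub>x</sub> 0 c<sub>x</sub>; 0 f<sub>y</sub> c<sub>y</sub>; 0 0 1] and its inverse has a simple closed form. A minimal plain C++ sketch of that inverse and of normalizing a pixel, assuming zero skew (in OpenCV itself you would simply call K.inv()); invertIntrinsics and normalizePixel are hypothetical names:</p>

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat33 = std::array<std::array<double, 3>, 3>;

// Closed-form inverse of a zero-skew pinhole intrinsic matrix:
//   K = [fx 0 cx; 0 fy cy; 0 0 1]  =>  Kinv = [1/fx 0 -cx/fx; 0 1/fy -cy/fy; 0 0 1]
Mat33 invertIntrinsics(double fx, double fy, double cx, double cy) {
    assert(std::fabs(fx) > 1e-12 && std::fabs(fy) > 1e-12);
    return Mat33{{{1.0 / fx, 0.0, -cx / fx},
                  {0.0, 1.0 / fy, -cy / fy},
                  {0.0, 0.0, 1.0}}};
}

// Maps a pixel (px, py) to normalized camera coordinates Kinv * (px, py, 1)^T.
// The last row of Kinv is (0, 0, 1), so the third component stays 1.
std::array<double, 3> normalizePixel(const Mat33& Kinv, double px, double py) {
    std::array<double, 3> out{};
    for (int r = 0; r < 3; ++r)
        out[r] = Kinv[r][0] * px + Kinv[r][1] * py + Kinv[r][2];
    return out;
}
```

<p>The principal point (c<sub>x</sub>, c<sub>y</sub>) maps to (0, 0, 1), the optical axis, which is a quick sanity check for any Kinv you build.</p>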
<h2>Results and some discussion</h2>
<p><a href="http://www.morethantechnical.com/wp-content/uploads/2012/01/ER_15_12_2011_06_06_23.jpg" rel="lightbox[1023]"><img class="wp-image-1026 " title="Left image" src="http://www.morethantechnical.com/wp-content/uploads/2012/01/ER_15_12_2011_06_06_23-300x225.jpg" alt="" width="240" height="180" /></a><a href="http://www.morethantechnical.com/wp-content/uploads/2012/01/ER_15_12_2011_06_06_35.jpg" rel="lightbox[1023]"><img class="wp-image-1027 " title="Right image" src="http://www.morethantechnical.com/wp-content/uploads/2012/01/ER_15_12_2011_06_06_35-300x225.jpg" alt="" width="240" height="180" /></a></p>
<div id="attachment_1031" class="wp-caption alignnone" style="width: 598px"><a href="http://www.morethantechnical.com/wp-content/uploads/2012/01/screenshot-1325526702.png" rel="lightbox[1023]"><img class=" wp-image-1031 " title="Triangulated" src="http://www.morethantechnical.com/wp-content/uploads/2012/01/screenshot-1325526702.png" alt="" width="588" height="360" /></a><p class="wp-caption-text">3D view of the triangulated point cloud</p></div>
<p>Notice that the 3D view is somewhat distorted&#8230; but this is not due to projective ambiguity, as I am using the Essential Matrix to obtain the camera P matrices (the cameras are calibrated). Hartley and Zisserman explain this in their book on page 258, and the reasons for projective ambiguity (and how to resolve it) on page 265. The distortion must therefore be due to inaccurate point correspondences&#8230;</p>
<p>The cool visualization is done using the excellent <a href="http://www.pointclouds.org" title="PCL" target="_blank">PCL</a> library.</p>
<h2>Iterative Linear Triangulation</h2>
<p>Hartley and Sturm, in their article &#8220;Triangulation&#8221;, describe another triangulation algorithm, an iterative one, which they report to &#8220;perform substantially better than the [...] non-iterative linear methods&#8221;. It is, again, very easy to implement, and here it is:</p>
<pre class="brush: plain; title: ; notranslate">
/**
 From &quot;Triangulation&quot;, Hartley, R.I. and Sturm, P., Computer Vision and Image Understanding, 1997
 */
Mat_&lt;double&gt; IterativeLinearLSTriangulation(Point3d u,	//homogeneous image point (u,v,1)
											Matx34d P,			//camera 1 matrix
											Point3d u1,			//homogeneous image point in 2nd camera
											Matx34d P1			//camera 2 matrix
											) {
	double wi = 1, wi1 = 1;
	Mat_&lt;double&gt; X(4,1);
	//initial estimate: the unweighted Linear-LS solution (computed once, outside the loop)
	Mat_&lt;double&gt; X_ = LinearLSTriangulation(u,P,u1,P1);
	X(0) = X_(0); X(1) = X_(1); X(2) = X_(2); X(3) = 1.0;
	for (int i=0; i&lt;10; i++) { //Hartley suggests 10 iterations at most

		//recalculate weights
		double p2x = Mat_&lt;double&gt;(Mat_&lt;double&gt;(P).row(2)*X)(0);
		double p2x1 = Mat_&lt;double&gt;(Mat_&lt;double&gt;(P1).row(2)*X)(0);

		//breaking point
		if(fabs(wi - p2x) &lt;= EPSILON &amp;&amp; fabs(wi1 - p2x1) &lt;= EPSILON) break;

		wi = p2x;
		wi1 = p2x1;

		//reweight equations and solve
		Matx43d A((u.x*P(2,0)-P(0,0))/wi,		(u.x*P(2,1)-P(0,1))/wi,			(u.x*P(2,2)-P(0,2))/wi,
				  (u.y*P(2,0)-P(1,0))/wi,		(u.y*P(2,1)-P(1,1))/wi,			(u.y*P(2,2)-P(1,2))/wi,
				  (u1.x*P1(2,0)-P1(0,0))/wi1,	(u1.x*P1(2,1)-P1(0,1))/wi1,		(u1.x*P1(2,2)-P1(0,2))/wi1,
				  (u1.y*P1(2,0)-P1(1,0))/wi1,	(u1.y*P1(2,1)-P1(1,1))/wi1,		(u1.y*P1(2,2)-P1(1,2))/wi1
				  );
		Mat_&lt;double&gt; B = (Mat_&lt;double&gt;(4,1) &lt;&lt;	-(u.x*P(2,3)	-P(0,3))/wi,
						  -(u.y*P(2,3)	-P(1,3))/wi,
						  -(u1.x*P1(2,3)	-P1(0,3))/wi1,
						  -(u1.y*P1(2,3)	-P1(1,3))/wi1
						  );

		solve(A,B,X_,DECOMP_SVD);
		X(0) = X_(0); X(1) = X_(1); X(2) = X_(2); X(3) = 1.0; //X_ is 3x1, so set X(3), not X_(3)
	}
	return X;
}
</pre>
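<p>(Remember to define EPSILON, a small convergence tolerance.)<br />
This time the method works iteratively to minimize the reprojection error of the reconstructed point with respect to the original image coordinates, by reweighting the linear equation system: each camera&#8217;s two equations are divided by w = P<sub>3</sub>X (the third row of that camera matrix applied to the current estimate), so that the error of each equation approximates the geometric error in the image.</p>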
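<p>Below is a plain C++ sketch of this reweighting loop, without OpenCV, assuming normalized image coordinates. Like the code above it starts from the unweighted Linear-LS solution, recomputes the weights w = P<sub>3</sub>X and w1 = P1<sub>3</sub>X, stops once they change by less than eps (the role of EPSILON), and otherwise re-solves the divided equations. The 3&#215;3 least-squares step uses normal equations and Cramer's rule instead of DECOMP_SVD; the names and the default eps = 1e-9 are illustrative assumptions:</p>

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat33 = std::array<std::array<double, 3>, 3>;
using Mat34 = std::array<std::array<double, 4>, 3>;
using Vec3 = std::array<double, 3>;

static double det3(const Mat33& m) {
    return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
         - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
         + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}

// One weighted Linear-LS solve: the two rows of the first view are divided by w,
// those of the second view by w1 (w = w1 = 1 is the plain Linear-LS method).
static Vec3 weightedLinearLS(const double img[4], const Mat34& P, const Mat34& P1,
                             double w, double w1) {
    double A[4][3], b[4];
    for (int r = 0; r < 4; ++r) {
        const Mat34& M = (r < 2) ? P : P1;
        const double wr = (r < 2) ? w : w1;
        assert(std::fabs(wr) > 1e-12 && "point on a camera's principal plane");
        const int row = r % 2;
        for (int c = 0; c < 3; ++c) A[r][c] = (img[r] * M[2][c] - M[row][c]) / wr;
        b[r] = -(img[r] * M[2][3] - M[row][3]) / wr;
    }
    Mat33 N{};
    Vec3 g{};
    for (int i = 0; i < 3; ++i)
        for (int r = 0; r < 4; ++r) {
            for (int j = 0; j < 3; ++j) N[i][j] += A[r][i] * A[r][j];
            g[i] += A[r][i] * b[r];
        }
    const double d = det3(N);
    assert(std::fabs(d) > 1e-15 && "degenerate geometry");
    Vec3 X{};
    for (int k = 0; k < 3; ++k) {  // Cramer's rule on (A^T A) x = A^T b
        Mat33 Nk = N;
        for (int i = 0; i < 3; ++i) Nk[i][k] = g[i];
        X[k] = det3(Nk) / d;
    }
    return X;
}

// w = P_3 . (x, y, z, 1): the third homogeneous coordinate of X's projection.
static double projectiveDepth(const Mat34& M, const Vec3& X) {
    return M[2][0] * X[0] + M[2][1] * X[1] + M[2][2] * X[2] + M[2][3];
}

Vec3 iterativeLinearLSTriangulate(double u, double v, const Mat34& P,
                                  double u1, double v1, const Mat34& P1,
                                  double eps = 1e-9, int maxIter = 10) {
    const double img[4] = {u, v, u1, v1};
    double w = 1.0, w1 = 1.0;
    Vec3 X = weightedLinearLS(img, P, P1, w, w1);  // unweighted starting point
    for (int i = 0; i < maxIter; ++i) {
        const double nw = projectiveDepth(P, X), nw1 = projectiveDepth(P1, X);
        if (std::fabs(nw - w) <= eps && std::fabs(nw1 - w1) <= eps) break;  // weights settled
        w = nw;
        w1 = nw1;
        X = weightedLinearLS(img, P, P1, w, w1);  // reweight and re-solve
    }
    return X;
}
```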
<h2>Recap</h2>
<p>So we&#8217;ve seen how easy it is to implement these triangulation methods using OpenCV&#8217;s handy Matx (e.g. Matx34d, Matx43d) and Mat_&lt;T&gt; classes.<br />
Also, solve(&#8230;,DECOMP_SVD) is very handy for overdetermined non-homogeneous linear equation systems, as it returns the least-squares solution.<br />
Look out for my upcoming Structure from Motion tutorial, which will be all about using OpenCV to get point correspondences from pairs of images, obtain camera matrices and recover dense depth.</p>
<p>If you are looking for more robust solutions for SfM and 3D reconstructions, see:<br />
<a href="http://phototour.cs.washington.edu/bundler/" title="http://phototour.cs.washington.edu/bundler/" target="_blank">http://phototour.cs.washington.edu/bundler/</a><br />
<a href="http://code.google.com/p/libmv/" title="http://code.google.com/p/libmv/" target="_blank">http://code.google.com/p/libmv/</a><br />
<a href="http://www.cs.washington.edu/homes/ccwu/vsfm/" title="http://www.cs.washington.edu/homes/ccwu/vsfm/" target="_blank">http://www.cs.washington.edu/homes/ccwu/vsfm/</a><br />
Enjoy,<br />
Roy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2012/01/04/simple-triangulation-with-opencv-from-harley-zisserman-w-code/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Spherical harmonics face relighting using OpenCV, OpenGL [w/ code]</title>
		<link>http://www.morethantechnical.com/2011/12/20/spherical-harmonics-face-relighting-using-opencv-opengl-w-code/</link>
		<comments>http://www.morethantechnical.com/2011/12/20/spherical-harmonics-face-relighting-using-opencv-opengl-w-code/#comments</comments>
		<pubDate>Tue, 20 Dec 2011 00:59:34 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[3d]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[gui]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[opengl]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[school]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[vision]]></category>
		<category><![CDATA[glsl]]></category>
		<category><![CDATA[harmonics]]></category>
		<category><![CDATA[math]]></category>
		<category><![CDATA[recoloring]]></category>
		<category><![CDATA[relighting]]></category>
		<category><![CDATA[shaders]]></category>
		<category><![CDATA[spherical]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=948</guid>
		<description><![CDATA[Implementing a face image relighting algorithm using spherical harmonics, based on a paper written by Wang et al (2007).]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.morethantechnical.com/wp-content/uploads/2011/12/Screen-shot-2011-12-19-at-8.13.27-PM.png" rel="lightbox[948]"><img src="http://www.morethantechnical.com/wp-content/uploads/2011/12/Screen-shot-2011-12-19-at-8.13.27-PM-300x130.png" alt="" title="Spherical harmonics face relighting" width="300" height="130" class="alignleft size-medium wp-image-1015" /></a>Hi!<br />
I&#8217;ve been working on implementing a face image relighting algorithm using spherical harmonics, one of the most elegant methods I&#8217;ve seen lately.<br />
I start by aligning a face model with OpenGL to automatically get the canonical face normals, which brushed up my knowledge of GLSL. Then I move on to estimating the &#8220;spharmonics&#8221; of real faces, and to relighting.</p>
<p>Let&#8217;s start!<br />
<span id="more-948"></span></p>
<h2>Some mathematical background</h2>
<p>Don&#8217;t worry, it won&#8217;t hurt. Much.</p>
<p>Spherical harmonics were originally developed to mathematically express a whole bunch of things in physics, like gravitational and magnetic fields. But they also became very useful in computer graphics, as they are well suited to modelling light falling on a spherical body.</p>
<h3>But what ARE those mysterious spherical harmonics? </h3>
<p>The way I see it, they are a series of &#8220;modes&#8221; or &#8220;eigenvectors&#8221; or &#8220;orthogonal components&#8221; of a basis that spans functions on the surface of a sphere.<br />
To put it simply, they describe the surface of a sphere at increasingly finer-grained levels of detail. Much like a Fourier decomposition of a function, there is a basis, and there are coefficients that, when multiplied with the basis functions and summed, recover the function.</p>
<h3>How is that good for graphics? </h3>
<p>People have used spherical harmonics mostly to model lighting of spherical objects. When you know the coefficients that describe the lighting, you can change them to <i>Re-light</i> an object, or <i>De-light</i>, or transfer the lighting conditions of one scene to another. Very useful!</p>
<p>Back in 2001, Basri and Jacobs formulated the first 9 harmonics as functions of the surface normal. On this page Basri references all his work on the subject: <a href="http://www.wisdom.weizmann.ac.il/~ronen/index_files/harmonic.html" target="_blank">http://www.wisdom.weizmann.ac.il/~ronen/index_files/harmonic.html</a> </p>
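<p>The post does not show the code that evaluates these harmonics, so here is an independent sketch of the 9 real harmonics (orders 0 to 2) as functions of a unit normal (n<sub>x</sub>, n<sub>y</sub>, n<sub>z</sub>), using the normalization constants commonly quoted from Basri and Jacobs and from Wang et al. The ordering and constants in the HeadReplacement repository may differ; the function name sphericalHarmonics9 is hypothetical:</p>

```cpp
#include <array>
#include <cassert>
#include <cmath>

// First 9 real spherical harmonics evaluated at a unit normal (nx, ny, nz).
// Order: 1, nz, nx, ny, (3nz^2 - 1), nx*nz, ny*nz, (nx^2 - ny^2), nx*ny,
// each multiplied by its normalization constant.
std::array<double, 9> sphericalHarmonics9(double nx, double ny, double nz) {
    assert(std::fabs(nx * nx + ny * ny + nz * nz - 1.0) < 1e-3 && "expects a unit normal");
    const double pi = 3.14159265358979323846;
    const double c0 = 1.0 / std::sqrt(4.0 * pi);
    const double c1 = std::sqrt(3.0 / (4.0 * pi));
    const double c2 = 3.0 * std::sqrt(5.0 / (12.0 * pi));
    return {c0,
            c1 * nz, c1 * nx, c1 * ny,
            0.5 * std::sqrt(5.0 / (4.0 * pi)) * (3.0 * nz * nz - 1.0),
            c2 * nx * nz, c2 * ny * nz,
            0.5 * c2 * (nx * nx - ny * ny),
            c2 * nx * ny};
}
```

<p>For the normal (0,0,1), pointing straight at the camera, only the constant term, the n<sub>z</sub> term and the 3n<sub>z</sub><sup>2</sup>-1 term are non-zero.</p>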
<p>But I prefer to reference a work that is easier to digest than Basri&#8217;s: the work of Wang et al. from 2007, who made the steps for using spherical harmonics easier to follow: <a href="http://research.microsoft.com/en-us/um/people/zliu/cvpr2007.pdf" title="http://research.microsoft.com/en-us/um/people/zliu/cvpr2007.pdf" target="_blank">http://research.microsoft.com/en-us/um/people/zliu/cvpr2007.pdf</a>.<br />
Their algorithm is quite advanced, though, as it solves not only for the harmonics&#8217; coefficients but also for the normals of the object in the image. They use a fancy optimization of an energy function over a graph, which I&#8217;m not going to discuss.<br />
But they did make the process of finding the spherical harmonics&#8217; coefficients very clear.</p>
<h4>The bottom line</h4>
<p>We should solve for a vector of 9 coefficients that describes the &#8220;lighting of the object&#8221; (a face in our case).<br />
Each coefficient tells us how strong or weak that specific harmonic is, or in other words, how brightly a certain area of the object is lit.</p>
<p>Wang and Basri show a very simple method of using simultaneous linear equations to solve for the lighting coefficients; besides the pixel intensities, it only requires knowing the normal of the object&#8217;s surface at each pixel in the image.</p>
<h2>Getting the normals of a canonical face</h2>
<p>So to get the normals, I thought the best way was to use a canonical model of a face (some kind of average face), instead of trying to recover the normals from the image pixels.<br />
To that end, I used Rhino3D to model (very roughly) a shape that resembles a human face, starting from an elongated sphere.<br />
Now all that&#8217;s left is to align the model with the face to relight, and that will supply the normals.<br />
<a href="http://www.morethantechnical.com/wp-content/uploads/2011/12/snapshot00.png" rel="lightbox[948]"><img src="http://www.morethantechnical.com/wp-content/uploads/2011/12/snapshot00-300x224.png" alt="" title="rough model of a human face" width="300" height="224" class="alignleft size-medium wp-image-1011" /></a><br />
Cool. Then I built a small app that lets the user move the model around until it&#8217;s aligned with the face image. I used <a href="http://www.fltk.org/" target="_blank">FLTK 3.0</a> for it, since it has a simple interface with OpenGL, is cross-platform, and is lightweight.<br />
So I set up a scene with the image as the background and the model floating above it, semi-transparent so the user can find the right spot. I added functions for rotating the model, and extras like making the model opaque.</p>
<p style="text-align: center">
<iframe width="480" height="360" src="http://www.youtube.com/embed/wIwAX2UM64E" frameborder="0" allowfullscreen></iframe>
</p>
<p>To get the normal map I used a very simple GLSL shader that colors each pixel with the value of its normal, mapping (nX,nY,nZ) to (R,G,B).<br />
This way the result image OpenGL renders is simply the normal map of the face model. I just grab it using glReadPixels.</p>
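<p>Since normal components lie in [-1, 1] while 8-bit color channels hold [0, 255], some offset and scale is needed, otherwise negative components would be clamped to 0 in the framebuffer. The sketch below assumes the common mapping c = 0.5n + 0.5 (the exact mapping in the original shader is not shown) and decodes bytes as they would come back from glReadPixels with format GL_RGB and type GL_UNSIGNED_BYTE; the function names are hypothetical:</p>

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumed encoding: each unit-normal component n in [-1, 1] is written as an
// 8-bit channel c = round((n * 0.5 + 0.5) * 255).
std::array<std::uint8_t, 3> encodeNormal(double nx, double ny, double nz) {
    auto enc = [](double n) {
        const double c = std::fmin(255.0, std::fmax(0.0, (n * 0.5 + 0.5) * 255.0));
        return static_cast<std::uint8_t>(std::lround(c));
    };
    return {enc(nx), enc(ny), enc(nz)};
}

// Inverse mapping for 8-bit RGB pixels read back from the framebuffer,
// re-normalized because quantization slightly changes the vector's length.
std::array<double, 3> decodeNormal(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    std::array<double, 3> n = {r / 255.0 * 2.0 - 1.0, g / 255.0 * 2.0 - 1.0,
                               b / 255.0 * 2.0 - 1.0};
    const double len = std::sqrt(n[0] * n[0] + n[1] * n[1] + n[2] * n[2]);
    assert(len > 0.0);
    for (double& c : n) c /= len;
    return n;
}
```

<p>Each 8-bit step corresponds to about 0.008 per component, which is usually precise enough for estimating low-order lighting.</p>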
<h2>Estimating spherical harmonics</h2>
<p>So, after the model is aligned, we can assume we have a normal for each pixel in the image, and the intensity of each pixel is also known.<br />
The first step that Wang suggests, without knowledge of the real face albedo (the real color of every pixel without any lighting effect), is to get an approximation of the 9-vector of lighting coefficients by assuming a constant albedo. Easy enough: we can set the albedo to the average color of the face.<br />
Then we can simply build a huge set of linear equations (one per face pixel), and solve the overdetermined system to get the 9 coefficients.</p>
<pre class="brush: plain; title: ; notranslate">
		Scalar albedo_constant = mean(face_img_hsv, smallFaceMask);

		//setup linear equation system, lighting coefficients (l) is unknown
		//I = p00 * Ht * l
		float p00 = (float)albedo_constant[2] / 255.0f;

		cout &lt;&lt; &quot;Build Ht(&quot;&lt;&lt;n&lt;&lt;&quot;,9)...&quot;;
		cout &lt;&lt; &quot;Build I(&quot;&lt;&lt;n&lt;&lt;&quot;,1)...&quot;;
		//build Ht and I
		Mat_&lt;float&gt; Ht(n,9);
		Mat_&lt;float&gt; I(n,1);
		int pos = 0;
		vector&lt;Mat_&lt;uchar&gt; &gt; face_img_chnls; split(face_img_hsv, face_img_chnls);
		for (int i=0; i&lt;normalMapFlat.rows; i++) {
			if (smallFaceMask(i) == 0) { //is this pixel on the face?
				continue;
			}
			Ht.row(pos) = p00 * calculateSphericalHarmonicsForNormal(normalMapFlat(i));
			I(pos,0) = face_img_chnls[2](i) / 255.0f; //get V from HSV of pixel [0,1]
			pos ++;
		}
		cout &lt;&lt; &quot;DONE&quot;  &lt;&lt; endl;

		cout &lt;&lt; &quot;Solve&quot; &lt;&lt;endl;
		solve(Ht, I, l, DECOMP_SVD);

		cout &lt;&lt; &quot;initial lighting coeffs: &quot;;
		for (int i=0; i&lt;l.rows; i++) {
			cout&lt;&lt;l.at&lt;float&gt;(i)&lt;&lt;&quot;,&quot;;
		}
</pre>
<p>Booyah! lighting coefficients.</p>
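<p>To see the least-squares step in isolation, here is a plain C++ sketch of solving I = p00 &#183; Ht &#183; l for the 9 coefficients l, given per-pixel unit normals and intensities in [0, 1]. It assumes the harmonic basis in the form commonly quoted from Basri and Jacobs (a stand-in for calculateSphericalHarmonicsForNormal) and replaces solve(..., DECOMP_SVD) with the 9&#215;9 normal equations and Gaussian elimination with partial pivoting, which is simpler but less robust when the normals cover only a narrow range of directions. All names are hypothetical:</p>

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Vec9 = std::array<double, 9>;

// First 9 real spherical harmonics at a unit normal (hypothetical stand-in
// for calculateSphericalHarmonicsForNormal).
Vec9 shBasis(double nx, double ny, double nz) {
    const double pi = 3.14159265358979323846;
    const double c0 = 1.0 / std::sqrt(4.0 * pi);
    const double c1 = std::sqrt(3.0 / (4.0 * pi));
    const double c2 = 3.0 * std::sqrt(5.0 / (12.0 * pi));
    return {c0, c1 * nz, c1 * nx, c1 * ny,
            0.5 * std::sqrt(5.0 / (4.0 * pi)) * (3.0 * nz * nz - 1.0),
            c2 * nx * nz, c2 * ny * nz, 0.5 * c2 * (nx * nx - ny * ny), c2 * nx * ny};
}

// Least-squares solution of I = p00 * H * l, where row k of H is shBasis(normal k).
// Solves (p00 H)^T (p00 H) l = (p00 H)^T I by Gaussian elimination with partial
// pivoting on the augmented 9x10 matrix.
Vec9 estimateLighting(const std::vector<std::array<double, 3>>& normals,
                      const std::vector<double>& intensity, double p00) {
    assert(normals.size() == intensity.size() && normals.size() >= 9 && p00 > 0.0);
    double M[9][10] = {};
    for (std::size_t k = 0; k < normals.size(); ++k) {
        Vec9 h = shBasis(normals[k][0], normals[k][1], normals[k][2]);
        for (double& x : h) x *= p00;
        for (int i = 0; i < 9; ++i) {
            for (int j = 0; j < 9; ++j) M[i][j] += h[i] * h[j];
            M[i][9] += h[i] * intensity[k];
        }
    }
    for (int col = 0; col < 9; ++col) {  // forward elimination
        int piv = col;
        for (int r = col + 1; r < 9; ++r)
            if (std::fabs(M[r][col]) > std::fabs(M[piv][col])) piv = r;
        assert(std::fabs(M[piv][col]) > 1e-12 && "normals do not constrain all 9 harmonics");
        if (piv != col)
            for (int c = 0; c < 10; ++c) std::swap(M[piv][c], M[col][c]);
        for (int r = col + 1; r < 9; ++r) {
            const double f = M[r][col] / M[col][col];
            for (int c = col; c < 10; ++c) M[r][c] -= f * M[col][c];
        }
    }
    Vec9 l{};
    for (int i = 8; i >= 0; --i) {  // back substitution
        double s = M[i][9];
        for (int j = i + 1; j < 9; ++j) s -= M[i][j] * l[j];
        l[i] = s / M[i][i];
    }
    return l;
}
```

<p>With synthetic intensities generated from known coefficients over a hemisphere of normals, this sketch recovers those coefficients up to rounding, which is a useful check before running it on real, noisy face pixels.</p>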
<p>But this is only the first step. Now we can get an approximation of the albedo as well, using the coefficients:</p>
<pre class="brush: plain; title: ; notranslate">
		Mat_&lt;Vec3b&gt; face_img_v3b = face_img;

		#pragma omp parallel for schedule(dynamic)
		for (int y=0; y&lt;face_img.rows; y++) {
			for (int x=0; x&lt;face_img.cols; x++) {
				if (face_mask(y,x) == 0) {
					albedo(y,x) = 0;
					continue;
				}
				Mat sph = calculateSphericalHarmonicsForNormal(normalMap(y,x));
				Mat_&lt;float&gt; sph_l = sph * l;
				float fsph_l = sph_l(0);

				for (int cn = 0; cn&lt;3; cn++) {
					float fimg = face_img_v3b(y,x)[cn] / 255.0f;
					albedo(y,x)[cn] = (fimg / fsph_l);
				}
			}
		}
</pre>
<p>Done.<br />
Now that we have an initial albedo, Wang suggests computing the coefficients again to get a better approximation, and then the albedo again.<br />
However, I ran into some problems with the second iteration, and the results always came out too dark&#8230; But even after the first iteration you can see a very nice change.<br />
Look at the video from before: the right side of the face, which is over-lit, was darkened, and the left side was lit up.</p>
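<p>The albedo loop above divides each channel by the shading value h(n)&#183;l without a guard, so pixels whose estimated shading is close to zero or negative (grazing normals, or regions the 9-term lighting model under-predicts) can produce very large or negative albedo values. A small per-pixel sketch with such a guard, assuming intensities and albedo in [0, 1]; the thresholds minShading and maxAlbedo are illustrative choices, not values from Wang et al. or from the original code:</p>

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Per-pixel albedo estimate rho = I / (h(n) . l): the observed intensity divided by
// the shading predicted by the 9 lighting coefficients l at the pixel's harmonic
// values h(n). Shading below minShading is clamped so the division cannot explode
// or flip sign, and the result is clamped to [0, maxAlbedo].
double estimateAlbedo(double intensity, const std::array<double, 9>& h,
                      const std::array<double, 9>& l,
                      double minShading = 1e-3, double maxAlbedo = 1.0) {
    assert(minShading > 0.0 && maxAlbedo > 0.0);
    double shading = 0.0;
    for (int i = 0; i < 9; ++i) shading += h[i] * l[i];
    shading = std::max(shading, minShading);
    return std::min(std::max(intensity / shading, 0.0), maxAlbedo);
}
```

<p>Applied per color channel, this mirrors the fimg / fsph_l division in the loop above; clamping trades exactness at badly modelled pixels for a bounded albedo map.</p>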
<h2>Code</h2>
<p>The code for spherical harmonics analysis of images is part of a bigger project I have been working on for some time. I also spoke of it in a <a href="http://www.morethantechnical.com/2011/12/01/identity-transfer-in-photographs/" target="_blank">previous post</a>.<br />
Anyway, it&#8217;s up on GitHub: <a href="https://github.com/royshil/HeadReplacement/tree/master/HeadReplacement" target="_blank">https://github.com/royshil/HeadReplacement/tree/master/HeadReplacement</a><br />
You&#8217;re looking for 4 files:</p>
<ul>
<li>SpharmonicsUI.cpp
<li>SpharmonicsUI.h
<li>spherical_harmonics_analysis.cpp
<li>spherical_harmonics_analysis.h
</ul>
<p>You can use the CMakeLists.txt in the repository to compile, but here&#8217;s a minimal CMakeLists.txt that should get you there in one piece (fingers crossed):</p>
<pre class="brush: plain; title: ; notranslate">
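find_package(OpenCV REQUIRED)
find_package(OpenGL REQUIRED)
find_package(OpenMP REQUIRED)
include_directories(${OpenCV_INCLUDE_DIRS})
set(CMAKE_CXX_FLAGS &quot;${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}&quot;)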

######## Find and add GLEE ########
file(GLOB_RECURSE GLEE_PATH &quot;${CMAKE_SOURCE_DIR}/GLee.c&quot;)
if(NOT GLEE_PATH)
	message(STATUS &quot;GLEE was not found&quot;)
else()
	list(LENGTH GLEE_PATH GLEE_PATH_LEN)
	if(GLEE_PATH_LEN GREATER 1)
		list(GET GLEE_PATH 1 GLEE_PATH)
	endif()
	file(RELATIVE_PATH GLEE_PATH ${CMAKE_SOURCE_DIR} ${GLEE_PATH})
	get_filename_component(GLEE_PATH ${GLEE_PATH} REALPATH)
	get_filename_component(GLEE_PATH ${GLEE_PATH} PATH)
	message(STATUS &quot;Found GLEE at ${GLEE_PATH}&quot;)
	add_library(GLEE ${GLEE_PATH}/GLee.c)
endif()

############ Find FLTK ############
if(NOT DEFINED FLTK_PATH)
	file(GLOB_RECURSE FLTK_PATH &quot;${CMAKE_SOURCE_DIR}/Widget.h&quot;)
	if(FLTK_PATH STREQUAL FLTK_PATH-NOTFOUND   OR   FLTK_PATH STREQUAL &quot;&quot;)
		message(STATUS &quot;FLTK was not found !!!!!&quot;)
	else()
		list(LENGTH FLTK_PATH FLTK_PATH_LEN)
		if(FLTK_PATH_LEN GREATER 1)
			list(GET FLTK_PATH 1 FLTK_PATH)
		endif()
		file(RELATIVE_PATH FLTK_PATH ${CMAKE_SOURCE_DIR} ${FLTK_PATH})
		get_filename_component(FLTK_PATH ${FLTK_PATH} REALPATH)
		get_filename_component(FLTK_PATH ${FLTK_PATH} PATH)
		message(STATUS &quot;Found FLTK at ${FLTK_PATH}&quot;)
	endif()
else()
	get_filename_component(FLTK_PATH ${FLTK_PATH} REALPATH)
	message(STATUS &quot;FLTK path set to ${FLTK_PATH}&quot;)
endif()
set(FLTK_INCLUDE_DIR ${FLTK_PATH}/include)
set(FLTK_LIB_DIR ${FLTK_PATH}/lib)

######## Relighting #######
include_directories(${FLTK_INCLUDE_DIR})
include_directories(${OPENGL_INCLUDE_DIR})
include_directories(${GLEE_PATH})
add_library(VirtualSurgeon_Relighting
	../HeadReplacement/glm.cpp
	../HeadReplacement/spherical_harmonics_analysis.cpp
	../HeadReplacement/LaplacianBlending.cpp
	../HeadReplacement/SpharmonicsUI.cpp
	../HeadReplacement/OGL_OCV_common.cpp
	)
</pre>
<p>Note that I had to resort to some very dark magic to recover the location of FLTK and GLEE&#8230; But it&#8217;s a jungle out there.</p>
<p>The source of the photograph is: <a href="http://www.flickr.com/photos/roel1943/309048020/" target="_blank">http://www.flickr.com/photos/roel1943/309048020/</a><br />
It is released under a Creative Commons Attribution-ShareAlike 2.0 license (CC BY-SA 2.0), so all the results here are also CC BY-SA 2.0&#8230; <img src='http://www.morethantechnical.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Enjoy,<br />
Roy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2011/12/20/spherical-harmonics-face-relighting-using-opencv-opengl-w-code/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Identity Transfer in Photographs</title>
		<link>http://www.morethantechnical.com/2011/12/01/identity-transfer-in-photographs/</link>
		<comments>http://www.morethantechnical.com/2011/12/01/identity-transfer-in-photographs/#comments</comments>
		<pubDate>Thu, 01 Dec 2011 05:29:58 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[opengl]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[school]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[vision]]></category>
		<category><![CDATA[head]]></category>
		<category><![CDATA[identity]]></category>
		<category><![CDATA[images]]></category>
		<category><![CDATA[photographs]]></category>
		<category><![CDATA[replacement]]></category>
		<category><![CDATA[survey]]></category>
		<category><![CDATA[transfer]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=1000</guid>
		<description><![CDATA[Hi! I would like to present something I have been working on recently, a work that has greatly influenced what I have written on this blog over the past two years&#8230; To use it: Go to this page, watch the short instruction video, download the application (MacOSX-Intel-x64 Win32) and make yourself a model! It takes just a [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.morethantechnical.com/wp-content/uploads/2011/12/male_model.jpg" rel="lightbox[1000]"><img src="http://www.morethantechnical.com/wp-content/uploads/2011/12/male_model-150x150.jpg" alt="" title="male_model" width="150" height="150" class="alignleft size-thumbnail wp-image-1001" /></a>Hi!</p>
<p>I would like to present something I have been working on recently, a work that has greatly influenced what I have written on this blog over the past two years&#8230;</p>
<p>To use it:<br />
Go to this <a href="http://palimpost.xvm.mit.edu/HeadReplacement/default.html">page</a>,<br />
watch the short <a href="http://youtu.be/YhHb3FAqaUk">instruction video</a>,<br />
download the application (<a href="http://palimpost.xvm.mit.edu/HeadReplacement/bin/HeadReplacement.dmg">MacOSX-Intel-x64</a> <a href="http://palimpost.xvm.mit.edu/HeadReplacement/bin/HeadReplacement_win32.zip">Win32</a>)<br />
and make yourself a model!<br />
It takes just a couple of minutes and it&#8217;s very simple&#8230;</p>
<p>This work is an academic research project, so please, please take the time to fill out the <a href="https://docs.google.com/spreadsheet/viewform?formkey=dGNBX0ljZXRVXzdtbjBQZ0dULTQwelE6MQ">survey</a>! It is very short.<br />
The results of the <a href="https://docs.google.com/spreadsheet/viewform?formkey=dGNBX0ljZXRVXzdtbjBQZ0dULTQwelE6MQ">survey</a> (the survey alone, no photos of your work) will possibly be published in an academic paper.</p>
<p>Note: No information is sent anywhere in any way outside of your machine (you may even unplug the network). All results are saved locally on your computer, and no inputs are recorded or transmitted. The application contains no malware. The source is available here.</p>
<p>Note II: All stock photos of models used in the application are released under Creative Commons By-NC-SA 2.0 license. Creator: http://www.flickr.com/photos/kk/. If you wish to distribute your results, they should also be released under a CC-By-NC-SA 2.0 license.</p>
<p>Thank you!<br />
Roy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2011/12/01/identity-transfer-in-photographs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Just a simple Laplacian pyramid blender using OpenCV [w/code]</title>
		<link>http://www.morethantechnical.com/2011/11/13/just-a-simple-laplacian-pyramid-blender-using-opencv-wcode/</link>
		<comments>http://www.morethantechnical.com/2011/11/13/just-a-simple-laplacian-pyramid-blender-using-opencv-wcode/#comments</comments>
		<pubDate>Sun, 13 Nov 2011 07:39:47 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[blend]]></category>
		<category><![CDATA[blending]]></category>
		<category><![CDATA[laplacian]]></category>
		<category><![CDATA[pyramids]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=962</guid>
		<description><![CDATA[I want to share a small piece of code to do Laplacian Blending using OpenCV. It&#8217;s one of the most basic and canonical methods of image blending, and a must-do exercise for any computer graphics student. Well basically it&#8217;s a matter of creating two Laplacian pyramids of both images, and a Gaussian pyramid of [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.morethantechnical.com/wp-content/uploads/2011/11/Screen-shot-2011-11-13-at-2.37.49-AM.png" rel="lightbox[962]"><img src="http://www.morethantechnical.com/wp-content/uploads/2011/11/Screen-shot-2011-11-13-at-2.37.49-AM-150x150.png" alt="" title="OrangApple" width="150" height="150" class="alignleft size-thumbnail wp-image-992" /></a>I want to share a small piece of code to do Laplacian Blending using OpenCV. It&#8217;s one of the most basic and canonical methods of image blending, and a must-do exercise for any computer graphics student.<br />
<span id="more-962"></span><br />
Basically, it&#8217;s a matter of building a Laplacian pyramid for each of the two images and a Gaussian pyramid for the mask.<br />
Then we blend the pyramids level by level into one, weighting each level by the mask at the same scale, and collapse the resulting pyramid into the blended image.</p>
<pre class="brush: plain; title: ; notranslate">
#include &quot;opencv2/opencv.hpp&quot;

using namespace cv;

class LaplacianBlending {
private:
	Mat_&lt;Vec3f&gt; left;
	Mat_&lt;Vec3f&gt; right;
	Mat_&lt;float&gt; blendMask;

	vector&lt;Mat_&lt;Vec3f&gt; &gt; leftLapPyr,rightLapPyr,resultLapPyr;
	Mat leftSmallestLevel, rightSmallestLevel, resultSmallestLevel;
	vector&lt;Mat_&lt;Vec3f&gt; &gt; maskGaussianPyramid; //masks are 3-channels for easier multiplication with RGB

	int levels;

	void buildPyramids() {
		buildLaplacianPyramid(left,leftLapPyr,leftSmallestLevel);
		buildLaplacianPyramid(right,rightLapPyr,rightSmallestLevel);
		buildGaussianPyramid();
	}

	void buildGaussianPyramid() {
		assert(leftLapPyr.size()&gt;0);

		maskGaussianPyramid.clear();
		Mat currentImg;
		cvtColor(blendMask, currentImg, CV_GRAY2BGR);
		maskGaussianPyramid.push_back(currentImg); //highest level

		currentImg = blendMask;
		for (int l=1; l&lt;levels+1; l++) {
			Mat _down;
			if (leftLapPyr.size() &gt; l) {
				pyrDown(currentImg, _down, leftLapPyr[l].size());
			} else {
				pyrDown(currentImg, _down, leftSmallestLevel.size()); //smallest level
			}

			Mat down;
			cvtColor(_down, down, CV_GRAY2BGR);
			maskGaussianPyramid.push_back(down);
			currentImg = _down;
		}
	}

	void buildLaplacianPyramid(const Mat&amp; img, vector&lt;Mat_&lt;Vec3f&gt; &gt;&amp; lapPyr, Mat&amp; smallestLevel) {
		lapPyr.clear();
		Mat currentImg = img;
		for (int l=0; l&lt;levels; l++) {
			Mat down,up;
			pyrDown(currentImg, down);
			pyrUp(down, up, currentImg.size());
			Mat lap = currentImg - up;
			lapPyr.push_back(lap);
			currentImg = down;
		}
		currentImg.copyTo(smallestLevel);
	}

	Mat_&lt;Vec3f&gt; reconstructImgFromLapPyramid() {
		Mat currentImg = resultSmallestLevel;
		for (int l=levels-1; l&gt;=0; l--) {
			Mat up;

			pyrUp(currentImg, up, resultLapPyr[l].size());
			currentImg = up + resultLapPyr[l];
		}
		return currentImg;
	}

	void blendLapPyrs() {
		resultSmallestLevel = leftSmallestLevel.mul(maskGaussianPyramid.back()) +
									rightSmallestLevel.mul(Scalar(1.0,1.0,1.0) - maskGaussianPyramid.back());
		for (int l=0; l&lt;levels; l++) {
			Mat A = leftLapPyr[l].mul(maskGaussianPyramid[l]);
			Mat antiMask = Scalar(1.0,1.0,1.0) - maskGaussianPyramid[l];
			Mat B = rightLapPyr[l].mul(antiMask);
			Mat_&lt;Vec3f&gt; blendedLevel = A + B;

			resultLapPyr.push_back(blendedLevel);
		}
	}

public:
	LaplacianBlending(const Mat_&lt;Vec3f&gt;&amp; _left, const Mat_&lt;Vec3f&gt;&amp; _right, const Mat_&lt;float&gt;&amp; _blendMask, int _levels):
	left(_left),right(_right),blendMask(_blendMask),levels(_levels)
	{
		assert(_left.size() == _right.size());
		assert(_left.size() == _blendMask.size());
		buildPyramids();
		blendLapPyrs();
	};

	Mat_&lt;Vec3f&gt; blend() {
		return reconstructImgFromLapPyramid();
	}
};

Mat_&lt;Vec3f&gt; LaplacianBlend(const Mat_&lt;Vec3f&gt;&amp; l, const Mat_&lt;Vec3f&gt;&amp; r, const Mat_&lt;float&gt;&amp; m) {
	LaplacianBlending lb(l,r,m,4);
	return lb.blend();
}
</pre>
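<p>To use it, simply call <code>LaplacianBlend</code> with your two images and your mask, and it returns the blended result. Both images should be 3-channel float (<code>Mat_&lt;Vec3f&gt;</code>) and the mask a single-channel float (<code>Mat_&lt;float&gt;</code>) of the same size, with values in [0,1]: 1 takes the pixel from the first (left) image, 0 from the second (right) one.<br />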
Here&#8217;s something I did with it:<br />
<a href="http://www.morethantechnical.com/wp-content/uploads/2011/11/Screen-shot-2011-11-13-at-2.35.01-AM.png" rel="lightbox[962]"><img src="http://www.morethantechnical.com/wp-content/uploads/2011/11/Screen-shot-2011-11-13-at-2.35.01-AM.png" alt="" title="Laplacian blending" width="851" height="254" class="alignleft size-full wp-image-991" /></a></p>
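<p>If the pyramid bookkeeping feels opaque, the same steps (Laplacian pyramids of both inputs, Gaussian pyramid of the mask, per-level weighted sum, collapse) can be tried on a 1D signal without OpenCV. This is only an illustration: it swaps pyrDown/pyrUp for the crudest possible operators (average adjacent pairs to go down, repeat each sample to go up), assumes the signal length is divisible by 2 to the power <code>levels</code>, and the function names are made up for the example.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<double> Signal;

// Halve the resolution by averaging pairs of samples (stand-in for pyrDown).
Signal down(const Signal& s) {
    assert(s.size() % 2 == 0);
    Signal d(s.size() / 2);
    for (std::size_t i = 0; i < d.size(); ++i) d[i] = 0.5 * (s[2 * i] + s[2 * i + 1]);
    return d;
}

// Double the resolution by repeating each sample (stand-in for pyrUp).
Signal up(const Signal& s) {
    Signal u(s.size() * 2);
    for (std::size_t i = 0; i < u.size(); ++i) u[i] = s[i / 2];
    return u;
}

// Laplacian pyramid: lap[l] = level l minus the upsampled level l+1; 'smallest' is the residual.
void buildLap(Signal cur, int levels, std::vector<Signal>& lap, Signal& smallest) {
    lap.clear();
    for (int l = 0; l < levels; ++l) {
        Signal d = down(cur), u = up(d);
        Signal diff(cur.size());
        for (std::size_t i = 0; i < cur.size(); ++i) diff[i] = cur[i] - u[i];
        lap.push_back(diff);
        cur = d;
    }
    smallest = cur;
}

// Blend a and b with mask m (1 selects a, 0 selects b).
Signal blend1D(const Signal& a, const Signal& b, const Signal& m, int levels) {
    std::vector<Signal> lapA, lapB;
    Signal smallA, smallB;
    buildLap(a, levels, lapA, smallA);
    buildLap(b, levels, lapB, smallB);

    // Gaussian pyramid of the mask: mask[l] matches lapA[l], mask[levels] matches the residual.
    std::vector<Signal> mask(1, m);
    for (int l = 0; l < levels; ++l) mask.push_back(down(mask.back()));

    // Blend the residual, then collapse: upsample and add each blended Laplacian level.
    Signal cur(smallA.size());
    for (std::size_t i = 0; i < cur.size(); ++i)
        cur[i] = smallA[i] * mask[levels][i] + smallB[i] * (1.0 - mask[levels][i]);
    for (int l = levels - 1; l >= 0; --l) {
        Signal u = up(cur);
        for (std::size_t i = 0; i < u.size(); ++i)
            u[i] += lapA[l][i] * mask[l][i] + lapB[l][i] * (1.0 - mask[l][i]);
        cur = u;
    }
    return cur;
}
```

<p>With an all-ones mask this returns the first signal (up to round-off), because collapsing an unmodified Laplacian pyramid reproduces its input exactly; the blend only differs from a plain weighted sum where the mask changes between scales.</p>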
<p>Enjoy<br />
Roy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2011/11/13/just-a-simple-laplacian-pyramid-blender-using-opencv-wcode/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>A simple object classifier with Bag-of-Words using OpenCV 2.3 [w/ code]</title>
		<link>http://www.morethantechnical.com/2011/08/25/a-simple-object-classifier-with-bag-of-words-using-opencv-2-3-w-code/</link>
		<comments>http://www.morethantechnical.com/2011/08/25/a-simple-object-classifier-with-bag-of-words-using-opencv-2-3-w-code/#comments</comments>
		<pubDate>Thu, 25 Aug 2011 03:34:27 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[Recommended]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[vision]]></category>
		<category><![CDATA[classification]]></category>
		<category><![CDATA[object]]></category>
		<category><![CDATA[svm]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=917</guid>
		<description><![CDATA[ A simple object classifier with Bag-of-Words using OpenCV 2.3]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.morethantechnical.com/wp-content/uploads/2011/08/20101201191626.jpg" rel="lightbox[917]"><img src="http://www.morethantechnical.com/wp-content/uploads/2011/08/20101201191626-300x178.jpg" alt="" title="20101201191626" width="300" height="178" class="alignleft size-medium wp-image-928" /></a><br />
Just wanted to share some code I&#8217;ve been writing.<br />
So I wanted to create a food classifier for a cool project down at the Media Lab called FoodCam. It&#8217;s basically a camera that people put free food under, and they can send an email alert to the entire building to come eat (by pushing a huge button marked &#8220;Dinner Bell&#8221;). Really a cool thing.</p>
<p>OK let&#8217;s get down to business.<br />
<span id="more-917"></span><br />
I followed a very simple technique described in <a href="http://scholar.google.com/scholar?cluster=2469382617192238945&amp;hl=en&amp;as_sdt=0,22" target="_blank">this paper</a>. I know, you say, &#8220;A Paper? Really? I&#8217;m not gonna read that technical boring stuff, give the bottom line! man.. geez.&#8221; Well, you are right, except that this paper IS the bottom line, it&#8217;s dead simple. It&#8217;s almost a tutorial. It is also referenced by the OpenCV documentation.</p>
<p>The method is simple:<br />
- Extract features of choice from a training set that contains all classes.<br />
- Create a vocabulary of features by clustering them (k-means, etc.), say into 1000 clusters, or &#8220;words&#8221;.<br />
- Train your classifiers (SVMs, Naive Bayes, boosting, etc.) on the training set again (preferably a different one). This time, match each feature in an image to its closest cluster in the vocabulary and count the matches into a histogram of responses over the vocabulary words: a 1000-entry vector per image. These histograms, together with their class labels, make up the sample-label dataset for the training.<br />
- When you get an image you haven&#8217;t seen, compute its histogram the same way and run the classifiers; they should, god willing, give you the right class.</p>
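<p>The histogram step is the heart of the method, so here is a dependency-free sketch of it: every descriptor votes for its nearest vocabulary word (squared Euclidean distance), and the votes are divided by the number of descriptors so that images with many keypoints are comparable with images with few. The normalization and the names <code>nearestWord</code>/<code>bowHistogram</code> are choices of this sketch; in the real code OpenCV&#8217;s BOW classes do this work for you.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

typedef std::vector<float> Descriptor;  // one feature descriptor (e.g. a SURF vector)

// Index of the vocabulary word (cluster center) closest to d, by squared L2 distance.
std::size_t nearestWord(const Descriptor& d, const std::vector<Descriptor>& vocabulary) {
    std::size_t best = 0;
    float bestDist = std::numeric_limits<float>::max();
    for (std::size_t w = 0; w < vocabulary.size(); ++w) {
        assert(vocabulary[w].size() == d.size());
        float dist = 0.f;
        for (std::size_t k = 0; k < d.size(); ++k) {
            float diff = d[k] - vocabulary[w][k];
            dist += diff * diff;
        }
        if (dist < bestDist) { bestDist = dist; best = w; }
    }
    return best;
}

// Bag-of-words histogram: one bin per vocabulary word, normalized to sum to 1.
std::vector<float> bowHistogram(const std::vector<Descriptor>& descriptors,
                                const std::vector<Descriptor>& vocabulary) {
    std::vector<float> hist(vocabulary.size(), 0.f);
    if (descriptors.empty()) return hist;  // no keypoints: all-zero histogram
    for (std::size_t i = 0; i < descriptors.size(); ++i)
        hist[nearestWord(descriptors[i], vocabulary)] += 1.f;
    for (std::size_t w = 0; w < hist.size(); ++w)
        hist[w] /= static_cast<float>(descriptors.size());
    return hist;
}
```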
<p>Turns out, those crafty guys at Willow Garage have done pretty much all the heavy lifting, so it&#8217;s up to us to pick the fruit of their hard work. OpenCV 2.3 comes packed with a <a href="http://opencv.itseez.com/modules/features2d/doc/object_categorization.html" target="_blank">set of classes</a>, whose names start with BOW for Bag Of Words, that help a lot with implementing this method.</p>
<p>Starting with the first step:</p>
<pre class="brush: plain; title: ; notranslate">
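SurfFeatureDetector detector(400);
vector&lt;KeyPoint&gt; keypoints;
Mat descriptors;

// computing descriptors
Ptr&lt;DescriptorExtractor&gt; extractor(
   new OpponentColorDescriptorExtractor(
      Ptr&lt;DescriptorExtractor&gt;(new SurfDescriptorExtractor())
   )
);

// starts with 0 rows; declared after the extractor, whose descriptor size and type it needs
Mat training_descriptors(0,extractor-&gt;descriptorSize(),extractor-&gt;descriptorType());

vector&lt;string&gt; files; //load up with training images
for (size_t i = 0; i &lt; files.size(); i++) {
   Mat img = imread(files[i]);
   detector.detect(img, keypoints);
   extractor-&gt;compute(img, keypoints, descriptors);
   training_descriptors.push_back(descriptors);
}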
</pre>
<p>Simple!<br />
Let&#8217;s go create a vocabulary then. Luckily, OpenCV has taken care of that, and provides a simple API:</p>
<pre class="brush: plain; title: ; notranslate">
BOWKMeansTrainer bowtrainer(1000); //num clusters
bowtrainer.add(training_descriptors);
Mat vocabulary = bowtrainer.cluster();
</pre>
<p>Boom. Vocabulary.<br />
Now, let&#8217;s train us some SVM classifiers!<br />
We&#8217;re gonna train one 2-class SVM per class, in a 1-vs-all kind of way: each SVM says &#8220;yes&#8221; or &#8220;no&#8221; to whether an image belongs to its class or to any of the other classes, hence 1-vs-all.<br />
But first, we need to scour the training set for our histograms (the responses to the vocabulary, remember?):</p>
<pre class="brush: plain; title: ; notranslate">
map&lt;string,Mat&gt; classes_training_data;
vector&lt;string&gt; classes_names;
int total_samples = 0;

vector&lt;string&gt; files; //load up with images
vector&lt;string&gt; classes; //load up with the respective classes

Ptr&lt;FeatureDetector &gt; detector(new SurfFeatureDetector());
Ptr&lt;DescriptorMatcher &gt; matcher(new BruteForceMatcher&lt;L2&lt;float&gt; &gt;());
Ptr&lt;DescriptorExtractor &gt; extractor(new OpponentColorDescriptorExtractor(Ptr&lt;DescriptorExtractor&gt;(new SurfDescriptorExtractor())));
Ptr&lt;BOWImgDescriptorExtractor&gt; bowide(new BOWImgDescriptorExtractor(extractor,matcher));
bowide-&gt;setVocabulary(vocabulary);

#pragma omp parallel for schedule(dynamic,3)
for (int i = 0; i &lt; (int)files.size(); i++) {
   //declared inside the loop, so every thread works on its own copies
   vector&lt;KeyPoint&gt; keypoints;
   Mat response_hist;
   Mat img = imread(files[i]);
   string class_ = classes[i];

   detector-&gt;detect(img,keypoints);
   bowide-&gt;compute(img, keypoints, response_hist);

   #pragma omp critical
   {
      if(classes_training_data.count(class_) == 0) { //not yet created...
         classes_training_data[class_].create(0,response_hist.cols,response_hist.type());
         classes_names.push_back(class_);
      }
      classes_training_data[class_].push_back(response_hist);
      total_samples++;
   }
}
</pre>
<p>Now, two things:<br />
First, notice I&#8217;m keeping the training data for each class separately, because we will need it later to create the 1-vs-all samples-labels matrices.<br />
Second, I use OpenMP multi-threading to run the calculation in parallel, and hence faster, on multi-core machines (like the one I used). It cuts the running time by a whole lot. OpenMP is a gem, use it more: just a couple of #pragma directives and you&#8217;re multi-threading.</p>
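<p>The pattern that keeps such a loop safe is worth isolating: anything an iteration writes to should be declared inside the loop body, and only the update of the shared container sits in a <code>#pragma omp critical</code> section. Here is a minimal sketch with a made-up workload (squaring numbers and grouping them by parity) standing in for the histogram computation. Compiled without OpenMP support (e.g. without <code>-fopenmp</code> on GCC) the pragmas are ignored and it simply runs serially with the same result.</p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Group the squares of 'values' by parity, computing them in parallel.
std::map<std::string, std::vector<int> > groupSquares(const std::vector<int>& values) {
    std::map<std::string, std::vector<int> > groups;  // shared between threads
    const int n = static_cast<int>(values.size());

    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < n; ++i) {
        // Private to this iteration: no other thread touches these variables.
        int square = values[i] * values[i];
        std::string key = (values[i] % 2 == 0) ? "even" : "odd";

        // Only one thread at a time may modify the shared map.
        #pragma omp critical
        {
            groups[key].push_back(square);
        }
    }
    return groups;
}
```

<p>Note that with several threads the order of the elements inside each group is not deterministic, just as the order of the rows in each class&#8217;s training matrix isn&#8217;t; for training data that doesn&#8217;t matter.</p>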
<p>Alright, data gotten, let&#8217;s get training:</p>
<pre class="brush: plain; title: ; notranslate">
//all response histograms have the same size and type (one bin per vocabulary word)
int response_cols = classes_training_data.begin()-&gt;second.cols;
int response_type = classes_training_data.begin()-&gt;second.type();

#pragma omp parallel for schedule(dynamic)
for (int i=0;i&lt;(int)classes_names.size();i++) {
   string class_ = classes_names[i];
   cout &lt;&lt; omp_get_thread_num() &lt;&lt; &quot; training class: &quot; &lt;&lt; class_ &lt;&lt; &quot;..&quot; &lt;&lt; endl;

   Mat samples(0,response_cols,response_type);
   Mat labels(0,1,CV_32FC1);

   //copy class samples and label them 1
   //(find() only reads the shared map; operator[] may insert, so it is not safe to call from several threads)
   const Mat&amp; class_data = classes_training_data.find(class_)-&gt;second;
   cout &lt;&lt; &quot;adding &quot; &lt;&lt; class_data.rows &lt;&lt; &quot; positive&quot; &lt;&lt; endl;
   samples.push_back(class_data);
   Mat class_label = Mat::ones(class_data.rows, 1, CV_32FC1);
   labels.push_back(class_label);

   //copy rest samples and label them 0
   for (map&lt;string,Mat&gt;::const_iterator it1 = classes_training_data.begin(); it1 != classes_training_data.end(); ++it1) {
      if(it1-&gt;first == class_) continue; //skip class itself
      samples.push_back(it1-&gt;second);
      class_label = Mat::zeros(it1-&gt;second.rows, 1, CV_32FC1);
      labels.push_back(class_label);
   }

   if(samples.rows == 0) continue; //phantom class?!
   cout &lt;&lt; &quot;Train..&quot; &lt;&lt; endl;
   Mat samples_32f; samples.convertTo(samples_32f, CV_32F);
   CvSVM classifier;
   classifier.train(samples_32f,labels);

   //do something with the classifier, like saving it to file
}
</pre>
<p>Again, I parallelize, although this step is not too slow.<br />
Note how I build the samples and the labels: each time I put in the positive samples and label them 1, and then I put in the rest of the samples and label them 0.</p>
<p>Moving on to&#8230; testing the classifiers!<br />
Nothing seems to me like more fun than creating a confusion matrix! Not really, but let&#8217;s see how it&#8217;s done:</p>
<pre class="brush: plain; title: ; notranslate">
map&lt;string,map&lt;string,int&gt; &gt; confusion_matrix; // confusion_matrix[predicted][actual] = number of times an image of class actual was classified as predicted
map&lt;string,Ptr&lt;CvSVM&gt; &gt; classes_classifiers; //This we created earlier (held through Ptr, since copying a CvSVM is not safe)

vector&lt;string&gt; files; //load up with images
vector&lt;string&gt; classes; //load up with the respective classes

for (size_t i = 0; i &lt; files.size(); i++) {
   Mat img = imread(files[i]), response_hist;

   vector&lt;KeyPoint&gt; keypoints;
   detector-&gt;detect(img,keypoints);
   bowide-&gt;compute(img, keypoints, response_hist);

   float minf = FLT_MAX; string minclass;
   for (map&lt;string,Ptr&lt;CvSVM&gt; &gt;::iterator it = classes_classifiers.begin(); it != classes_classifiers.end(); ++it) {
      float res = it-&gt;second-&gt;predict(response_hist,true); //true: return the decision-function value rather than the label
      if (res &lt; minf) {
         minf = res;
         minclass = it-&gt;first;
      }
   }
   confusion_matrix[minclass][classes[i]]++;
}
</pre>
<p>When you take a look at my files, you will find a much more complicated way of doing this. But this is the core idea: compute the image&#8217;s response histogram to the vocabulary of features (rather, feature-cluster centers), run it through all the classifiers and take the one with the best score, which here is the lowest decision-function value returned by <code>predict(response_hist,true)</code>. Simple.<br />
Consider making this parallel as well. There&#8217;s no reason for it to be serial.</p>
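<p>Once the matrix is filled, overall accuracy is the diagonal (predicted class equals true class) divided by the total count. A minimal sketch, assuming the layout the loop above fills in, <code>confusion_matrix[predicted][actual]</code>; the function name is made up for this example.</p>

```cpp
#include <cassert>
#include <map>
#include <string>

typedef std::map<std::string, std::map<std::string, int> > ConfusionMatrix;

// Fraction of test images whose predicted class equals their true class.
double accuracy(const ConfusionMatrix& cm) {
    int correct = 0, total = 0;
    for (ConfusionMatrix::const_iterator p = cm.begin(); p != cm.end(); ++p) {
        for (std::map<std::string, int>::const_iterator a = p->second.begin(); a != p->second.end(); ++a) {
            total += a->second;
            if (a->first == p->first) correct += a->second;  // diagonal entry
        }
    }
    assert(total > 0 && "empty confusion matrix");
    return static_cast<double>(correct) / total;
}
```

<p>Looking at the off-diagonal entries of a single row is also handy: they show which classes a given classifier keeps stealing images from.</p>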
<p>That about covers it.</p>
<h2>Code</h2>
<p>Lately I&#8217;ve been pushing stuff to GitHub using git rather than to SVN on Google Code. Dunno why, it&#8217;s just like that.<br />
Get the whole thing at:<br />
<code><a href="https://github.com/royshil/FoodcamClassifier" target="_blank">https://github.com/royshil/FoodcamClassifier</a></code></p>
<p>Follow the build instructions, they&#8217;re a breeze, and then the running instructions. It&#8217;s basically a series of command-line programs you run to get through each step, and in the end you have a kind of &#8220;predictor&#8221; service that takes an image and produces a prediction.</p>
<p>OK guys, have fun classifying stuff!<br />
Roy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2011/08/25/a-simple-object-classifier-with-bag-of-words-using-opencv-2-3-w-code/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>Simple Kalman filter for tracking using OpenCV 2.2 [w/ code]</title>
		<link>http://www.morethantechnical.com/2011/06/17/simple-kalman-filter-for-tracking-using-opencv-2-2-w-code/</link>
		<comments>http://www.morethantechnical.com/2011/06/17/simple-kalman-filter-for-tracking-using-opencv-2-2-w-code/#comments</comments>
		<pubDate>Thu, 16 Jun 2011 22:49:30 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[vision]]></category>
		<category><![CDATA[Website]]></category>
		<category><![CDATA[c#]]></category>
		<category><![CDATA[filter]]></category>
		<category><![CDATA[kalman]]></category>
		<category><![CDATA[tracking]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=902</guid>
		<description><![CDATA[Hi, I wanted to put up a quick note on how to use Kalman Filters in OpenCV 2.2 with the C++ API, because all I could find online was using the old C API. Plus the kalman.cpp example that ships with OpenCV is kind of crappy and really doesn&#8217;t explain how to use the Kalman [...]]]></description>
			<content:encoded><![CDATA[<p>Hi,<br />
I wanted to put up a quick note on how to use Kalman Filters in OpenCV 2.2 with the C++ API, because all I could find online was using the old C API. Plus the kalman.cpp example that ships with OpenCV is kind of crappy and really doesn&#8217;t explain how to use the Kalman Filter.<br />
I&#8217;m no expert on Kalman filters though, this is just a quick hack I got going as a test for a project. It worked, so I&#8217;m posting the results.<br />
<span id="more-902"></span></p>
<h2>The Filter</h2>
<p>So I wanted to do a 2D tracker that is more immune to noise. For that I set up a Kalman filter with 4 dynamic parameters and 2 measurement parameters (no control): the measurement is the 2D location of the object, and the dynamic state is its 2D location and 2D velocity. Pretty simple, and it keeps the transition matrix simple too: each frame, the new location is the old location plus the velocity, and the velocity stays the same.</p>
<pre class="brush: plain; title: ; notranslate">
KalmanFilter KF(4, 2, 0);
KF.transitionMatrix = *(Mat_&lt;float&gt;(4, 4) &lt;&lt; 1,0,1,0,   0,1,0,1,  0,0,1,0,  0,0,0,1);
Mat_&lt;float&gt; measurement(2,1); measurement.setTo(Scalar(0));

// init... predict() starts from statePost, so that's the one to initialize
KF.statePost.at&lt;float&gt;(0) = mouse_info.x;
KF.statePost.at&lt;float&gt;(1) = mouse_info.y;
KF.statePost.at&lt;float&gt;(2) = 0;
KF.statePost.at&lt;float&gt;(3) = 0;
setIdentity(KF.measurementMatrix);
setIdentity(KF.processNoiseCov, Scalar::all(1e-4));
setIdentity(KF.measurementNoiseCov, Scalar::all(1e-1));
setIdentity(KF.errorCovPost, Scalar::all(.1));
</pre>
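<p>Written out, the filter set up above is a constant-velocity model with a time step of one frame. With the state \(\mathbf{x}_k = (x, y, v_x, v_y)^T\) and the measurement \(\mathbf{z}_k = (x, y)^T\), the transition matrix and the 2&#215;4 measurement matrix that <code>setIdentity</code> produces are:</p>
$$
F = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad
H = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix},
$$
$$
\mathbf{x}_k = F\,\mathbf{x}_{k-1} + \mathbf{w}_k, \quad \mathbf{w}_k \sim \mathcal{N}(0, Q), \qquad
\mathbf{z}_k = H\,\mathbf{x}_k + \mathbf{n}_k, \quad \mathbf{n}_k \sim \mathcal{N}(0, R),
$$
<p>with \(Q = 10^{-4} I\) (<code>processNoiseCov</code>) and \(R = 10^{-1} I\) (<code>measurementNoiseCov</code>). In words: \(x_k = x_{k-1} + v_{x,k-1}\), likewise for \(y\), and the velocity carries over unchanged apart from the process noise. The small \(Q\) relative to \(R\) is what makes the filter trust its motion model more than each noisy mouse reading.</p>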
<p>Cool, moving on to the dynamic part.<br />
So I set up a mouse callback to get the mouse position every &#8220;frame&#8221; (a 100ms wait), and feed that into the filter:</p>
<pre class="brush: plain; title: ; notranslate">
// First predict, to update the internal statePre variable
Mat prediction = KF.predict();
Point predictPt(prediction.at&lt;float&gt;(0),prediction.at&lt;float&gt;(1));

// Get mouse point
measurement(0) = mouse_info.x;
measurement(1) = mouse_info.y;

Point measPt(measurement(0),measurement(1));

// The &quot;correct&quot; phase that is going to use the predicted value and our measurement
Mat estimated = KF.correct(measurement);
Point statePt(estimated.at&lt;float&gt;(0),estimated.at&lt;float&gt;(1));
</pre>
<p>All the rest is garnish (see the code)..</p>
<p>The important bit is that <code>predict()</code> happens before <code>correct()</code>. This follows the excellent <a href="http://www.cs.unc.edu/~welch/media/pdf/kalman_intro.pdf">Kalman filter tutorial</a> I found. Look carefully at Figure 1-2!! It will sort you out. Also take a look at <a href="https://code.ros.org/svn/opencv/trunk/opencv/modules/video/src/kalman.cpp">OpenCV&#8217;s internal implementation of the Kalman filter</a> and see that it follows these steps closely, especially <code>Mat&#038; KalmanFilter::predict(const Mat&#038; control)</code> and <code>Mat&#038; KalmanFilter::correct(const Mat&#038; measurement)</code>.<br />
Another resource that helped me formulate the parameters for the filter is <a href="http://www.marcad.com/cs584/Tracking.html">this page</a>. Again, take everything with a grain of salt: Kalman filters are very versatile, and you just need to know how to formulate them right.</p>
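<p>To see the predict-then-correct order in isolation, here is a dependency-free sketch of a 1D filter tracking a single slowly drifting value. It is a scalar version of the standard equations (transition and measurement both equal to 1), not OpenCV&#8217;s implementation; the noise values 1e-4 and 1e-1 are simply borrowed from the setup above, and the names are made up for the example.</p>

```cpp
#include <cassert>

// State of a 1D Kalman filter.
struct Kalman1D {
    double x;  // state estimate
    double p;  // variance of the estimate
    double q;  // process noise variance
    double r;  // measurement noise variance
};

// Predict: with a transition of 1 the estimate carries over, but its uncertainty grows by q.
double predict(Kalman1D& kf) {
    kf.p += kf.q;
    return kf.x;
}

// Correct: blend the prediction with measurement z using the Kalman gain.
double correct(Kalman1D& kf, double z) {
    assert(kf.p + kf.r > 0.0);
    double k = kf.p / (kf.p + kf.r);  // gain in [0,1): how much to trust z
    kf.x += k * (z - kf.x);           // move toward the measurement
    kf.p *= (1.0 - k);                // uncertainty shrinks after a measurement
    return kf.x;
}
```

<p>Per frame you call <code>predict()</code> first and then <code>correct()</code> with the new measurement, the same order as the <code>KF.predict()</code>/<code>KF.correct()</code> pair above. Swapping them would correct a stale estimate whose uncertainty hasn&#8217;t been grown yet.</p>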
<h2>Result</h2>
<p>Using velocity:<br />
<a href="http://www.morethantechnical.com/wp-content/uploads/2011/06/Screen-shot-2011-06-16-at-6.39.24-PM.png" rel="lightbox[902]"><img src="http://www.morethantechnical.com/wp-content/uploads/2011/06/Screen-shot-2011-06-16-at-6.39.24-PM.png" alt="" title="kalman using velocity" width="580" height="602" class="alignnone size-full wp-image-907" /></a></p>
<p>Not using velocity:<br />
<a href="http://www.morethantechnical.com/wp-content/uploads/2011/06/Screen-shot-2011-06-16-at-6.41.24-PM.png" rel="lightbox[902]"><img src="http://www.morethantechnical.com/wp-content/uploads/2011/06/Screen-shot-2011-06-16-at-6.41.24-PM.png" alt="" title="kalman not using velocity" width="580" height="602" class="alignnone size-full wp-image-908" /></a></p>
<p>Some video:<br />
<iframe width="425" height="349" src="http://www.youtube.com/embed/SxtY1jQJ2fc" frameborder="0" allowfullscreen></iframe></p>
<h2>Code</h2>
<p>As usual, grab the code off the SVN:</p>
<pre class="brush: plain; title: ; notranslate">
svn export http://morethantechnical.googlecode.com/svn/trunk/mouse_kalman/main.cpp
</pre>
<p>Enjoy,<br />
Roy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2011/06/17/simple-kalman-filter-for-tracking-using-opencv-2-2-w-code/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Neat OpenCV smoothing trick when Kineacking (Kinect Hacking) [w/ code]</title>
		<link>http://www.morethantechnical.com/2011/03/05/neat-opencv-smoothing-trick-when-kineacking-kinect-hacking-w-code/</link>
		<comments>http://www.morethantechnical.com/2011/03/05/neat-opencv-smoothing-trick-when-kineacking-kinect-hacking-w-code/#comments</comments>
		<pubDate>Sat, 05 Mar 2011 20:57:26 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[3d]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[vision]]></category>
		<category><![CDATA[depth]]></category>
		<category><![CDATA[inpainting]]></category>
		<category><![CDATA[kinect]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=824</guid>
		<description><![CDATA[I found a nice little trick to ease the work with the very noisy depth image the Kinect is giving out. The image is filled with these &#8220;blank&#8221; values that basically note where the data is unreadable. The secret is to use inpainting to cover these areas and get a cleaner image. And as always, [...]]]></description>
			<content:encoded><![CDATA[<p>I found a nice little trick to ease working with the very noisy depth image the Kinect gives out. The image is filled with these &#8220;blank&#8221; values that basically mark where the depth is unreadable. The secret is to use inpainting to cover these areas and get a cleaner image. And as always, no need to dig deep: OpenCV has it all included.<br />
<span id="more-824"></span></p>
<p>Start with a simple Kinect frame feed, taken from <a href="http://openkinect.org/wiki/C%2B%2BOpenCvExample">here</a>:</p>
<pre class="brush: plain; title: ; notranslate">

int main(int argc, char **argv) {
	bool die(false);

	Mat depthMat(Size(640,480),CV_16UC1);
	Mat depthf  (Size(640,480),CV_8UC1);
	Mat rgbMat(Size(640,480),CV_8UC3,Scalar(0));
	Mat ownMat(Size(640,480),CV_8UC3,Scalar(0));

	Freenect::Freenect freenect;
	MyFreenectDevice&amp; device = freenect.createDevice&lt;MyFreenectDevice&gt;(0);

	device.startVideo();
	device.startDepth();

	while (!die) {
		device.getVideo(rgbMat);
		device.getDepth(depthMat);
		depthMat.convertTo(depthf, CV_8UC1, 255.0/2048.0);
		cv::imshow(&quot;depth&quot;,depthf);
		char k = cv::waitKey(5);
		if( k == 27 ){
			break;
		}
	}

	device.stopVideo();
	device.stopDepth();
	return 0;
}
</pre>
<p>Now let&#8217;s stretch the signal a little bit and add the inpainting:</p>
<pre class="brush: plain; title: ; notranslate">
		//interpolation &amp; inpainting
		{
			Mat _tmp,_tmp1; //minimum observed value is ~440, so shift a bit
			Mat(depthMat - 400.0).convertTo(_tmp1,CV_64FC1);

			double minval,maxval;
			minMaxLoc(_tmp1, &amp;minval, &amp;maxval, NULL, NULL);
			_tmp1.convertTo(depthf, CV_8UC1, 255.0/maxval);  //linear stretch of the depth range to [0,255]

			//use a smaller version of the image
			Mat small_depthf; resize(depthf,small_depthf,Size(),0.2,0.2);
			//inpaint only the &quot;unknown&quot; pixels (value 255)
			cv::inpaint(small_depthf,(small_depthf == 255),_tmp1,5.0,INPAINT_TELEA);

			resize(_tmp1, _tmp, depthf.size());
			_tmp.copyTo(depthf, (depthf == 255));  //fill only the unknown pixels from the up-sized inpainted image; the original signal stays everywhere else
		}
</pre>
<p>Note that I&#8217;m using a small copy of the image, because inpainting is a heavy computation, and it works best on low frequencies anyway. I then take only the hole pixels from the up-sized inpainted image and keep the original signal everywhere else, which retains the high frequencies.</p>
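<p>The key idea, filling only the pixels flagged as unknown and leaving every valid measurement untouched, does not depend on the particular inpainting algorithm. As an illustration only, here is a dependency-free sketch that swaps Telea&#8217;s method for a much cruder one: each unknown pixel takes the average of its known 4-neighbors, pass after pass, until no more holes can be filled. The 255 &#8220;unknown&#8221; marker follows the code above; the function name and the row-major image layout are assumptions of the example.</p>

```cpp
#include <cassert>
#include <vector>

// Fill the pixels equal to 'unknown' in a row-major w x h 8-bit image, in place.
// Known pixels are never modified; holes are filled from their borders inward.
void fillHoles(std::vector<unsigned char>& img, int w, int h, unsigned char unknown = 255) {
    assert(static_cast<int>(img.size()) == w * h);
    const int dx[4] = {1, -1, 0, 0};
    const int dy[4] = {0, 0, 1, -1};

    std::vector<bool> known(img.size());
    for (int i = 0; i < w * h; ++i) known[i] = (img[i] != unknown);

    bool changed = true;
    while (changed) {
        changed = false;
        std::vector<bool> next = known;  // pixels filled in this pass are used from the next pass on
        for (int y = 0; y < h; ++y) {
            for (int x = 0; x < w; ++x) {
                const int i = y * w + x;
                if (known[i]) continue;
                int sum = 0, count = 0;
                for (int k = 0; k < 4; ++k) {
                    const int nx = x + dx[k], ny = y + dy[k];
                    if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                    if (known[ny * w + nx]) { sum += img[ny * w + nx]; ++count; }
                }
                if (count > 0) {  // average of the known 4-neighbors
                    img[i] = static_cast<unsigned char>(sum / count);
                    next[i] = true;
                    changed = true;
                }
            }
        }
        known = next;
    }
}
```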
<p>It works pretty well!<br />
<object width="425" height="344"><param name="movie" value="http://www.youtube.com/v/Jm8yflH5BDs?hl=en&#038;fs=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/Jm8yflH5BDs?hl=en&#038;fs=1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="344"></embed></object></p>
<p>Enjoy<br />
Roy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2011/03/05/neat-opencv-smoothing-trick-when-kineacking-kinect-hacking-w-code/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Hand gesture recognition via model fitting in energy minimization w/OpenCV</title>
		<link>http://www.morethantechnical.com/2010/12/28/hand-gesture-recognition-via-model-fitting-in-energy-minimization-wopencv/</link>
		<comments>http://www.morethantechnical.com/2010/12/28/hand-gesture-recognition-via-model-fitting-in-energy-minimization-wopencv/#comments</comments>
		<pubDate>Mon, 27 Dec 2010 22:11:12 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[Recommended]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[vision]]></category>
		<category><![CDATA[Website]]></category>
		<category><![CDATA[work]]></category>
		<category><![CDATA[computer vision]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=762</guid>
		<description><![CDATA[Hi Just wanted to share a thing I made &#8211; a simple 2D hand pose estimator, using a skeleton model fitting. Basically there has been a crap load of work on hand pose estimation, but I was inspired by this ancient work. The problem is setting out to find a good solution, and everything is [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.morethantechnical.com/wp-content/uploads/2010/12/hands.png" rel="lightbox[762]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/12/hands-300x248.png" alt="hands with model fitted" title="hands with model fitted" width="300" height="248" class="aligncenter size-medium wp-image-796" /></a>Hi</p>
<p>Just wanted to share a thing I made: a simple 2D hand pose estimator, using skeleton model fitting. There has been a crap load of work on hand pose estimation, but I was inspired by <a href="http://scholar.google.com/scholar?cluster=136383770354228708&#038;hl=en&#038;as_sdt=40000000">this ancient work</a>. The problem is that when you set out to find a good solution, everything is very hard to understand and implement. In such cases I like to be inspired by a method and just set out with my own implementation. This way, I understand what&#8217;s going on, simplify it, and share it with you!</p>
<p>Anyway, let&#8217;s get down to business.<br />
<span id="more-762"></span></p>
<h1>A bit about energy minimization problems</h1>
<p>A dear friend revealed to me the wonders of energy minimization problems a while back, and ever since I have been trying to find uses for that method. Basically, it tries to find the minimum of a complicated energy function (usually with many parameters) by following the function&#8217;s gradient downhill. Such methods are often called <a href="http://en.wikipedia.org/wiki/Gradient_descent">Gradient Descent</a>, and are used mostly for non-linear systems that can&#8217;t be solved easily using a least-squares variant. Note that following the gradient only guarantees a local minimum; whether it is also the global one depends on where the search starts.</p>
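<p>To make the idea concrete, here is a minimal gradient descent sketch on a made-up two-parameter energy, E(x, y) = (x - 1)&#178; + 2(y + 2)&#178;, whose minimum is at (1, -2). Each step moves the parameters a little against the gradient; the step size and iteration count are arbitrary choices for this example. On a bumpy energy like a hand model&#8217;s, the same procedure only finds a local minimum near where it starts.</p>

```cpp
#include <cassert>

// Made-up energy with its minimum at (1, -2).
double energy(double x, double y) {
    return (x - 1.0) * (x - 1.0) + 2.0 * (y + 2.0) * (y + 2.0);
}

// Plain gradient descent using the analytic gradient of energy().
void gradientDescent(double& x, double& y, double step, int iterations) {
    assert(step > 0.0);
    for (int i = 0; i < iterations; ++i) {
        double gx = 2.0 * (x - 1.0);  // dE/dx
        double gy = 4.0 * (y + 2.0);  // dE/dy
        x -= step * gx;               // walk downhill
        y -= step * gy;
    }
}
```

<p>A step that is too large makes the parameters overshoot and oscillate (here, any step above 0.5 diverges in y), which is one reason libraries add a line search to pick the step size.</p>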
<p>A lot of work in computer vision was done using energy functions (I believe the most seminal was <a href="http://scholar.google.com/scholar?cluster=10809837120977085662&#038;hl=en&#038;as_sdt=40000000">Snakes</a>, with over 10,000 citations), usually having two terms: internal energy and external energy. The equilibrium between the two terms should result in a low-energy system, which is our optimal result. So we would like to formulate the terms such that they reach 0 when the system is in the state we want.</p>
<p>Following the work on active contours, I figured the external energy should measure how well the hand model fits the hand blob, and the internal energy how &#8220;comfortable&#8221; the hand is in this configuration.</p>
<h1>The hand model</h1>
<p>Let&#8217;s see what a 2D model of a hand might look like:<br />
<a href="http://www.morethantechnical.com/wp-content/uploads/2010/12/Screen-shot-2010-12-25-at-10.50.41-AM.png" rel="lightbox[762]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/12/Screen-shot-2010-12-25-at-10.50.41-AM.png" alt="" title="Screen shot 2010-12-25 at 10.50.41 AM" width="232" height="231" class="aligncenter size-full wp-image-790" /></a><br />
Kinda looks like a rake&#8230; huh?</p>
<p>Some parts practically can&#8217;t change much, e.g. the palm (orange), and some might change drastically, e.g. the fingers (red). Each finger has joints (blue circles) and a tip (bigger blue circle).</p>
<pre class="brush: plain; title: ; notranslate">
typedef struct finger_data {
	Point2d origin_offset;		//base of finger relative to center of hand
	double a;					//angle
	vector&lt;double&gt; joints_a;	//angles of joints
	vector&lt;double&gt; joints_d;	//bone length
} FINGER_DATA;

typedef struct hand_data {
	FINGER_DATA fingers[5];		//fingers
	double a;					//angle of whole hand
	Point2d origin;				//center of palm
	Point2d origin_offset;		//offset from center for optimization
	double size;				//relative size of hand = length of a finger
} HAND_DATA;
</pre>
<p>Since I&#8217;m only interested in the fingertips, at first I thought of using Inverse Kinematics to guide the tips to a certain point and let the joints find their own minimal-energy positions, following <a href="http://freespace.virgin.net/hugo.elias/models/m_ik2.htm">this</a> article. But I abandoned this approach because of complications.</p>
<p>I also had to simplify the model, both for real-time estimation and for better results. So in the end I settled on a very rigid model that allows only one joint per finger and no angular movement.</p>
<h1>Using tnc.c</h1>
<p>tnc.c is a &#8220;library&#8221;, essentially one C file, that implements a line search algorithm able to find the minimum point of a multivariate function. I&#8217;m not certain of the algorithm details, and they&#8217;re not so important, as it can be replaced with any other similar library. But tnc.c has a great advantage: it is dead simple. One function starts the gradient descent, calling back a function of yours to calculate the energy and its gradients.</p>
<p>So basically I had to write just one very short function:</p>
<pre class="brush: plain; title: ; notranslate">
static int my_f(double x[], double *f, double g[], void *state) {
	DATA_FOR_TNC* d_ptr = (DATA_FOR_TNC*)state;
	DATA_FOR_TNC new_data = *d_ptr;

	mapVecToData(x,new_data.hand);

	*f = calc_Energy(new_data,*d_ptr);

	//calc gradients by forward differences: g[i] = (E(x + EPSILON*e_i) - E(x)) / EPSILON
	{
		double _x[SIZE_OF_HAND_DATA];

		for(int i=0;i&lt;SIZE_OF_HAND_DATA;i++) {
			memcpy(_x, x, sizeof(double)*SIZE_OF_HAND_DATA); //reset variables
			_x[i] = _x[i] + EPSILON; //change only one variable
			mapVecToData(_x, new_data.hand);
			double E_epsilon = calc_Energy(new_data,*d_ptr);
			g[i] = ((E_epsilon - *f) / EPSILON); //calc the gradient for this variable change
		}
	}

	return 0;
}
</pre>
<p>tnc.c calls this function on every iteration of the search. <code>double x[]</code> is the state of the variables the search is currently examining. <code>double* f</code> is where to write the energy of this state, and <code>double g[]</code> is where to write the gradients (same size as <code>x[]</code>). <code>void* state</code> is a user-defined pointer that is carried along the process.</p>
<p>So what I did is simply nudge each parameter in turn by a small <code>EPSILON</code>, to test how it affects the energy of the system. I measure the energy of the nudged state, subtract the energy of the &#8220;natural&#8221; (unchanged) state, and divide by <code>EPSILON</code>. That gives a forward-difference estimate of the gradient for this parameter.</p>
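<p>Here is the same forward-difference trick in isolation, as a sketch. A callback with the tnc.c-style signature fills <code>g</code> by nudging one variable at a time. The quadratic <code>toyEnergy</code>, <code>N_VARS</code> and the other names are made up for illustration. In the project the energy is <code>calc_Energy</code> over the hand parameters.</p>

```cpp
#include <cstring>

const int N_VARS = 2;          // hypothetical number of model parameters
const double EPSILON = 1e-6;   // step used for the forward difference

// Toy energy with its minimum at (3, -1); stands in for calc_Energy.
static double toyEnergy(const double x[]) {
    return (x[0] - 3.0) * (x[0] - 3.0) + 2.0 * (x[1] + 1.0) * (x[1] + 1.0);
}

// Same shape as the tnc.c callback: write the energy to *f and its gradient to g.
static int toy_f(double x[], double* f, double g[], void* /*state*/) {
    *f = toyEnergy(x);
    double xe[N_VARS];
    for (int i = 0; i < N_VARS; i++) {
        std::memcpy(xe, x, sizeof(double) * N_VARS); // reset to the unperturbed state
        xe[i] += EPSILON;                            // nudge only variable i
        g[i] = (toyEnergy(xe) - *f) / EPSILON;       // forward-difference estimate of dE/dx_i
    }
    return 0;
}
```

<p>For this energy the exact gradient at (0, 0) is (-6, 4). The forward difference lands within a small multiple of <code>EPSILON</code> of it, which is plenty for steering the search.</p>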
<p>The energy function came out a bit different in the end:</p>
<pre class="brush: plain; title: ; notranslate">

static double calc_Energy(DATA_FOR_TNC&amp; d, DATA_FOR_TNC&amp; orig_d) {
	double _sum = 0.0;

	//external energy: how close are the joints to the hand blob? (how well do they fit to it)
	vector&lt;Point2d&gt; joints;
	Mat tips(5,1,CV_64FC2);

	for (int j=0; j&lt;5; j++) {
		joints.clear();
		FINGER_DATA f = d.hand.fingers[j];
		Point2d _newTip = newTip(f,d.hand,joints); //get joints for this finger

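		for (size_t i=0; i&lt;joints.size(); i++) { //for each joint find how far it is from the blob
			//signed distance: positive inside the contour, negative outside
			double ds = pointPolygonTest(d.contour, joints[i]+getHandOrigin(d.hand), true);
			ds += 5; //tolerate joints up to 5 pixels outside the contour
			ds = ((ds &lt; 0) ? -1 : 1) * (ds*ds); //signed square
			_sum -= (ds &gt; 0) ? 0 : 100*ds; //only joints outside the tolerance add energy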
		}

		tips.at&lt;Point2d&gt;(j,0) = _newTip;
	}

	//laziness of fingers - joints should strive to be as they were in the natural pose
	vector&lt;double&gt; _angles;
//	for (int j=0; j&lt;5; j++) {
//		FINGER_DATA f = d.hand.fingers[j];
//		FINGER_DATA of = orig_d.hand.fingers[j];
////		_angles.push_back(f.a - of.a);
//		for (int i=0; i&lt;f.joints_d.size(); i++) {
////			_angles.push_back(f.joints_a[i] - of.joints_a[i]);
//			_angles.push_back(f.joints_d[i] - of.joints_d[i]);
//		}
//	}
	_angles.push_back(d.hand.a-orig_d.hand.a); //the angle of the hand should be as it was before
	_sum  += 10000*norm(Mat(_angles));

	if(_sum &lt; 0) return 0;
	return _sum;
}
</pre>
<p>You&#8217;ll notice the commented-out section: the &#8220;laziness of fingers&#8221; term turned out not to give good results&#8230; A different metric is needed! I haven&#8217;t found one yet. Maybe you have a good idea?</p>
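<p>The per-joint term of the external energy is easier to reason about on its own. This sketch takes the signed distance that <code>pointPolygonTest</code> returns with <code>measureDist=true</code> (positive inside the contour, negative outside). It reproduces the arithmetic of the loop in <code>calc_Energy</code>: a 5-pixel tolerance, a signed square, and a weight of 100 applied only to joints that end up outside. The function name <code>jointPenalty</code> is mine, not the project&#8217;s.</p>

```cpp
// Energy contributed by one joint, given its signed distance to the hand contour
// (> 0 inside, < 0 outside, as pointPolygonTest reports with measureDist=true).
double jointPenalty(double signedDist) {
    double ds = signedDist + 5.0;                // joints up to 5 px outside are still free
    ds = ((ds < 0) ? -1.0 : 1.0) * (ds * ds);    // signed square: grows fast with distance
    return (ds > 0) ? 0.0 : -100.0 * ds;         // positive cost only outside the tolerance
}
```

<p>Because the cost is quadratic in the overshoot, a joint 10 px past the tolerance costs 100 times more than one 1 px past it. That steep growth is what pulls stray joints back onto the blob.</p>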
<p>Starting tnc.c is very simple: allocate the vectors for X and the gradients, initialize the model from the blob, and call the <code>simple_tnc</code> convenience method. <code>simple_tnc</code> starts <code>tnc</code> with some default parameters that don&#8217;t seem to affect the outcome (at least in my tries).</p>
<pre class="brush: plain; title: ; notranslate">
void estimateHand(Mat&amp; mymask) {
	double _x[SIZE_OF_HAND_DATA] = {0};
	Mat X(1,SIZE_OF_HAND_DATA,CV_64FC1,_x);
	double f;
	Mat gradients(Size(SIZE_OF_HAND_DATA,1),CV_64FC1,Scalar(0));

	namedWindow(&quot;state&quot;);

	initialize_hand_data(d, mymask);

	mapDataToVec((double*)X.data, d.hand);

	simple_tnc(SIZE_OF_HAND_DATA, (double*)X.data, &amp;f, (double*)gradients.data, my_f, (void*)&amp;d, 1, 0);

	mapVecToData((double*)X.data, d.hand);
	showstate(d,1);

	d.hand.origin = getHandOrigin(d.hand); //move to new position
}
</pre>
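<p>Since tnc.c can be swapped for any similar minimizer, here is a sketch of the simplest possible replacement. It is fixed-step gradient descent that drives a callback with the same <code>(x, f, g, state)</code> signature as <code>my_f</code>. The step size and iteration count are arbitrary assumptions. tnc.c&#8217;s truncated-Newton search is a more sophisticated method; this only shows the plumbing.</p>

```cpp
#include <cassert>
#include <vector>

// Callback signature used by tnc.c (and by my_f above).
typedef int (*energy_fn)(double x[], double* f, double g[], void* state);

// Plain fixed-step gradient descent: x <- x - step * g, for a fixed number of iterations.
// Returns the last callback status (0 means success) and stops early if the callback fails.
int simple_gradient_descent(int n, double x[], double* f, energy_fn fn, void* state,
                            double step = 0.1, int iterations = 200) {
    assert(n > 0 && step > 0.0);
    std::vector<double> g(n, 0.0);
    for (int it = 0; it < iterations; ++it) {
        int status = fn(x, f, g.data(), state);   // energy and gradient at the current x
        if (status != 0) return status;
        for (int i = 0; i < n; ++i)
            x[i] -= step * g[i];                  // move downhill along each coordinate
    }
    return fn(x, f, g.data(), state);             // refresh *f for the final x
}
```

<p>In <code>estimateHand</code> this could stand in for the <code>simple_tnc</code> call as something like <code>simple_gradient_descent(SIZE_OF_HAND_DATA, (double*)X.data, &amp;f, my_f, (void*)&amp;d)</code>. With numerically estimated gradients and a hand-tuned energy, the step size would need tuning.</p>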
<h1>Results and Discussion</h1>
<p>Here are my results so far:<br />
<object width="480" height="385"><param name="movie" value="http://www.youtube.com/v/uETHJQhK144?fs=1&amp;hl=en_US"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/uETHJQhK144?fs=1&amp;hl=en_US" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="480" height="385"></embed></object></p>
<p>It&#8217;s not perfect, but it&#8217;s a start. Tracking and estimating an open hand works pretty well, even with some change in orientation. But when the fingers are closed&#8230; that&#8217;s where the problems start. </p>
<p>Sometimes the joints &#8220;hover&#8221; over the black area to &#8220;land&#8221; in a white area so they &#8220;fit&#8221;, but they should not do that. One easy way to counter this would be to measure the distance along the whole bone, and not just at the joint.</p>
<p>Right now the model doesn&#8217;t use all the possible joints, because that is too heavy computationally. Also, the energy does not depend on (or change) the angles of the fingers. So this is a very, very simple model of a hand&#8230;</p>
<p>But it is a good start! All the <a href="http://www.youtube.com/watch?v=mLT4CFLIi8A&#038;feature=related">other</a> <a href="http://www.youtube.com/watch?v=6Uw_8Y1RuQQ&#038;feature=related">stuff</a> I <a href="http://www.youtube.com/watch?v=B_UYmQJT-F0&#038;feature=related">have</a> <a href="http://www.youtube.com/watch?v=F8GVeV0dYLM&#038;feature=related">seen</a> <a href="http://www.youtube.com/watch?v=Rmh-mZFxWns&#038;feature=related">online</a> is just basic counting of high-curvature points plus color-based or feature-based segmentation and tracking&#8230; My model actually tries to fit an articulated, precise model of a hand to the image.</p>
<h1>How did you get such nice blobs?!</h1>
<p>You ask. They are beautiful, aren&#8217;t they&#8230; nice and clean, easy for tracking and model fitting. It&#8217;s no magic though&#8230;<br />
Well, I took part in a <a href="http://depthjs.media.mit.edu/">project in the Media Lab, called DepthJS</a>, that uses the MS Kinect to control web pages, and I wrote its computer-vision part. All the <a href="https://github.com/doug/depthjs">code is there</a> for you to grab. I just plugged it into this little project, building on <a href="http://openkinect.org/wiki/C%2B%2BOpenCvExample">this very simple example of using OpenCV 2.X and libfreenect</a>.</p>
<p>Wow, this was a longie&#8230; I hope you learned something and got inspired. Writing it gave me a chance to review the project again, and I&#8217;m inspired too. Inspiration all around!</p>
<p>Code is obviously yours for the taking:<br />
<a href="https://github.com/royshil/OpenHPE">https://github.com/royshil/OpenHPE</a></p>
<p>Please contribute your own views, thoughts, code, rants in the comments and github page.</p>
<p>Enjoy<br />
Roy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2010/12/28/hand-gesture-recognition-via-model-fitting-in-energy-minimization-wopencv/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Kinect and OpenCV 2.1</title>
		<link>http://www.morethantechnical.com/2010/11/22/kinect-and-opencv-2-1/</link>
		<comments>http://www.morethantechnical.com/2010/11/22/kinect-and-opencv-2-1/#comments</comments>
		<pubDate>Mon, 22 Nov 2010 07:36:31 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[3d]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[vision]]></category>
		<category><![CDATA[kinect]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=744</guid>
		<description><![CDATA[Hi Another quickie on how to use Kinect (libfreenect) with OpenCV 2.1. I already saw people do it, but haven&#8217;t seen code. UPDATE (12/29): OpenKinect posted very good C++ code of using libfreenect with OpenCV2.X APIs: here it is. Plus, their git repo now has a very clean C code: here it is. So here [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.morethantechnical.com/wp-content/uploads/2010/11/Screen-shot-2010-11-22-at-2.35.00-AM.png" rel="lightbox[744]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/11/Screen-shot-2010-11-22-at-2.35.00-AM-300x117.png" alt="" title="Screen shot" width="300" height="117" class="alignleft size-medium wp-image-759" /></a>Hi</p>
<p>Another quickie on how to use the Kinect (libfreenect) with OpenCV 2.1. I&#8217;ve seen people do it, but haven&#8217;t seen any code.</p>
<p><strong>UPDATE (12/29)</strong>: OpenKinect posted very good C++ code for using libfreenect with the OpenCV 2.X APIs: <a href="http://openkinect.org/wiki/C%2B%2BOpenCvExample">here it is</a>. Plus, their git repo now has some very clean C code: <a href="https://github.com/OpenKinect/libfreenect/blob/master/wrappers/opencv/cvdemo.c">here it is</a>.</p>
<p>So here it goes<br />
<span id="more-744"></span><br />
Before I started, I got libfreenect from the OpenKinect git repo: <code>https://github.com/OpenKinect/libfreenect</code><br />
It comes bundled with &#8220;glview.c&#8221;, an example of how to use it with OpenGL, which I borrowed some code from.<br />
So get past the hurdle of compiling, and if glview runs, you&#8217;re home free. Play around and familiarize yourself with the code; this is going to be fast&#8230;</p>
<p>The only tricky thing is that OpenCV wants the <code>imshow</code> calls to be on the main thread. I learned this empirically, by trial and error. But the glview.c example calls <code>freenect_process_events</code> on the main thread and does the rendering on the GL thread, so I flipped things around a bit.</p>
<p>So in the main function I have the initialization stuff and a pthread_create to make a freenect thread:</p>
<pre class="brush: plain; title: ; notranslate">

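Mat depthMat(Size(640,480),CV_16UC1),
	rgbMat(Size(640,480),CV_8UC3,Scalar(0));
Mat depthf;	//8-bit version of the depth map, for display
pthread_t fnkt_thread;
freenect_device *f_dev;
pthread_mutex_t buf_mutex = PTHREAD_MUTEX_INITIALIZER;
freenect_context *f_ctx;
pthread_cond_t frame_cond = PTHREAD_COND_INITIALIZER;

//state shared between the main thread, the freenect thread and the callbacks
volatile int die = 0;	//set to 1 to stop the freenect thread
int g_argc;
char **g_argv;
int got_frames = 0;	//frames received by the callbacks
int fr = 0;	//frames displayed by the main loop
int freenect_angle = 0;	//tilt angle, in degrees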

int main(int argc, char **argv)
{
	int res;

	g_argc = argc;
	g_argv = argv;

	if (freenect_init(&amp;f_ctx, NULL) &lt; 0) {
		printf(&quot;freenect_init() failed\n&quot;);
		return 1;
	}

	freenect_set_log_level(f_ctx, FREENECT_LOG_INFO);

	int nr_devices = freenect_num_devices (f_ctx);
	printf (&quot;Number of devices found: %d\n&quot;, nr_devices);

	int user_device_number = 0;
	if (argc &gt; 1)
		user_device_number = atoi(argv[1]);

	if (nr_devices &lt; 1)
		return 1;

	if (freenect_open_device(f_ctx, &amp;f_dev, user_device_number) &lt; 0) {
		printf(&quot;Could not open device\n&quot;);
		return 1;
	}

	freenect_set_tilt_degs(f_dev,freenect_angle);
	freenect_set_led(f_dev,LED_RED);
	freenect_set_depth_callback(f_dev, depth_cb);
	freenect_set_rgb_callback(f_dev, rgb_cb);
	freenect_set_rgb_format(f_dev, FREENECT_FORMAT_RGB);
	freenect_set_depth_format(f_dev, FREENECT_FORMAT_11_BIT);

	freenect_start_depth(f_dev);
	freenect_start_rgb(f_dev);

	res = pthread_create(&amp;fnkt_thread, NULL, freenect_threadfunc, NULL);
	if (res) {
		printf(&quot;pthread_create failed\n&quot;);
		return 1;
	}

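	Mat rgbShow;	//BGR copy of the latest color frame, for display
	while (!die) {
		fr++;

		//copy the shared buffers while holding the lock, so the callbacks
		//can't overwrite them halfway through
		pthread_mutex_lock(&amp;buf_mutex);
		cvtColor(rgbMat, rgbShow, CV_RGB2BGR);	//libfreenect delivers RGB, imshow expects BGR
		depthMat.convertTo(depthf, CV_8UC1, 255.0/2048.0);	//scale 11-bit depth to 0..255
		pthread_mutex_unlock(&amp;buf_mutex);

		imshow(&quot;rgb&quot;, rgbShow);
		imshow(&quot;depth&quot;, depthf);

		char k = waitKey(5);
		if( k == 27 ) break;	//ESC quits
	}
	die = 1;	//stop the freenect thread, otherwise pthread_join below never returns

	printf(&quot;-- done!\n&quot;);

	destroyWindow(&quot;rgb&quot;);
	destroyWindow(&quot;depth&quot;);

	pthread_join(fnkt_thread, NULL);
	return 0;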
}
</pre>
<p>The freenect thread is simply:</p>
<pre class="brush: plain; title: ; notranslate">
void *freenect_threadfunc(void* arg) {
	cout &lt;&lt; &quot;freenect thread&quot;&lt;&lt;endl;
	while(!die &amp;&amp; freenect_process_events(f_ctx) &gt;= 0 ) {}
	cout &lt;&lt; &quot;freenect die&quot;&lt;&lt;endl;
	return NULL;
}
</pre>
<p>And the two callbacks also need to write into the OpenCV buffers:</p>
<pre class="brush: plain; title: ; notranslate">
void depth_cb(freenect_device *dev, freenect_depth *depth, uint32_t timestamp)
{
	pthread_mutex_lock(&amp;buf_mutex);

	//copy to ocv buf...
	memcpy(depthMat.data, depth, FREENECT_DEPTH_SIZE);

	got_frames++;
	pthread_cond_signal(&amp;frame_cond);
	pthread_mutex_unlock(&amp;buf_mutex);
}

void rgb_cb(freenect_device *dev, freenect_pixel *rgb, uint32_t timestamp)
{
	pthread_mutex_lock(&amp;buf_mutex);
	got_frames++;
	//copy to ocv_buf..
	memcpy(rgbMat.data, rgb, FREENECT_RGB_SIZE);

	pthread_cond_signal(&amp;frame_cond);
	pthread_mutex_unlock(&amp;buf_mutex);
}
</pre>
<p>And that&#8217;s all you&#8217;ll need.</p>
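<p>If you want to see the threading pattern without a Kinect attached, here is a hardware-free sketch. A worker thread plays the role of <code>freenect_process_events</code> firing the callbacks and overwrites a shared buffer under a mutex. The main thread (where <code>imshow</code> would have to run) copies the latest frame out under the same mutex and works only on its private copy. It uses C++11 <code>std::thread</code> and <code>std::mutex</code> rather than the pthreads calls above. The frame type and function names are made up for the example.</p>

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the shared OpenCV buffer: a few "pixels" plus a frame counter.
struct FrameBuffer {
    std::mutex mtx;                                      // plays the role of buf_mutex
    std::vector<int> pixels = std::vector<int>(16, 0);
    int frames = 0;                                      // plays the role of got_frames
};

// Plays the role of depth_cb: overwrite the whole frame while holding the lock.
void fakeDepthCallback(FrameBuffer& buf, int value) {
    std::lock_guard<std::mutex> lock(buf.mtx);
    for (int& p : buf.pixels) p = value;
    buf.frames++;
}

// A device thread delivers nFrames frames while the main thread keeps copying the latest
// one out under the same lock. Returns true if every copy was a whole frame (all pixels
// equal) and the counter saw every frame.
bool runPipeline(int nFrames) {
    FrameBuffer buf;
    std::atomic<bool> done(false);
    std::thread device([&buf, &done, nFrames] {
        for (int i = 1; i <= nFrames; ++i) fakeDepthCallback(buf, i);
        done = true;
    });

    bool consistent = true;
    while (!done) {
        std::vector<int> snapshot;
        {
            std::lock_guard<std::mutex> lock(buf.mtx);  // copy under the lock...
            snapshot = buf.pixels;
        }
        for (int p : snapshot)                         // ...then "display" the private copy
            consistent = consistent && (p == snapshot[0]);
    }
    device.join();
    return consistent && buf.frames == nFrames;
}
```

<p>The lock is held only for the copy. The slow display work happens on a private copy, so the callbacks are never blocked for long.</p>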
<p>Now go crazy with your concoction of computer vision algorithms!</p>
<p>Roy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2010/11/22/kinect-and-opencv-2-1/feed/</wfw:commentRss>
		<slash:comments>23</slash:comments>
		</item>
		<item>
		<title>20-lines AR in OpenCV [w/code]</title>
		<link>http://www.morethantechnical.com/2010/11/10/20-lines-ar-in-opencv-wcode/</link>
		<comments>http://www.morethantechnical.com/2010/11/10/20-lines-ar-in-opencv-wcode/#comments</comments>
		<pubDate>Wed, 10 Nov 2010 15:00:30 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[3d]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[opengl]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[vision]]></category>
		<category><![CDATA[augmented reality]]></category>
		<category><![CDATA[computer vision]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=732</guid>
		<description><![CDATA[Hi, Just wanted to share a bit of code using OpenCV&#8217;s camera extrinsic parameters recovery, camera position and rotation &#8211; solvePnP (or its C counterpart cvFindExtrinsicCameraParams2). I wanted to get a simple planar object surface recovery for augmented reality, but without using any of the AR libraries, rather dig into some OpenCV and OpenGL code. [...]]]></description>
			<content:encoded><![CDATA[<p>Hi,</p>
<p>Just wanted to share a bit of code using OpenCV&#8217;s recovery of the camera extrinsic parameters (camera position and rotation): solvePnP, or its C counterpart cvFindExtrinsicCameraParams2. I wanted to get a simple planar object surface recovery for augmented reality without using any of the AR libraries, and instead dig into some OpenCV and OpenGL code.<br />
This can serve as a primer, or tutorial on how to use OpenCV with OpenGL for AR.</p>
<p>The program is just straightforward optical-flow-based tracking, manually fed with four points (the planar object&#8217;s corners), that solves for the camera pose every frame. Plain vanilla AR.</p>
<p>Well, the whole cpp file is ~350 lines, but there are only 20 or fewer <strong>interesting</strong> lines&#8230; actually far fewer. Let&#8217;s see what&#8217;s up.<br />
<span id="more-732"></span><br />
I wanna run you through the code really quickly without going into much detail, to keep things simple. So first of all, we have two separate threads: Vision and Graphics. The vision thread tracks and solves, and the graphics thread displays. </p>
<h2>Initialize</h2>
<pre class="brush: plain; title: ; notranslate">
int main(int argc, char** argv) {
	initGL(argc,argv);
	initOCV(NULL);

	pthread_t tId;
	pthread_attr_t tAttr;
	pthread_attr_init(&amp;tAttr);
	pthread_create(&amp;tId, &amp;tAttr, startOCV, NULL);

	startGL(NULL);
}
</pre>
<p>The initGL, initOCV functions just initialize stuff that can&#8217;t be initialized statically, like GLUT window definitions, some starting values for the cam-pose estimation and other boring stuff. </p>
<p>GLUT runs on the main thread. Putting it on its own thread seems to make it unhappy, and it stops working.</p>
<h2>Tracking</h2>
<p>I&#8217;m using the simplest form of optical flow in OpenCV (pyramidal Lucas-Kanade, <code>calcOpticalFlowPyrLK</code>), and the code is equally minimal:</p>
<pre class="brush: plain; title: ; notranslate">
void* startOCV(void* arg) {
	while (1) {
		cvtColor(img, prev, CV_BGR2GRAY);

		//get frame off camera
		cap &gt;&gt; frame;
		if(frame.data == NULL) break;

		frame.copyTo(img);

		cvtColor(img, next, CV_BGR2GRAY);

		//calc optical flow
		calcOpticalFlowPyrLK(prev, next, points1, points2, status, err, Size(30,30));
		cvtPtoKpts(imgPointsOnPlane, points2);

		//switch points vectors (next becomes previous)
		points1.clear();
		points1 = points2;

		//calculate camera pose
		getPlanarSurface(points1);

		//refresh 3D scene
		glutPostWindowRedisplay(glutwin);

		//show tracked points on scene
		drawKeypoints(next, imgPointsOnPlane, img_to_show, Scalar(255));
		imshow(&quot;main2&quot;, img_to_show);
		int c = waitKey(30);
		if (c == ' ') {
			waitKey(0);
		}
	}
	return NULL;
}
</pre>
<p>To use OpenCV&#8217;s &#8216;drawKeypoints&#8217;, which makes drawing key points much easier, we must use <code>vector&lt;KeyPoint&gt;</code>. So I created two very simple converter functions: cvtKeyPtoP and cvtPtoKpts.</p>
<p>You think &#8216;getPlanarSurface&#8217; is complicated? Think again! 3 lines:</p>
<pre class="brush: plain; title: ; notranslate">
void getPlanarSurface(vector&lt;Point2f&gt;&amp; imgP) {
	Rodrigues(rotM,rvec);

	solvePnP(objPM, Mat(imgP), camera_matrix, distortion_coefficients, rvec, tvec, true);

	Rodrigues(rvec,rotM);
}
</pre>
<p><strong>Booya</strong>! Vision stuff is done.</p>
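<p>Here is what the two <code>Rodrigues</code> calls do. <code>solvePnP</code> works with a compact 3-element rotation vector (unit axis times angle), while the OpenGL side needs the 3x3 rotation matrix <code>rotM</code>, and <code>cv::Rodrigues</code> converts between the two. The final <code>true</code> argument is <code>useExtrinsicGuess</code>, which makes <code>solvePnP</code> start from the incoming <code>rvec</code> and <code>tvec</code> (the previous frame&#8217;s pose). That is why <code>rotM</code> is converted back to <code>rvec</code> before the call. Below is a standalone sketch of the vector-to-matrix direction using Rodrigues&#8217; formula R = I + sin(theta) K + (1 - cos(theta)) K^2, with plain arrays instead of <code>cv::Mat</code>. The function name is mine.</p>

```cpp
#include <cmath>

// Rotation vector (unit axis * angle, in radians) -> row-major 3x3 rotation matrix,
// via Rodrigues' formula R = I + sin(theta) K + (1 - cos(theta)) K^2,
// where K is the cross-product matrix of the unit axis.
void rodriguesToMatrix(const double rvec[3], double R[9]) {
    double theta = std::sqrt(rvec[0]*rvec[0] + rvec[1]*rvec[1] + rvec[2]*rvec[2]);
    for (int i = 0; i < 9; ++i) R[i] = (i % 4 == 0) ? 1.0 : 0.0;  // start from identity
    if (theta < 1e-12) return;                                    // (almost) no rotation
    double kx = rvec[0] / theta, ky = rvec[1] / theta, kz = rvec[2] / theta;
    double K[9] = {  0, -kz,  ky,
                    kz,   0, -kx,
                   -ky,  kx,   0 };
    double s = std::sin(theta), c = 1.0 - std::cos(theta);
    for (int r = 0; r < 3; ++r)
        for (int col = 0; col < 3; ++col) {
            double kk = 0.0;                                      // (K*K)[r][col]
            for (int m = 0; m < 3; ++m) kk += K[r*3 + m] * K[m*3 + col];
            R[r*3 + col] += s * K[r*3 + col] + c * kk;
        }
}
```

<p>For example, a rotation vector of (0, 0, pi/2) gives the matrix that turns the x axis into the y axis: rows (0, -1, 0), (1, 0, 0), (0, 0, 1).</p>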
<h2>3D Graphics</h2>
<p>A little 3D never hurt any AR system&#8230; and drawing it is still very simple:</p>
<pre class="brush: plain; title: ; notranslate">
void display(void)
{
	glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

	//Make sure we have a background image buffer
	if(img_to_show.data != NULL) {
		Mat tmp; 

		//Switch to a pixel-aligned orthographic projection for drawing the background.
		//gluOrtho2D takes (left, right, bottom, top); left == right would be rejected by GL.
		glMatrixMode(GL_PROJECTION);
		glPushMatrix();
		glLoadIdentity();
		gluOrtho2D(0.0, 640.0, 0.0, 480.0);

		glMatrixMode(GL_MODELVIEW);

		//Textures can only have power-of-two dimensions, so closest to 640x480 is 1024x512
		tmp = Mat(Size(1024,512),CV_8UC3);
		//However we are going to use only a portion, so create an ROI
		Mat ttmp = tmp(Range(0,img_to_show.rows),Range(0,img_to_show.cols));

		//Some frames could be 8bit grayscale, so make sure on the output we always get 24bit RGB.
		if(img_to_show.channels() == 1)
			cvtColor(img_to_show, ttmp, CV_GRAY2RGB);
		else if(img_to_show.channels() == 3)
			cvtColor(img_to_show, ttmp, CV_BGR2RGB);
		flip(ttmp,ttmp,0);

		glEnable(GL_TEXTURE_2D);
		glTexImage2D(GL_TEXTURE_2D, 0, 3, 1024, 512, 0, GL_RGB, GL_UNSIGNED_BYTE, tmp.data);

		//Finally, draw the texture on a full-window quad. The image only fills the
		//640x480 corner of the 1024x512 texture, so the texture coords stop at
		//640/1024 = 0.625 and 480/512 = 0.9375.
		glPushMatrix();
		glLoadIdentity();
		glDepthMask(GL_FALSE);	//don't write depth, so the background never hides the 3D object
		glBegin(GL_QUADS);
		glTexCoord2d(0.0,   0.0);    glVertex2i(0,   0);
		glTexCoord2d(0.625, 0.0);    glVertex2i(640, 0);
		glTexCoord2d(0.625, 0.9375); glVertex2i(640, 480);
		glTexCoord2d(0.0,   0.9375); glVertex2i(0,   480);
		glEnd();
		glDepthMask(GL_TRUE);
		glPopMatrix();

		glMatrixMode(GL_PROJECTION);
		glPopMatrix();
		glMatrixMode(GL_MODELVIEW);
	}

	glPushMatrix();
	double m[16] = {	_d[0],-_d[3],-_d[6],0,
						_d[1],-_d[4],-_d[7],0,
						_d[2],-_d[5],-_d[8],0,
						tv[0],-tv[1],-tv[2],1	};

	//Rotate and translate according to result from solvePnP
	glLoadMatrixd(m);

	//Draw a basic cube
	glDisable(GL_TEXTURE_2D);
	glColor3ub(255, 0, 0);	//unsigned-byte variant: glColor3b takes signed bytes, where 255 overflows
	glutSolidCube(1);
	glPopMatrix();

	glutSwapBuffers();
}
</pre>
<p>Not so horrific, huh? Most of it is drawing the background texture, and that&#8217;s only to avoid using glDrawPixels&#8230; The only interesting part is loading the rotation and translation matrix.<br />
You will notice that tv[0] (the x component of the translation) doesn&#8217;t get a minus sign, while tv[1] and tv[2] do. That&#8217;s because OpenCV&#8217;s camera looks down the +z axis with +y pointing down, while OpenGL&#8217;s camera looks down the -z axis with +y pointing up. So a 180-degree rotation around the x axis is needed, which negates the y and z rows. The same goes for _d[0], _d[1] and _d[2], the x row of the rotation matrix, which keep their signs.<br />
Looking down the -z axis, with +x going right and +y going up, is OpenGL&#8217;s default eye-space convention, and in initGL I set OpenGL up to look &#8220;normally&#8221; that way.</p>
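<p>Put differently, the <code>m[16]</code> array in <code>display()</code> is the OpenCV pose [R|t] pre-multiplied by diag(1, -1, -1), the 180-degree rotation about x. It is written out column by column, since <code>glLoadMatrixd</code> expects column-major order. Here is that construction as a standalone sketch, assuming <code>R</code> is the row-major 3x3 matrix that <code>_d</code> points to and <code>t</code> is the translation vector. The function name is hypothetical.</p>

```cpp
// Builds the column-major 4x4 modelview matrix that glLoadMatrixd expects from an
// OpenCV pose: R is row-major 3x3 (as Rodrigues produces), t is the translation.
// Multiplying by diag(1, -1, -1) flips the y and z rows, i.e. rotates 180 degrees
// about x to go from OpenCV's camera convention to OpenGL's.
void cvPoseToGLModelview(const double R[9], const double t[3], double m[16]) {
    const double flip[3] = {1.0, -1.0, -1.0};
    for (int row = 0; row < 3; ++row) {
        for (int col = 0; col < 3; ++col)
            m[col * 4 + row] = flip[row] * R[row * 3 + col];  // column-major storage
        m[12 + row] = flip[row] * t[row];                     // translation (4th column)
        m[row * 4 + 3] = 0.0;                                 // bottom row of columns 0..2
    }
    m[15] = 1.0;
}
```

<p>This produces the same 16 numbers as the hand-written <code>m[]</code> initializer, which makes it a handy way to double-check the sign pattern.</p>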
<h2>Proof time</h2>
<p>Not that you need it&#8230; <img src='http://www.morethantechnical.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  But here&#8217;s a video of it working.<br />
<object width="425" height="344"><param name="movie" value="http://www.youtube.com/v/OxBa_5HvZyI?hl=en&#038;fs=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/OxBa_5HvZyI?hl=en&#038;fs=1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="344"></embed></object></p>
<p>BTW: If anyone can solve the slight misalignment between the 3D object and the image, let me know.</p>
<h2>Code and Salutations</h2>
<p>Code can be downloaded from the blog&#8217;s SVN:</p>
<pre class="brush: plain; title: ; notranslate">svn checkout http://morethantechnical.googlecode.com/svn/trunk/OpenCVAR morethantechnical-OpenCVAR</pre>
<p>Now let your imagination run wild!</p>
<p>Farewell,<br />
Roy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2010/11/10/20-lines-ar-in-opencv-wcode/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>

