# Quick and Easy Head Pose Estimation with OpenCV [w/ code]

Hi,
Just wanted to share a small thing I did with OpenCV – Head Pose Estimation (sometimes known as Gaze Direction Estimation). Many people try to achieve this and there are a ton of papers covering it, including a recent overview of almost all known methods.
I implemented a very quick & dirty solution based on OpenCV’s built-in methods that produced surprisingly good results (I expected it to fail), so I decided to share. It is based on 3D-2D point correspondences: points marked in the image are fitted to matching points on a 3D head model. OpenCV provides a magical method – solvePnP – that does this, given some calibration parameters that I completely disregarded.
Here’s how it’s done.

## Intro

I wanted to use solvePnP, since I saw how easy it was to use when I was implementing PTAM. It’s supposed to recover the 3D location and orientation of an object, given a set of 3D-2D feature correspondences and an initial guess. Strictly speaking the initial guess is optional, but the results without it are dreadful.
So I needed to get some 3D points on a human head. I downloaded a free model of a human head from the net, and used MeshLab to mark some points on the model:

1. Left ear
2. Right ear
3. Left eye
4. Right eye
5. Nose tip
6. Left mouth corner
7. Right mouth corner

Then I headed to the LFW database to get some pictures of celebrity heads. By mere accident I stumbled upon Angelina Jolie. The next step was to mark some points on Angelina’s pictures, according to the selected features. In places where the head hides an ear, I put a point in the estimated location of the ear.
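The code further down reads the marked points from a text file as seven integer `x y` pairs, using the MSVC-only `fopen_s`/`fscanf_s`. If you are not on Windows, a portable reader could look roughly like the sketch below. `readImagePoints` is a made-up helper, not part of the original code, and it assumes whitespace-separated integers. One thing to watch: solvePnP pairs points by index, so the file has to list the marks in the same order as the model points in the code (eyes, nose, mouth corners, ears), not in the order of the list above.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <utility>
#include <vector>

// Hypothetical helper: read up to `count` whitespace-separated "x y" integer
// pairs from a stream. Returns fewer points if the input ends early or a
// value cannot be parsed.
std::vector<std::pair<float, float>> readImagePoints(std::istream& in, int count = 7) {
    assert(count >= 0);
    std::vector<std::pair<float, float>> points;
    int x = 0, y = 0;
    for (int i = 0; i < count && (in >> x >> y); ++i)
        points.emplace_back(static_cast<float>(x), static_cast<float>(y));
    return points;
}
```

With `<fstream>` included, `std::ifstream f("points.txt"); auto pts = readImagePoints(f);` would stand in for the `fopen_s` loop; checking `pts.size() == 7` before calling solvePnP catches a short or malformed file.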

## Time to Code

First I initialize the 3D points vector, and a dummy camera matrix:

```cpp
vector<Point3f > modelPoints;
modelPoints.push_back(Point3f(-36.9522f,39.3518f,47.1217f));    //l eye
modelPoints.push_back(Point3f(35.446f,38.4345f,47.6468f));              //r eye
modelPoints.push_back(Point3f(-0.0697709f,18.6015f,87.9695f)); //nose
modelPoints.push_back(Point3f(-27.6439f,-29.6388f,73.8551f));   //l mouth
modelPoints.push_back(Point3f(28.7793f,-29.2935f,72.7329f));    //r mouth
modelPoints.push_back(Point3f(-87.2155f,15.5829f,-45.1352f));   //l ear
modelPoints.push_back(Point3f(85.8383f,14.9023f,-46.3169f));    //r ear
//op, rvec, tvec, camMatrix, rv and tv are declared elsewhere (not shown here)
op = Mat(modelPoints);
op = op / 35; //just a little normalization...
rvec = Mat(rv);
double _d[9] = {1,0,0,
                0,-1,0,
                0,0,-1}; //rotation: looking at -z axis
Rodrigues(Mat(3,3,CV_64FC1,_d),rvec); //initial guess as a rotation vector
tv[0]=0;tv[1]=0;tv[2]=1;
tvec = Mat(tv);
double _cm[9] = { 20, 0, 160,
                  0, 20, 120,
                  0,  0,   1 };  //"calibration matrix": center point at center of picture with 20 focal length.
camMatrix = Mat(3,3,CV_64FC1,_cm);
```

Even though the “calibration” parameters are totally bogus, they work pretty well.
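To get a feel for what that made-up camera matrix does, here is a rough plain-C++ sketch of the pinhole projection that solvePnP tries to invert: given a pose (rotation `R`, translation `t`), each model point is moved into camera space and projected to pixels, and the solver looks for the pose that keeps the projected points close to the marked ones. The types and function names below are my own illustration, not OpenCV API, and lens distortion is ignored (matching the all-zero distortion coefficients used in this post).

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Plain-C++ stand-ins for the OpenCV types; all names are illustrative.
struct Vec3 { double x, y, z; };
struct Vec2 { double u, v; };
using Mat3 = std::array<double, 9>;  // row-major 3x3

// Apply the pose: camera-space point = R * X + t.
Vec3 transformPoint(const Mat3& R, const Vec3& t, const Vec3& X) {
    return { R[0]*X.x + R[1]*X.y + R[2]*X.z + t.x,
             R[3]*X.x + R[4]*X.y + R[5]*X.z + t.y,
             R[6]*X.x + R[7]*X.y + R[8]*X.z + t.z };
}

// Pinhole projection: focal lengths are K[0] and K[4],
// the principal point is (K[2], K[5]).
Vec2 projectPoint(const Mat3& K, const Mat3& R, const Vec3& t, const Vec3& X) {
    const Vec3 c = transformPoint(R, t, X);
    assert(c.z != 0.0 && "point lies in the camera plane");
    return { K[0] * c.x / c.z + K[2], K[4] * c.y / c.z + K[5] };
}

// Pixel distance between a marked image point and a projected model point.
double reprojectionError(const Vec2& marked, const Vec2& projected) {
    return std::hypot(marked.u - projected.u, marked.v - projected.v);
}
```

With the camera matrix from above (`{20, 0, 160, 0, 20, 120, 0, 0, 1}`), an identity rotation and zero translation, any point on the optical axis lands on the principal point (160, 120), which is the center of a 320x240 picture.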
Now, we’re all ready to start estimating some poses. So let’s use solvePnP:

```cpp
vector<Point2f > imagePoints;
FILE* f;
fopen_s(&f,"points.txt","r");
for(int i=0;i<7;i++) {
int x,y;
fscanf_s(f,"%d",&x); fscanf_s(f,"%d",&y);
imagePoints.push_back(Point2f((float)x,(float)y));
}
fclose(f);
//make a Mat of the vector<>
Mat ip(imagePoints);
//display points on image
for(unsigned int i=0;i<imagePoints.size();i++) circle(img,imagePoints[i],2,Scalar(255,0,255),CV_FILLED);
//"distortion coefficients"... hah!
double _dc[] = {0,0,0,0};
//here's where the magic happens
solvePnP(op,ip,camMatrix,Mat(1,4,CV_64FC1,_dc),rvec,tvec,true);
//decompose the response to something OpenGL would understand.
//translation vector is irrelevant, only rotation vector is important
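Mat rotM(3,3,CV_64FC1,rot); //rot is a double array declared elsewhere; rotM wraps it without copying
Rodrigues(rvec,rotM); //rotation vector -> 3x3 rotation matrix, written into rot
double* _r = rotM.ptr<double>();
printf("rotation mat: \n %.3f %.3f %.3f\n%.3f %.3f %.3f\n%.3f %.3f %.3f\n",
_r[0],_r[1],_r[2],_r[3],_r[4],_r[5],_r[6],_r[7],_r[8]);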
```
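Rodrigues shows up twice in the code: once to turn the 3x3 initial guess into a rotation vector, and once to turn solvePnP's rotation vector back into a matrix. For reference, the vector-to-matrix direction can be sketched in plain C++ roughly like this. It illustrates the standard Rodrigues formula and is not OpenCV's implementation; `rotationVectorToMatrix` and `Mat3` are made-up names.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat3 = std::array<double, 9>;  // row-major 3x3

// Rodrigues' formula: R = cos(a) I + (1 - cos(a)) k k^T + sin(a) [k]x,
// where a = |rvec| is the rotation angle and k = rvec / a is the unit axis.
Mat3 rotationVectorToMatrix(double rx, double ry, double rz) {
    const double a = std::sqrt(rx*rx + ry*ry + rz*rz);
    assert(std::isfinite(a) && "rotation vector must be finite");
    if (a < 1e-12) {
        // No rotation: return the identity.
        return {1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0};
    }
    const double kx = rx / a, ky = ry / a, kz = rz / a;
    const double c = std::cos(a), s = std::sin(a), v = 1.0 - c;
    return { c + v*kx*kx,     v*kx*ky - s*kz,  v*kx*kz + s*ky,
             v*ky*kx + s*kz,  c + v*ky*ky,     v*ky*kz - s*kx,
             v*kz*kx - s*ky,  v*kz*ky + s*kx,  c + v*kz*kz };
}
```

As a sanity check, the rotation vector (π, 0, 0) gives the matrix diag(1, -1, -1) up to rounding, which is the initial guess used earlier: a half turn about the x axis.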

Alright, all done on the vision side, so now I draw some 3D. As usual, I use a very simple GLUT program to display 3D in a hurry. Initialization is nothing special; the only thing worth pointing out is that I use glutSolidCylinder and glutSolidTetrahedron to draw the axes:

```cpp
glMatrixMode(GL_MODELVIEW);
gluLookAt(0,0,0,0,0,1,0,1,0); //cam looking at +z axis
glPushMatrix();
glTranslated(0,0,5); //go a bit back to where I want to draw the axes
glPushMatrix();
//this is the rotation matrix I got from solvePnP, so I will rotate accordingly to align with the face
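//note: OpenGL reads this array column-major, so filling it in row order loads the transpose of rotM
double _d[16] = {   rot[0],rot[1],rot[2],0,
                    rot[3],rot[4],rot[5],0,
                    rot[6],rot[7],rot[8],0,
                    0,     0,     0,     1};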
glMultMatrixd(_d);
glRotated(180,1,0,0); //rotate around to face the camera
//----------- Draw Axes --------------
//Z = red
glPushMatrix();
glRotated(180,0,1,0);
glColor3d(1,0,0);
glutSolidCylinder(0.05,1,15,20);
glTranslated(0,0,1);
glScaled(.1,.1,.1);
glutSolidTetrahedron();
glPopMatrix();
//Y = green
glPushMatrix();
glRotated(-90,1,0,0);
glColor3d(0,1,0);
glutSolidCylinder(0.05,1,15,20);
glTranslated(0,0,1);
glScaled(.1,.1,.1);
glutSolidTetrahedron();
glPopMatrix();
//X = blue
glPushMatrix();
glRotated(-90,0,1,0);
glColor3d(0,0,1);
glutSolidCylinder(0.05,1,15,20);
glTranslated(0,0,1);
glScaled(.1,.1,.1);
glutSolidTetrahedron();
glPopMatrix();
glPopMatrix();
glPopMatrix();
//----------End axes --------------
```
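A word on matrix layout: OpenCV stores `rotM` row by row, while `glMultMatrixd` expects its 16 values column by column. If the nine rotation values are copied into the array in row order, OpenGL ends up with the transpose, which for a rotation is the inverse. Whether that matters depends on the other flips in the scene, so take the following as a sketch for the case where you want the rotation passed through as-is. `toGlMatrix` is a made-up helper, not part of the original code.

```cpp
#include <array>
#include <cassert>

// Hypothetical helper: embed a row-major 3x3 rotation into the column-major
// 4x4 layout that glMultMatrixd expects. Element (row, col) of the 4x4
// matrix lives at index col*4 + row.
std::array<double, 16> toGlMatrix(const double rot[9]) {
    assert(rot != nullptr);
    std::array<double, 16> m{};  // all zeros
    for (int row = 0; row < 3; ++row)
        for (int col = 0; col < 3; ++col)
            m[col * 4 + row] = rot[row * 3 + col];
    m[15] = 1.0;  // homogeneous coordinate
    return m;
}
```

It would be used as `glMultMatrixd(toGlMatrix(rot).data());` in place of the hand-written array.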

That wasn’t too hard, huh? Awesome.

## Code

You can grab the code from the SVN repo:

```
svn checkout http://morethantechnical.googlecode.com/svn/trunk/HeadPose
```

Enjoy!
Roy.