
Quick and Easy Head Pose Estimation with OpenCV [w/ code]


Update: check out my new post on this topic: https://www.morethantechnical.com/2012/10/17/head-pose-estimation-with-opencv-opengl-revisited-w-code/
Hi
Just wanted to share a small thing I did with OpenCV – Head Pose Estimation (sometimes known as Gaze Direction Estimation). Many people try to achieve this and there are a ton of papers covering it, including a recent overview of almost all known methods.
I implemented a very quick & dirty solution based on OpenCV’s built-in methods, and it produced surprisingly good results (I expected it to fail), so I decided to share. It is based on 3D-2D point correspondences, fitting the pose of a 3D model to the marked 2D points. OpenCV provides a magical method – solvePnP – that does exactly this, given some calibration parameters that I completely disregarded.
Here’s how it’s done

Intro

I wanted to use solvePnP, since I saw how easy it was to use when I was implementing PTAM. It’s supposed to recover the 3D location and orientation of an object, given 3D-2D feature correspondences and an initial guess. In fact the initial guess is not required, but the results without it are dreadful.
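For reference, this is roughly what the call looks like in OpenCV’s C++ API; the variable names here are just placeholders, and the last argument, useExtrinsicGuess, is what tells OpenCV to start from the rvec/tvec you pass in:

//a minimal sketch of the solvePnP call, assuming objectPoints, imagePoints,
//cameraMatrix and distCoeffs are already filled in.
//with useExtrinsicGuess = true, rvec and tvec must hold the initial guess on
//input; on output they hold the estimated pose.
solvePnP(objectPoints,  // vector<Point3f>: 3D model points
         imagePoints,   // vector<Point2f>: matching 2D image points
         cameraMatrix,  // 3x3 intrinsics matrix
         distCoeffs,    // distortion coefficients (can be all zeros)
         rvec, tvec,    // in: initial guess, out: estimated pose
         true);         // useExtrinsicGuess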
So I needed to get some 3D points on a human head. I downloaded a free model of a human head from the net, and used MeshLab to mark some points on the model:

  1. Left ear
  2. Right ear
  3. Left eye
  4. Right eye
  5. Nose tip
  6. Left mouth corner
  7. Right mouth corner

Then I headed to the LFW database to get some pictures of celebrity heads. By mere accident I stumbled upon Angelina Jolie. The next step was to mark some points on Angelina’s pictures, matching the selected features. In places where the head hides an ear, I put a point at the estimated location of the ear.
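The marked 2D coordinates go into a plain text file that the code below reads: seven whitespace-separated x y pairs, in the same order as the modelPoints vector (left eye, right eye, nose, left mouth corner, right mouth corner, left ear, right ear). It might look something like this, with made-up placeholder coordinates rather than the ones I actually used:

120 95
180 93
150 130
130 160
172 158
90 110
210 108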

Time to Code

First I initialize the 3D points vector, and a dummy camera matrix:

//op, rvec, tvec, camMatrix and the rv/tv buffers are globals in the full source,
//roughly: Mat op, camMatrix, rvec, tvec; vector<double> rv(3), tv(3);
vector<Point3f> modelPoints;
modelPoints.push_back(Point3f(-36.9522f,39.3518f,47.1217f));   //l eye
modelPoints.push_back(Point3f(35.446f,38.4345f,47.6468f));     //r eye
modelPoints.push_back(Point3f(-0.0697709f,18.6015f,87.9695f)); //nose
modelPoints.push_back(Point3f(-27.6439f,-29.6388f,73.8551f));  //l mouth
modelPoints.push_back(Point3f(28.7793f,-29.2935f,72.7329f));   //r mouth
modelPoints.push_back(Point3f(-87.2155f,15.5829f,-45.1352f));  //l ear
modelPoints.push_back(Point3f(85.8383f,14.9023f,-46.3169f));   //r ear
op = Mat(modelPoints);
op = op / 35; //just a little normalization...

//initial rotation guess: looking down the -z axis
rvec = Mat(rv);
double _d[9] = {1, 0, 0,
                0,-1, 0,
                0, 0,-1};
Rodrigues(Mat(3,3,CV_64FC1,_d),rvec);

//initial translation guess: one unit in front of the camera
tv[0]=0; tv[1]=0; tv[2]=1;
tvec = Mat(tv);

//"calibration matrix": principal point at the center of the picture, focal length of 20
double _cm[9] = { 20,  0, 160,
                   0, 20, 120,
                   0,  0,   1 };
camMatrix = Mat(3,3,CV_64FC1,_cm);

Even though the “calibration” parameters are totally bogus, they work pretty well.
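If you do want something a bit less bogus without running a real calibration, a common rule of thumb is to set the focal length to roughly the image width (in pixels) and put the principal point at the image center. For a 320x240 image, a drop-in replacement for the _cm values above would look something like this (still a rough guess, not a calibrated matrix):

//rough intrinsics guess for a 320x240 image: focal length ~ image width,
//principal point at the image center
double _cm[9] = { 320,   0, 160,
                    0, 320, 120,
                    0,   0,   1 };
camMatrix = Mat(3,3,CV_64FC1,_cm);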
Now, we’re all ready to start estimating some poses. So let’s use solvePnP:

vector<Point2f > imagePoints;
//read 2D points from file...
FILE* f;
fopen_s(&f,"points.txt","r");
for(int i=0;i<7;i++) {
     int x,y;
     fscanf_s(f,"%d",&x); fscanf_s(f,"%d",&y);
     imagePoints.push_back(Point2f((float)x,(float)y));
}
fclose(f);
//make a Mat of the vector<>
Mat ip(imagePoints);
//display points on image
Mat img = imread("image.png");
for(unsigned int i=0;i<imagePoints.size();i++) circle(img,imagePoints[i],2,Scalar(255,0,255),CV_FILLED);
//"distortion coefficients"... hah!
double _dc[] = {0,0,0,0};
//here's where the magic happens
solvePnP(op,ip,camMatrix,Mat(1,4,CV_64FC1,_dc),rvec,tvec,true);
//decompose the response to something OpenGL would understand.
//the translation vector is irrelevant here, only the rotation is important.
//rot is a global double[9] that backs rotM, so the OpenGL code below can read it.
Mat rotM(3,3,CV_64FC1,rot);
Rodrigues(rvec,rotM);
double* _r = rotM.ptr<double>();
printf("rotation mat: \n %.3f %.3f %.3f\n%.3f %.3f %.3f\n%.3f %.3f %.3f\n",
          _r[0],_r[1],_r[2],_r[3],_r[4],_r[5],_r[6],_r[7],_r[8]);
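If you want numbers that are easier to read than a raw 3x3 matrix, you can also pull approximate Euler angles out of rotM. This isn’t part of the original code, but OpenCV’s RQDecomp3x3 makes it a one-liner (the returned angles are in degrees):

//not in the original code: extract Euler angles (degrees) from rotM via
//RQ decomposition, just to print a human-readable head pose.
Mat mtxR, mtxQ;
Vec3d euler = RQDecomp3x3(rotM, mtxR, mtxQ);
printf("pitch %.1f, yaw %.1f, roll %.1f\n", euler[0], euler[1], euler[2]);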

Alright, that’s it on the vision side, so now I draw some 3D. As usual, I use a very simple GLUT program to display 3D in a hurry. Initialization is nothing special; the only thing worth pointing out is using glutSolidCylinder and glutSolidTetrahedron to draw the axes:

glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
gluLookAt(0,0,0,0,0,1,0,1,0); //cam looking at +z axis
glPushMatrix();
glTranslated(0,0,5); //go a bit back to where I want to draw the axes
glPushMatrix();
//this is the rotation matrix I got from solvePnP, so I will rotate accordingly to align with the face
double _d[16] = { rot[0],rot[1],rot[2],0,
                  rot[3],rot[4],rot[5],0,
                  rot[6],rot[7],rot[8],0,
                  0,     0,     0,     1};
glMultMatrixd(_d);
glRotated(180,1,0,0); //rotate around to face the camera
//----------- Draw Axes --------------
//Z = red
glPushMatrix();
glRotated(180,0,1,0);
glColor3d(1,0,0);
glutSolidCylinder(0.05,1,15,20);
glTranslated(0,0,1);
glScaled(.1,.1,.1);
glutSolidTetrahedron();
glPopMatrix();
//Y = green
glPushMatrix();
glRotated(-90,1,0,0);
glColor3d(0,1,0);
glutSolidCylinder(0.05,1,15,20);
glTranslated(0,0,1);
glScaled(.1,.1,.1);
glutSolidTetrahedron();
glPopMatrix();
//X = blue
glPushMatrix();
glRotated(-90,0,1,0);
glColor3d(0,0,1);
glutSolidCylinder(0.05,1,15,20);
glTranslated(0,0,1);
glScaled(.1,.1,.1);
glutSolidTetrahedron();
glPopMatrix();
glPopMatrix();
glPopMatrix();
//----------End axes --------------

That wasn’t too hard, huh? Awesome.

So…. Results

Code

You can grab the code from the SVN repo:

svn checkout http://morethantechnical.googlecode.com/svn/trunk/HeadPose

Enjoy!
Roy.