
Quick and Easy Head Pose Estimation with OpenCV [w/ code]


Update: check out my new post about this https://www.morethantechnical.com/2012/10/17/head-pose-estimation-with-opencv-opengl-revisited-w-code/
Hi
Just wanted to share a small thing I did with OpenCV – Head Pose Estimation (sometimes known as Gaze Direction Estimation). Many people try to achieve this and there are a ton of papers covering it, including a recent overview of almost all known methods.
I implemented a very quick & dirty solution based on OpenCV’s internal methods that produced surprising results (I expected it to fail), so I decided to share. It is based on 3D-2D point correspondence and then fitting of the points to the 3D model. OpenCV provides a magical method – solvePnP – that does this, given some calibration parameters that I completely disregarded.
Here’s how it’s done

Intro

I wanted to use solvePnP, since I saw how easy it was to use when I was implementing PTAM. It's supposed to recover the 3D location and orientation of an object, given a 3D-2D feature correspondence and an initial guess. In fact the initial guess is not required, but the results without it are dreadful.
So I needed to get some 3D points on a human head. I downloaded a free model of a human head from the net, and used MeshLab to mark some points on the model:

  1. Left ear
  2. Right ear
  3. Left eye
  4. Right eye
  5. Nose tip
  6. Left mouth corner
  7. Right mouth corner

Then I headed to the LFW database to get some pictures of celebrity heads. By mere accident I stumbled upon Angelina Jolie. The next step was to mark some points on Angelina's pictures, according to the selected features. In places where the head hides an ear, I put a point at the estimated location of the ear.

Time to Code

First I initialize the 3D points vector, and a dummy camera matrix:

vector<Point3f> modelPoints;
modelPoints.push_back(Point3f(-36.9522f,   39.3518f,  47.1217f)); //l eye
modelPoints.push_back(Point3f( 35.446f,    38.4345f,  47.6468f)); //r eye
modelPoints.push_back(Point3f(-0.0697709f, 18.6015f,  87.9695f)); //nose
modelPoints.push_back(Point3f(-27.6439f,  -29.6388f,  73.8551f)); //l mouth
modelPoints.push_back(Point3f( 28.7793f,  -29.2935f,  72.7329f)); //r mouth
modelPoints.push_back(Point3f(-87.2155f,   15.5829f, -45.1352f)); //l ear
modelPoints.push_back(Point3f( 85.8383f,   14.9023f, -46.3169f)); //r ear
op = Mat(modelPoints);
op = op / 35; //just a little normalization...
double rv[3], tv[3]; //storage for the rotation and translation vectors
rvec = Mat(3, 1, CV_64FC1, rv);
double _d[9] = { 1,  0,  0,
                 0, -1,  0,
                 0,  0, -1 }; //rotation: looking at -z axis
Rodrigues(Mat(3, 3, CV_64FC1, _d), rvec);
tv[0] = 0; tv[1] = 0; tv[2] = 1;
tvec = Mat(3, 1, CV_64FC1, tv);
double _cm[9] = { 20,  0, 160,
                   0, 20, 120,
                   0,  0,   1 }; //"calibration matrix": principal point at the image center, focal length 20
camMatrix = Mat(3, 3, CV_64FC1, _cm);

Even though the "calibration" parameters are totally bogus, they work pretty well.
Now, we’re all ready to start estimating some poses. So let’s use solvePnP:

vector<Point2f > imagePoints;
//read 2D points from file...
FILE* f;
fopen_s(&f,"points.txt","r");
for(int i=0;i<7;i++) {
     int x,y;
     fscanf_s(f,"%d",&x); fscanf_s(f,"%d",&y);
     imagePoints.push_back(Point2f((float)x,(float)y));
}
fclose(f);
//make a Mat of the vector<>
Mat ip(imagePoints);
//display points on image
Mat img = imread("image.png");
for(unsigned int i=0;i<imagePoints.size();i++) circle(img,imagePoints[i],2,Scalar(255,0,255),CV_FILLED);
//"distortion coefficients"... hah!
double _dc[] = {0,0,0,0};
//here's where the magic happens
solvePnP(op,ip,camMatrix,Mat(1,4,CV_64FC1,_dc),rvec,tvec,true);
//decompose the response into something OpenGL would understand:
//the translation vector is irrelevant here, only the rotation matters
double rot[9];
Mat rotM(3,3,CV_64FC1,rot);
Rodrigues(rvec,rotM);
double* _r = rotM.ptr<double>();
printf("rotation mat: \n %.3f %.3f %.3f\n%.3f %.3f %.3f\n%.3f %.3f %.3f\n",
          _r[0],_r[1],_r[2],_r[3],_r[4],_r[5],_r[6],_r[7],_r[8]);

Alright, all done on the vision side, so now I draw some 3D. As usual, I use a very simple GLUT program to display 3D in a hurry. Initialization is nothing special; the only noteworthy bit is using glutSolidCylinder and glutSolidTetrahedron to draw the axes:

glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
gluLookAt(0,0,0,0,0,1,0,1,0); //cam looking at +z axis
glPushMatrix();
glTranslated(0,0,5); //go a bit back to where I want to draw the axes
glPushMatrix();
//this is the rotation matrix I got from solvePnP, so I will rotate accordingly to align with the face
double _d[16] = { rot[0], rot[1], rot[2], 0,
                  rot[3], rot[4], rot[5], 0,
                  rot[6], rot[7], rot[8], 0,
                       0,      0,      0, 1 };
glMultMatrixd(_d);
glRotated(180,1,0,0); //rotate around to face the camera
//----------- Draw Axes --------------
//Z = red
glPushMatrix();
glRotated(180,0,1,0);
glColor3d(1,0,0);
glutSolidCylinder(0.05,1,15,20);
glTranslated(0,0,1);
glScaled(.1,.1,.1);
glutSolidTetrahedron();
glPopMatrix();
//Y = green
glPushMatrix();
glRotated(-90,1,0,0);
glColor3d(0,1,0);
glutSolidCylinder(0.05,1,15,20);
glTranslated(0,0,1);
glScaled(.1,.1,.1);
glutSolidTetrahedron();
glPopMatrix();
//X = blue
glPushMatrix();
glRotated(-90,0,1,0);
glColor3d(0,0,1);
glutSolidCylinder(0.05,1,15,20);
glTranslated(0,0,1);
glScaled(.1,.1,.1);
glutSolidTetrahedron();
glPopMatrix();
glPopMatrix();
glPopMatrix();
//----------End axes --------------

That wasn’t too hard, huh? Awesome.

So…. Results

Code

You can grab the code from the SVN repo:

svn checkout http://morethantechnical.googlecode.com/svn/trunk/HeadPose

Enjoy!
Roy.

30 replies on “Quick and Easy Head Pose Estimation with OpenCV [w/ code]”

Yes, the points are marked by hand.
You could use the OpenCV bundled V&J detector to get facial features, but it will produce very bad results.
This project you referenced is indeed very similar…
R.

This looks cool. It would be even more interesting if the feature points that are manually marked on the face could be detected automatically.
Does OpenCV have any support to automatically detect those features, say using templates or something?
Best
Srimal

OpenCV has a very good implementation of the Viola-Jones detector (see here), which can potentially detect the eyes, tip of the nose, corners of the mouth, and ears.
But OpenCV does not ship with built-in detectors for ears or noses (although the center of the face is usually the nose…), so you must create those on your own. From my experience this is a very hard task that requires a huge database of positive examples… for a good detector, that is.
Roy

Hi All,
I developed a real-time face tracker with pose estimation (totally automatic, no feature marking is required). Please take a look at my site and give me feedback if you want 😉 I also uploaded some clips on YouTube to show the features of my system.
site: sites.google.com/site/ferrariimageprocessing
Bye,
Alessandro.

Hi Alessandro!
Of course I saw your videos on YouTube, they have been a great inspiration for me!
Thanks for checking out my blog and work, and I’ll try to comment on your work as well if my time allows..
Roy.

Hi ,
I want to know what compiler that you use to run this code.
I got many problem to compile it.
Sorry for my English language. I’m not Native speaker.

I worked with Microsoft Visual Studio (C++) 2008 pro.
There are .vcproj and .sln files for opening it in the IDE.
Roy.

Hi,
I have a good application that we can develop using your code.
Maybe we can work together.
Please contact me at my private mail ( [email protected] )
Best Regards,
Franco

Thanks for sharing your hard work!
I have tried to run the code that you have provided in the link
with the same Angelina Jolie data set you used in the YouTube demo,
but for some reason I get different results (they seem very wrong for some images).
Did you update anything for the YouTube demo?
It would be great if you could share that version too!
regards,
Andrew

Hi Roy,
maybe this is a little bit off-topic, but maybe you like to share your opinion about following problem:
I'd like to capture my dartboard via (web)cam and estimate the position of thrown darts.
I've read a lot about object detection/tracking, face detection, Haar-like features etc., but I'm not sure whether that would be overkill. Let's look at the situation:
1. The background is well-known, though the lighting may vary.
2. The darts could be known in advance if necessary.
3. The pose of the darts should be estimated.
I'm happy about any hints I can get on how to cut this problem down into pieces.
Best regards,
Sven

@Sven:
sounds like basic background subtraction could do a good job, since the cam is stationary and the scene doesn’t change.
it’ll give you the region where there is a change (a dart)
on top of that you can start building some smarter code.

If I mark and pass person-specific 3D points to solvePnP, the pose estimation error will be lower than with a generic 3D model. Right?

May I know what software you used to compile the source code? Would VS2008 and OpenCV 2.2 be OK?

Thanks for the code, but how do I get the translation vector for use as, for example, a normal vector3df?

Hi,
Could you please explain your model points in more detail?
Are they 3D model points for a frontal face?
Thanks.
Hoanam

Same as Andrew. I get some results that are waaay off. Some of them seem just right, but that might be my imagination at work. Do you have newer code?

Hi Roy
Does solvePnP work for non-planar objects too? There is considerable discussion about it in the OpenCV forums, but none of it is conclusive… maybe the features you considered more or less lie in a plane?
Daniel

@Daniel
I think solvePnP is actually better at finding the pose when the points are non-coplanar.
As I’ve said before, for this method to work better you should get many more features than only 6…

@gayatri
The points were detected manually, but I did write a quick eye locator based on V-J:
https://github.com/royshil/HeadReplacement/blob/master/VirtualSurgeon/VirtualSurgeon_Utils/VirtualSurgeon_Utils.cpp
Look at the VirtualSurgeonFaceData::DetectEyes function
But there are far better face features extractors… depending on your situation.
AAM/ASM are good with a video stream but less so with single images; with single images a feature-based detector is often better.

I have tried to run the code with the same Angelina Jolie data set you used in the YouTube demo, but I got different results, which seem very wrong for some images. Also, the order of the feature points seems not identical to the order you listed here. Could you give me some tips? Thanks very much.

Hi, which version of OpenCV did you use for this implementation? 2.0? It seems the code didn’t work on my pc and the error message is “Size of position vector must be 4×1!” for the function decomposeProjectionMatrix. Do you happen to know the reason? Thanks so much!

Hi Roy, your work is great!! I have two questions that need your help.
1. After using your code, my results seem to be wrong and differ from those shown in your demo video. I also tested other images of Angelina; their results differ from yours, too.
2. The OpenCV 2.0 you suggested does not work well. The program always stops at "decomposeProjectionMatrix(tmpmtx,tmp,tmp1,tmp2,tmp3,tmp4,tmp5,eav);". Instead, I compiled your code with OpenCV 2.1. I wonder if that change makes my results incorrect?
I’ll appreciate if you could answer my questions.
