Mar 19 2010

Quick and Easy Head Pose Estimation with OpenCV [w/ code]


Update: check out my newer post on this topic: http://www.morethantechnical.com/2012/10/17/head-pose-estimation-with-opencv-opengl-revisited-w-code/
Hi

Just wanted to share a small thing I did with OpenCV - Head Pose Estimation (sometimes known as Gaze Direction Estimation). Many people try to achieve this and there are a ton of papers covering it, including a recent overview of almost all known methods.

I implemented a very quick & dirty solution based on OpenCV's built-in methods that produced surprisingly good results (I expected it to fail), so I decided to share. It is based on 3D-2D point correspondence: the marked 2D image points are fitted to the points of a 3D model. OpenCV provides a magical method - solvePnP - that does this, given some calibration parameters that I completely disregarded.

Here's how it's done

Intro

I wanted to use solvePnP, since I saw how easy it was to use it when I was implementing the PTAM. It's supposed to recover the 3D location and orientation of an object, given a 3D-2D feature correspondence and an initial guess. In fact the initial guess is not required, but the results when not using the guess are dreadful.
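
For intuition: solvePnP searches for the rotation and translation under which the 3D model points, pushed through a pinhole camera, land on the marked 2D points. Below is a minimal sketch of that forward projection in plain C++ (no OpenCV). The names projectPoint, Vec3, Mat3 and Pixel are made up for illustration, and the sketch assumes zero skew and no lens distortion.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<double, 9>;  // row-major 3x3 (illustrative type)

struct Pixel { double u, v; };

// Project a 3D model point into the image: x ~ K * (R * X + t).
Pixel projectPoint(const Vec3& X, const Mat3& R, const Vec3& t, const Mat3& K) {
    // Rigid transform from model coordinates to camera coordinates.
    Vec3 c{};
    for (int r = 0; r < 3; ++r)
        c[r] = R[3 * r] * X[0] + R[3 * r + 1] * X[1] + R[3 * r + 2] * X[2] + t[r];

    // A point with zero depth cannot be projected.
    assert(std::fabs(c[2]) > 1e-12 && "point lies on the camera plane");

    // Perspective divide, then focal length and principal point
    // (assumes K has zero skew).
    return { K[0] * c[0] / c[2] + K[2],
             K[4] * c[1] / c[2] + K[5] };
}
```

With an identity rotation, t = (0,0,1) and the dummy camera matrix used later in this post (focal length 20, center 160,120), the model origin projects to the image center (160,120).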

So I needed to get some 3D points on a human head. I downloaded a free model of a human head from the net, and used MeshLab to mark some points on the model:

  1. Left ear
  2. Right ear
  3. Left eye
  4. Right eye
  5. Nose tip
  6. Left mouth corner
  7. Right mouth corner

Then I headed to the LFW database to get some pictures of celebrity heads. By mere accident I stumbled upon Angelina Jolie. The next step was to mark some points on Angelina's pictures, according to the selected features. In places where the head hides an ear, I put a point in the estimated location of the ear.

Time to Code

First I initialize the 3D points vector, and a dummy camera matrix:

vector<Point3f > modelPoints;
modelPoints.push_back(Point3f(-36.9522f,39.3518f,47.1217f));    //l eye
modelPoints.push_back(Point3f(35.446f,38.4345f,47.6468f));              //r eye
modelPoints.push_back(Point3f(-0.0697709f,18.6015f,87.9695f)); //nose
modelPoints.push_back(Point3f(-27.6439f,-29.6388f,73.8551f));   //l mouth
modelPoints.push_back(Point3f(28.7793f,-29.2935f,72.7329f));    //r mouth
modelPoints.push_back(Point3f(-87.2155f,15.5829f,-45.1352f));   //l ear
modelPoints.push_back(Point3f(85.8383f,14.9023f,-46.3169f));    //r ear

//op, rv, rvec, tv, tvec and camMatrix are declared elsewhere in the full source
op = Mat(modelPoints);
op = op / 35; //just a little normalization...
rvec = Mat(rv);
double _d[9] = {1, 0, 0,
                0,-1, 0,
                0, 0,-1}; //rotation: looking at -z axis
Rodrigues(Mat(3,3,CV_64FC1,_d),rvec);
tv[0]=0;tv[1]=0;tv[2]=1;
tvec = Mat(tv);
double _cm[9] = { 20,  0, 160,
                   0, 20, 120,
                   0,  0,   1 };  //"calibration matrix": center point at center of picture with 20 focal length.
camMatrix = Mat(3,3,CV_64FC1,_cm);

Even though the "calibration" parameters are totally bogus, they work pretty well.
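
A note on rvec: it is an axis-angle vector, whose direction is the rotation axis and whose length is the angle in radians. Rodrigues converts between that form and a 3x3 matrix. Here is a rough hand-rolled sketch of the vector-to-matrix direction, assuming row-major storage. rodriguesToMatrix is a hypothetical helper for illustration, not OpenCV's implementation.

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<double, 9>;  // row-major 3x3 (illustrative type)

// Rodrigues' formula: R = c*I + (1 - c)*k*k^T + s*[k]x,
// where k is the unit axis, c = cos(theta), s = sin(theta).
Mat3 rodriguesToMatrix(const Vec3& rvec) {
    const double theta = std::sqrt(rvec[0] * rvec[0] +
                                   rvec[1] * rvec[1] +
                                   rvec[2] * rvec[2]);
    if (theta < 1e-12) {
        // Zero-length vector means no rotation.
        Mat3 identity = {1.0, 0.0, 0.0,  0.0, 1.0, 0.0,  0.0, 0.0, 1.0};
        return identity;
    }
    const double kx = rvec[0] / theta;
    const double ky = rvec[1] / theta;
    const double kz = rvec[2] / theta;
    const double c = std::cos(theta);
    const double s = std::sin(theta);
    const double v = 1.0 - c;

    Mat3 R = {
        c + kx * kx * v,       kx * ky * v - kz * s,  kx * kz * v + ky * s,
        ky * kx * v + kz * s,  c + ky * ky * v,       ky * kz * v - kx * s,
        kz * kx * v - ky * s,  kz * ky * v + kx * s,  c + kz * kz * v
    };
    return R;
}
```

Feeding it (pi, 0, 0) gives, up to rounding, the diag(1,-1,-1) matrix used as the initial guess above: a half turn about the x axis.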

Now, we're all ready to start estimating some poses. So let's use solvePnP:

vector<Point2f > imagePoints;

//read 2D points from file...
FILE* f;
fopen_s(&f,"points.txt","r");
for(int i=0;i<7;i++) {
     int x,y;
     fscanf_s(f,"%d",&x); fscanf_s(f,"%d",&y);
     imagePoints.push_back(Point2f((float)x,(float)y));
}
fclose(f);

//make a Mat of the vector<>
Mat ip(imagePoints);

//display points on image
Mat img = imread("image.png");
for(unsigned int i=0;i<imagePoints.size();i++) circle(img,imagePoints[i],2,Scalar(255,0,255),CV_FILLED);

//"distortion coefficients"... hah!
double _dc[] = {0,0,0,0};

//here's where the magic happens
solvePnP(op,ip,camMatrix,Mat(1,4,CV_64FC1,_dc),rvec,tvec,true);

//decompose the response to something OpenGL would understand.
//translation vector is irrelevant, only rotation vector is important
Mat rotM(3,3,CV_64FC1,rot);
Rodrigues(rvec,rotM);
double* _r = rotM.ptr<double>();
printf("rotation mat: \n %.3f %.3f %.3f\n%.3f %.3f %.3f\n%.3f %.3f %.3f\n",
          _r[0],_r[1],_r[2],_r[3],_r[4],_r[5],_r[6],_r[7],_r[8]);
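
A side note on the file reading: fopen_s and fscanf_s are not available in every toolchain (they come from Microsoft's CRT and the optional C11 Annex K), and the loop above does not check that the file actually opened. Below is a portable sketch of the same read using std::ifstream. readImagePoints is a hypothetical helper, and it returns plain pairs so that it stays free of OpenCV types.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Read up to `expected` whitespace-separated "x y" pairs from a text file.
// Returns fewer points (possibly none) if the file is missing or malformed,
// so the caller should compare the result size against `expected`.
std::vector<std::pair<float, float> > readImagePoints(const std::string& path,
                                                      std::size_t expected) {
    std::vector<std::pair<float, float> > points;
    std::ifstream in(path.c_str());
    float x = 0.0f, y = 0.0f;
    // Extraction fails cleanly on a closed stream, end of file, or bad input.
    while (points.size() < expected && (in >> x >> y)) {
        points.push_back(std::make_pair(x, y));
    }
    return points;
}
```

Each pair can then be pushed into imagePoints as a Point2f, after checking that exactly 7 points came back.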

Alright, all done on the vision side, so now I draw some 3D. As usual, I use a very simple GLUT program to display 3D in a hurry. Initialization is nothing special; the one thing I think is special is using glutSolidCylinder and glutSolidTetrahedron to draw the axes:

glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
gluLookAt(0,0,0,0,0,1,0,1,0); //cam looking at +z axis
glPushMatrix();
glTranslated(0,0,5); //go a bit back to where I want to draw the axes
glPushMatrix();

//this is the rotation matrix I got from solvePnP, so I will rotate accordingly to align with the face
double _d[16] = { rot[0],rot[1],rot[2],0,
                  rot[3],rot[4],rot[5],0,
                  rot[6],rot[7],rot[8],0,
                  0,     0,     0,     1};
glMultMatrixd(_d);
glRotated(180,1,0,0); //rotate around to face the camera

//----------- Draw Axes --------------
//Z = red
glPushMatrix();
glRotated(180,0,1,0);
glColor3d(1,0,0);
glutSolidCylinder(0.05,1,15,20);
glTranslated(0,0,1);
glScaled(.1,.1,.1);
glutSolidTetrahedron();
glPopMatrix();

//Y = green
glPushMatrix();
glRotated(-90,1,0,0);
glColor3d(0,1,0);
glutSolidCylinder(0.05,1,15,20);
glTranslated(0,0,1);
glScaled(.1,.1,.1);
glutSolidTetrahedron();
glPopMatrix();

//X = blue
glPushMatrix();
glRotated(-90,0,1,0);
glColor3d(0,0,1);
glutSolidCylinder(0.05,1,15,20);
glTranslated(0,0,1);
glScaled(.1,.1,.1);
glutSolidTetrahedron();
glPopMatrix();

glPopMatrix();
glPopMatrix();
//----------End axes --------------

That wasn't too hard, huh? Awesome.
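
One thing to keep in mind when adapting this: glMultMatrixd reads its 16 values in column-major order, while the 3x3 matrix produced by Rodrigues is stored row by row. Filling the array row by row, as above, therefore hands OpenGL the transpose (for a rotation, the inverse). Whether that is the rotation you want depends on your conventions, so here is a small sketch that makes the layout explicit. toOpenGLMatrix is a hypothetical helper name.

```cpp
#include <array>

// Build a column-major 4x4 matrix (the layout glMultMatrixd expects)
// from a row-major 3x3 rotation matrix, without transposing the rotation.
std::array<double, 16> toOpenGLMatrix(const std::array<double, 9>& R) {
    std::array<double, 16> m = {};  // all zeros to start
    for (int row = 0; row < 3; ++row) {
        for (int col = 0; col < 3; ++col) {
            // Element (row, col) lives at index col*4 + row in column-major storage.
            m[col * 4 + row] = R[row * 3 + col];
        }
    }
    m[15] = 1.0;  // homogeneous coordinate
    return m;
}
```

Passing the result's data() pointer to glMultMatrixd applies R itself; swapping row and col in the index gives the transposed behavior of the code above.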

So.... Results

Code

You can grab the code from the SVN repo:


svn checkout http://morethantechnical.googlecode.com/svn/trunk/HeadPose

Enjoy!

Roy.

27 Responses to “Quick and Easy Head Pose Estimation with OpenCV [w/ code]”

  1. Cfr on 20 Mar 2010 at 1:53 am

    I suppose you have put all points by hand?
    BTW, you might be interested in Ehci http://code.google.com/p/ehci/wiki/6dofhead

  2. Roy on 21 Mar 2010 at 6:40 pm

    Yes, the points are marked by hand.
    You could use the OpenCV bundled V&J detector to get facial features, but it will produce very bad results.

    This project you referenced is indeed very similar...

    R.

  3. Srimal on 09 Apr 2010 at 7:40 am

    This looks cool. It would be even more interesting if the feature points that are manually marked on the face could be detected automatically.

    Does OpenCV have any support to automatically detect those features, say using templates or something?

    Best

    Srimal

  4. Roy on 10 Apr 2010 at 3:03 pm

    OpenCV has a very good implementation of Viola-Jones detector (see here), that potentially can detect the eyes, tip of nose, ends of mouth and ears.
    But OpenCV does not ship with built-in detectors for ears or noses (although the center of the face is usually the nose..), you must create those on your own. From my experience I can say this is a very hard task, that requires a huge database of positive examples... for a good detector that is.

    Roy

  5. Alessandro Ferrari on 20 Apr 2010 at 4:29 pm

    Hi All,
    I developed a real time face tracker with pose estimation (totally automatic, no feature marking is requested). Please take a look at my site and gimme feeds if you want ;) I also uploaded some clips on youtube to show the features of my system.

    site: sites.google.com/site/ferrariimageprocessing

    Bye,
    Alessandro.

  6. Roy on 21 Apr 2010 at 2:05 pm

    Hi Alessandro!
    Of course I saw your videos on YouTube, they have been a great inspiration for me!
    Thanks for checking out my blog and work, and I'll try to comment on your work as well if my time allows..
    Roy.

  7. change on 18 May 2010 at 10:40 am

    Hi ,
    I want to know what compiler that you use to run this code.
    I got many problem to compile it.

    Sorry for my English language. I'm not Native speaker.

  8. Roy on 21 May 2010 at 9:21 am

    I worked with Microsoft Visual Studio (C++) 2008 pro.
    There are .vcproj and .sln files for opening it in the IDE.

    Roy.

  9. Franco on 17 Jun 2010 at 11:52 pm

    Hi,
    I have a good application that we can develop using your code.
    Maybe we can work together.
    Please contact me at my private mail ( franco.amato@gmail.com )

    Best Regards,
    Franco

  10. Andrew on 05 Oct 2010 at 12:55 pm

    Thanks for sharing your hard work!
    I have tried to run the code that you have provided in the link
    with the same Angelina Jolie data set you have used in the youtube demo,
    but for some reason I get different result (it seems very wrong for some images)

    Do you happen to update anything for the youtube demo?
    It would be great if you could share that version too!

    regards,

    Andrew

  11. Sven on 01 Feb 2011 at 11:14 pm

    Hi Roy,
    maybe this is a little bit off-topic, but maybe you like to share your opinion about following problem:
    I'd like to capture my dartboard via (web-)cam and estimate the position of thrown darts.
    I read a lot about object detection/tracking, face detection, HAAR-like features etc. But I'm not sure if this isn't overkill at all. Let's have a look on the situation:

    1. The background is well-known, maybe different lighting situations
    2. Darts could be known if necessary.
    3. Pose of the darts should be estimated

    I'm happy about all hints I can get on how to cut this problem down into pieces.

    Best regards,
    Sven

  12. Roy on 02 Feb 2011 at 7:36 am

    @Sven:
    sounds like basic background subtraction could do a good job, since the cam is stationary and the scene doesn't change.
    it'll give you the region where there is a change (a dart)
    on top of that you can start building some smarter code.

  13. Kelp on 25 Apr 2011 at 10:32 am

    If I can mark and pass the 3d points of different identity to the solvePnP, then the pose estimation error will be less than using a uniform 3d points. Right?

  14. Black on 29 Jul 2011 at 12:47 am

    May i know what kind of software that u used in order to compile the source code? how about vs2008 and opencv2.2? it is ok?

  15. Alvaro on 19 Oct 2011 at 6:58 pm

    Thanks for the code, but how to get translation vector for using as normal vector3df for example?

  16. Hoanam on 04 Jan 2012 at 1:37 pm

    Hi,
    Could you please explain me more in details about your model points?
    Are there a 3D model points for the frontal face?

    Thanks.
    Hoanam

  17. Victor on 07 Jan 2012 at 1:01 am

    Same as andrew. I get some results that are waaay off. Some of them seem just right, but that might be my imagination at work. Do you have a newer code?

  18. Roy on 07 Jan 2012 at 1:14 am

    @Victor
    I am referring you to FaceTracker:
    http://web.mac.com/jsaragih/FaceTracker/FaceTracker.html
    By Jason Saragih
    It is very robust and has a clean API

  19. Daniel on 21 Jan 2012 at 7:22 pm

    Hi Roy

    Does SolvePnP work for non-planar objects too? there is considerable discussion about it in OpenCV forums but none are kind of conclusive...maybe the features you considered more or less lied in a plane?

    Daniel

  20. Roy on 24 Jan 2012 at 6:22 pm

    @Daniel
    I think solvePnP is actually better at finding the pose when the points are non-coplanar.
    As I've said before, for this method to work better you should get many more features than only 6...

  21. gayatri on 16 Feb 2012 at 12:21 pm

    hai,please mail me the matlab code of detecting these points if u have.thank you.

  22. Roy on 16 Feb 2012 at 4:35 pm

    @gayatri
    The points were detected manually, but I did write a quick eye locator based on V-J:
    https://github.com/royshil/HeadReplacement/blob/master/VirtualSurgeon/VirtualSurgeon_Utils/VirtualSurgeon_Utils.cpp
    Look at the VirtualSurgeonFaceData::DetectEyes function
    But there are far better face features extractors... depending on your situation.
    AAM/ASM are good with video stream but less with single images, with single images a feature-based detector is often.

  23. Chris on 16 Mar 2012 at 3:57 am

    I have tried to run the code with the same Angelina Jolie data set you used in the YouTube demo, but I got a different result, which seems very wrong for some images. Also, the order of the feature points does not seem identical to the order you listed here. Could you give me some tips? Thanks very much.

  24. Roy on 16 Mar 2012 at 9:26 pm

    @Chris
    As you might have seen in prior comments, this method is very weak and probably needs plenty more feature points to get close to accurately estimating pose...
    As I suggested before, you may try to use Jason Saragih's FaceTracker: http://web.mac.com/jsaragih/FaceTracker/FaceTracker.html

  25. Ray on 15 May 2012 at 9:01 pm

    Hi, which version of OpenCV did you use for this implementation? 2.0? It seems the code didn't work on my pc and the error message is "Size of position vector must be 4x1!" for the function decomposeProjectionMatrix. Do you happen to know the reason? Thanks so much!

  26. David on 12 Oct 2012 at 10:11 pm

    Hi, roy, your job is great!! I have two questions need your helps.

    1. After using your codes, I find my result seems to be wrong which is different with that shown in your demo video. I also test other Angelina's image. Their results are different with yours, too.
    2. opencv2.0 you suggested cannot work well. The program always stops at "decomposeProjectionMatrix( tmpmtx,tmp,tmp1,tmp2,tmp3,tmp4,tmp5,eav);". Instead, I compiled your codes using opencv2.1. I wonder if my change makes my result incorrect?

    I'll appreciate if you could answer my questions.

  27. percepticat on 26 Apr 2013 at 8:40 pm

    Hi Roy,

    I found this, and thought you may be able to help me :)
    I am trying to use solvePnP with 4 points on a plane.
    The problem I'm having is that I encounter frames with nearly no change in pixel positions (under half a pixel), yet a dramatic change in the reconstructed 3D positions.

    There are more details (code and input/output) here: http://answers.opencv.org/question/12547/inconsistent-results-from-solvepnp/

    I'd appreciate any help!!!
