Quick and Easy Head Pose Estimation with OpenCV [w/ code]

Update: check out my new post about this http://www.morethantechnical.com/2012/10/17/head-pose-estimation-with-opencv-opengl-revisited-w-code/

Just wanted to share a small thing I did with OpenCV - Head Pose Estimation (sometimes known as Gaze Direction Estimation). Many people try to achieve this and there are a ton of papers covering it, including a recent overview of almost all known methods.

I implemented a very quick & dirty solution based on OpenCV's internal methods that produced surprisingly good results (I expected it to fail), so I decided to share. It is based on 3D-2D point correspondences: a few points on a 3D head model are matched to the same points in the image, and the model's pose is then fitted to them. OpenCV provides a magical method - solvePnP - that does this, given some calibration parameters that I completely disregarded.

Here's how it's done


I wanted to use solvePnP, since I saw how easy it was to use it when I was implementing the PTAM. It's supposed to recover the 3D location and orientation of an object, given a 3D-2D feature correspondence and an initial guess. In fact the initial guess is not required, but the results when not using the guess are dreadful.
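
To make "recover the location and orientation" a bit more concrete: roughly speaking, an iterative PnP solver looks for the pose under which the projected 3D model points land closest to the marked 2D points. Here is a minimal sketch of that error measure in plain C++, with no OpenCV involved. The types Vec3, Vec2, Pose and Intrinsics and both function names are made up for this illustration, and the camera is assumed to be a simple pinhole with a single focal length.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };
struct Vec2 { double x, y; };

// Hypothetical pose type: row-major 3x3 rotation plus a translation.
struct Pose {
    std::array<double, 9> R;
    Vec3 t;
};

// Hypothetical pinhole intrinsics: focal length and principal point, in pixels.
struct Intrinsics { double f, cx, cy; };

// Move a model point into camera space (X_cam = R * X + t), then project it.
Vec2 project(const Vec3& p, const Pose& pose, const Intrinsics& k) {
    const std::array<double, 9>& R = pose.R;
    const double X = R[0] * p.x + R[1] * p.y + R[2] * p.z + pose.t.x;
    const double Y = R[3] * p.x + R[4] * p.y + R[5] * p.z + pose.t.y;
    const double Z = R[6] * p.x + R[7] * p.y + R[8] * p.z + pose.t.z;
    assert(Z != 0.0 && "point lies in the camera plane");
    return { k.f * X / Z + k.cx, k.f * Y / Z + k.cy };
}

// Mean pixel distance between the projected model points and the marked
// image points. Point i of the model is compared with point i of the image.
double meanReprojectionError(const std::vector<Vec3>& model,
                             const std::vector<Vec2>& image,
                             const Pose& pose, const Intrinsics& k) {
    assert(model.size() == image.size() && !model.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < model.size(); ++i) {
        const Vec2 q = project(model[i], pose, k);
        sum += std::hypot(q.x - image[i].x, q.y - image[i].y);
    }
    return sum / static_cast<double>(model.size());
}
```

A solver starts from the initial guess and adjusts the pose to push this number down, which is one way to see why a poor starting point can end in a poor answer.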

So I needed to get some 3D points on a human head. I downloaded a free model of a human head from the net, and used MeshLab to mark some points on the model:

  1. Left ear
  2. Right ear
  3. Left eye
  4. Right eye
  5. Nose tip
  6. Left mouth corner
  7. Right mouth corner

Then I headed to the LFW database to get some pictures of celebrity heads. By mere accident I stumbled upon Angelina Jolie. The next step was to mark some points on Angelina's pictures, according to the selected features. In places where the head hides an ear, I put a point in the estimated location of the ear.

Time to Code

First I initialize the 3D points vector, and a dummy camera matrix:

vector<Point3f > modelPoints;
modelPoints.push_back(Point3f(-36.9522f,39.3518f,47.1217f));    //l eye
modelPoints.push_back(Point3f(35.446f,38.4345f,47.6468f));      //r eye
modelPoints.push_back(Point3f(-0.0697709f,18.6015f,87.9695f));  //nose
modelPoints.push_back(Point3f(-27.6439f,-29.6388f,73.8551f));   //l mouth
modelPoints.push_back(Point3f(28.7793f,-29.2935f,72.7329f));    //r mouth
modelPoints.push_back(Point3f(-87.2155f,15.5829f,-45.1352f));   //l ear
modelPoints.push_back(Point3f(85.8383f,14.9023f,-46.3169f));    //r ear

//op, rvec, tvec and camMatrix are Mat objects declared elsewhere;
//rv and tv are their 3-element backing storage (declarations not shown)
op = Mat(modelPoints);
op = op / 35; //just a little normalization...

rvec = Mat(rv);
double _d[9] = {1, 0, 0,
                0,-1, 0,
                0, 0,-1}; //rotation: looking at -z axis
Rodrigues(Mat(3,3,CV_64FC1,_d),rvec); //initial guess as a rotation vector
tvec = Mat(tv);
double _cm[9] = { 20,  0, 160,
                   0, 20, 120,
                   0,  0,   1 };  //"calibration matrix": center point at center of picture with 20 focal length.
camMatrix = Mat(3,3,CV_64FC1,_cm);

Even though the "calibration" parameters are totally bogus, they work pretty well.
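
Just how bogus? One quick way to get a feeling for it is to convert the focal length into a field of view. The sketch below is plain C++ and assumes an ideal pinhole camera; the function name fieldOfViewDegrees is made up for this example, and the 320 pixel image width is inferred from the principal point sitting at x = 160 in the center of the picture.

```cpp
#include <cassert>
#include <cmath>

// Full field of view, in degrees, along one image axis of a pinhole camera.
// focalPx:  focal length in pixels
// extentPx: image size along that axis in pixels (width or height)
double fieldOfViewDegrees(double focalPx, double extentPx) {
    assert(focalPx > 0.0 && extentPx > 0.0);
    const double pi = std::acos(-1.0);
    // Half the image spans an angle of atan((extent / 2) / focal).
    return 2.0 * std::atan(0.5 * extentPx / focalPx) * 180.0 / pi;
}
```

With a focal length of 20 and a width of 320, fieldOfViewDegrees(20.0, 320.0) comes out at a bit under 166 degrees, which is an extreme wide-angle lens and nothing like the camera that took the photos.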

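Now we're all ready to start estimating some poses. So let's use solvePnP. The 2D points have to be listed in the same order as modelPoints, since solvePnP pairs them up by index: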

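vector<Point2f > imagePoints;

//read 2D points from file, in the same order as modelPoints
FILE* f;
fopen_s(&f,"points.txt","r"); //placeholder file name
for(int i=0;i<7;i++) {
     int x,y;
     fscanf_s(f,"%d",&x); fscanf_s(f,"%d",&y);
     imagePoints.push_back(Point2f((float)x,(float)y));
}
fclose(f);

//make a Mat of the vector<>
Mat ip(imagePoints);

//display points on image
Mat img = imread("image.png");
for(unsigned int i=0;i<imagePoints.size();i++) circle(img,imagePoints[i],2,Scalar(255,0,255),CV_FILLED);

//"distortion coefficients"... hah!
double _dc[] = {0,0,0,0};

//here's where the magic happens
//(last argument: use rvec and tvec as the initial guess)
solvePnP(op,ip,camMatrix,Mat(1,4,CV_64FC1,_dc),rvec,tvec,true);

//decompose the response to something OpenGL would understand.
//translation vector is irrelevant, only rotation vector is important
Mat rotM(3,3,CV_64FC1,rot); //rot is a double[9] declared elsewhere
Rodrigues(rvec,rotM);
double* _r = rotM.ptr<double>();
printf("rotation mat: \n %.3f %.3f %.3f\n%.3f %.3f %.3f\n%.3f %.3f %.3f\n",
       _r[0],_r[1],_r[2],_r[3],_r[4],_r[5],_r[6],_r[7],_r[8]);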
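
If you are wondering what turning the rotation vector into a rotation matrix involves: the vector's direction is the rotation axis and its length is the angle in radians, and Rodrigues' formula expands that into a 3x3 matrix. Below is a minimal sketch of the formula in plain C++. It is not OpenCV's implementation, and the name rotationVectorToMatrix is made up for this example.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Convert an axis-angle rotation vector (rx, ry, rz) into a row-major 3x3
// rotation matrix using Rodrigues' formula:
//   R = cos(t) * I + (1 - cos(t)) * r * r^T + sin(t) * [r]x
// where t is the vector's length and r is the unit axis.
std::array<double, 9> rotationVectorToMatrix(double rx, double ry, double rz) {
    const double theta = std::sqrt(rx * rx + ry * ry + rz * rz);
    assert(std::isfinite(theta));
    if (theta < 1e-12) {
        // No rotation at all: return the identity.
        return {{1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0}};
    }
    const double x = rx / theta, y = ry / theta, z = rz / theta;
    const double c = std::cos(theta), s = std::sin(theta), v = 1.0 - c;
    return {{
        c + x * x * v,     x * y * v - z * s, x * z * v + y * s,
        y * x * v + z * s, c + y * y * v,     y * z * v - x * s,
        z * x * v - y * s, z * y * v + x * s, c + z * z * v
    }};
}
```

As a sanity check, a rotation of pi around the x axis gives a matrix with 1, -1, -1 on the diagonal (up to rounding), i.e. a view flipped to look down the -z axis.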

Alright, all done on the vision side, so now I draw some 3D. As usual, I use a very simple GLUT program to display 3D in a hurry. Initialization is nothing special; the one thing worth mentioning is using glutSolidCylinder and glutSolidTetrahedron to draw the axes:

gluLookAt(0,0,0,0,0,1,0,1,0); //cam looking at +z axis
glTranslated(0,0,5); //go a bit back to where I want to draw the axes

//this is the rotation matrix I got from solvePnP, so I will rotate accordingly to align with the face
double _d[16] = {       rot[0],rot[1],rot[2],0,
                        rot[3],rot[4],rot[5],0,
                        rot[6],rot[7],rot[8],0,
                        0,     0,     0,     1};
glMultMatrixd(_d);
glRotated(180,1,0,0); //rotate around to face the camera

//----------- Draw Axes --------------
//Z = red

//Y = green

//X = blue

//----------End axes --------------
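
One detail to keep an eye on when handing a matrix to OpenGL: glMultMatrixd reads its 16 values column by column, while a cv::Mat stores its values row by row. Copying a row-major rotation straight across therefore gives OpenGL the transpose, which for a rotation matrix is the inverse rotation. Which of the two you want depends on the rest of your transform chain, so it is worth checking. Here is a small sketch of a helper that makes the layout explicit. The name toGlColumnMajor is made up for this example, and it is plain C++ with no OpenGL calls.

```cpp
#include <cassert>

// Expand a row-major 3x3 rotation into a 4x4 matrix stored column by column,
// the layout glMultMatrixd expects: element (row, col) goes to col * 4 + row.
void toGlColumnMajor(const double* rowMajor3x3, double* out16) {
    assert(rowMajor3x3 != nullptr && out16 != nullptr);
    for (int i = 0; i < 16; ++i) out16[i] = 0.0;
    for (int row = 0; row < 3; ++row) {
        for (int col = 0; col < 3; ++col) {
            out16[col * 4 + row] = rowMajor3x3[row * 3 + col];
        }
    }
    out16[15] = 1.0; // homogeneous coordinate, no translation
}
```

With a double m[16] on the stack you would call toGlColumnMajor(rot, m) and then pass m to glMultMatrixd.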

That wasn't too hard, huh? Awesome.

So.... Results


You can grab the code from the SVN repo:

svn checkout http://morethantechnical.googlecode.com/svn/trunk/HeadPose



  • Cfr

    I suppose you have put all points by hand?
    BTW, you might be interested in Ehci http://code.google.com/p/ehci/wiki/6dofhead

  • Roy

    Yes, the points are marked by hand.
    You could use the OpenCV bundled V&J detector to get facial features, but it will produce very bad results.

    This project you referenced is indeed very similar...


  • This looks cool. It would be even more interesting if the feature points that are manually marked on the face could be detected automatically.

    Does OpenCV have any support to automatically detect those features, say using templates or something?



  • Roy

    OpenCV has a very good implementation of Viola-Jones detector (see here), that potentially can detect the eyes, tip of nose, ends of mouth and ears.
    But OpenCV does not ship with built-in detectors for ears or noses (although the center of the face is usually the nose..), so you must create those on your own. From my experience I can say this is a very hard task that requires a huge database of positive examples... for a good detector, that is.


  • Hi All,
    I developed a real-time face tracker with pose estimation (totally automatic, no feature marking is required). Please take a look at my site and give me feedback if you want 😉 I also uploaded some clips on YouTube to show the features of my system.

    site: sites.google.com/site/ferrariimageprocessing


  • Roy

    Hi Alessandro!
    Of course I saw your videos on YouTube, they have been a great inspiration for me!
    Thanks for checking out my blog and work, and I'll try to comment on your work as well if my time allows..

  • change

    Hi ,
    I want to know which compiler you used to run this code.
    I ran into many problems compiling it.

    Sorry for my English language. I'm not Native speaker.

  • Roy

    I worked with Microsoft Visual Studio (C++) 2008 pro.
    There are .vcproj and .sln files for opening it in the IDE.


  • Franco

    I have a good application that we can develop using your code.
    Maybe we can work together.
    Please contact me at my private mail ( franco.amato@gmail.com )

    Best Regards,

  • Andrew

    Thanks for sharing your hard work!
    I have tried to run the code that you have provided in the link
    with the same Angelina Jolie data set you have used in the youtube demo,
    but for some reason I get different result (it seems very wrong for some images)

    Do you happen to update anything for the youtube demo?
    It would be great if you could share that version too!



  • Sven

    Hi Roy,
    maybe this is a little bit off-topic, but maybe you would like to share your opinion about the following problem:
    I would like to capture my dartboard via (web-)cam and estimate the position of thrown darts.
    I read a lot about object detection/tracking, face detection, HAAR-like features etc. But I'm not sure if this isn't overkill at all. Let's have a look on the situation:

    1. The background is well-known, maybe with different lighting situations
    2. Darts could be known if necessary.
    3. Pose of the darts should be estimated

    I'm happy about any hints I can get on how to cut this problem down into pieces.

    Best regards,

  • Roy

    sounds like basic background subtraction could do a good job, since the cam is stationary and the scene doesn't change.
    it'll give you the region where there is a change (a dart)
    on top of that you can start building some smarter code.

  • Kelp

    If I can mark the 3D points for each individual person and pass them to solvePnP, then the pose estimation error will be lower than with one generic set of 3D points. Right?

  • Black

    May I know what software you used to compile the source code? How about VS2008 and OpenCV 2.2? Is that OK?

  • Alvaro

    Thanks for the code, but how do I get the translation vector so I can use it as a normal vector3df, for example?

  • Hoanam

    Could you please explain your model points in more detail?
    Is there a set of 3D model points for the frontal face?


  • Victor

    Same as andrew. I get some results that are waaay off. Some of them seem just right, but that might be my imagination at work. Do you have a newer code?

  • Roy

    I am referring you to FaceTracker:
    By Jason Saragih
    It is very robust and has a clean API

  • Daniel

    Hi Roy

    Does SolvePnP work for non-planar objects too? There is considerable discussion about it in the OpenCV forums, but none of it is conclusive... maybe the features you considered more or less lay in a plane?


  • Roy

    I think SolvePnP is actually better at finding the pose when the points are non-coplanar.
    As I've said before, for this method to work better you should get many more features than only 6...

  • gayatri

    Hi, please mail me the MATLAB code for detecting these points if you have it. Thank you.

  • Roy

    The points were detected manually, but I did write a quick eye locator based on V-J:
    Look at the VirtualSurgeonFaceData::DetectEyes function
    But there are far better face features extractors... depending on your situation.
    AAM/ASM are good with a video stream but less so with single images; with single images a feature-based detector is often better.

  • Chris

    I have tried to run the code with the same Angelina Jolie data set you used in the YouTube demo, but I got different results, which seem very wrong for some images. Also, the order of the feature points does not seem identical to the order you listed here. Could you give me some tips? Thanks very much.

  • Roy

    As you might have seen in prior comments, this method is very weak and probably needs plenty more feature points to get close to accurately estimating pose...
    As I suggested before, you may try to use Jason Saragih's FaceTracker: http://web.mac.com/jsaragih/FaceTracker/FaceTracker.html

  • Ray

    Hi, which version of OpenCV did you use for this implementation? 2.0? It seems the code didn't work on my pc and the error message is "Size of position vector must be 4x1!" for the function decomposeProjectionMatrix. Do you happen to know the reason? Thanks so much!

  • David

    Hi Roy, great job!! I have two questions I need your help with.

    1. After running your code, my results look wrong and differ from those shown in your demo video. I also tested other images of Angelina, and their results differ from yours too.
    2. The OpenCV 2.0 you suggested does not work well for me. The program always stops at "decomposeProjectionMatrix( tmpmtx,tmp,tmp1,tmp2,tmp3,tmp4,tmp5,eav);". Instead, I compiled your code with OpenCV 2.1. Could that change be what makes my results incorrect?

    I'd appreciate it if you could answer my questions.

  • percepticat

    Hi Roy,

    I found this, and thought you may be able to help me 🙂
    I am trying to use solvePnP with 4 points on a plane.
    The problem I'm having is that I encounter frames with nearly no change in pixel positions (under half a pixel), yet a dramatic change in the reconstructed 3D positions.

    There are more details (code and input/output) here: http://answers.opencv.org/question/12547/inconsistent-results-from-solvepnp/

    I'd appreciate any help!!!

  • gsimons

    Is this code still live somewhere? The given SVN link is dead...