Update: check out my new post about this https://www.morethantechnical.com/2012/10/17/head-pose-estimation-with-opencv-opengl-revisited-w-code/
Hi
Just wanted to share a small thing I did with OpenCV – Head Pose Estimation (sometimes known as Gaze Direction Estimation). Many people try to achieve this and there are a ton of papers covering it, including a recent overview of almost all known methods.
I implemented a very quick & dirty solution based on OpenCV’s internal methods that produced surprising results (I expected it to fail), so I decided to share. It is based on 3D-2D point correspondences: fitting the marked 2D image points to points on a 3D head model. OpenCV provides a magical method – solvePnP – that does exactly this, given some calibration parameters that I completely disregarded.
Here’s how it’s done
Intro
I wanted to use solvePnP, since I saw how easy it was to use when I was implementing PTAM. It’s supposed to recover the 3D location and orientation of an object, given a 3D-2D feature correspondence and an initial guess. In fact, the initial guess is not required, but the results without it are dreadful.
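To make that point concrete, the knob in question is solvePnP’s useExtrinsicGuess flag, and “the guess” means seeding rvec and tvec with a rough pose before the call. This is just a sketch with placeholder names (the real values appear further down in the post):

//Sketch only: modelPoints3d, imagePoints2d, camMatrix and distCoeffs stand in
//for the real variables defined later in the post.
//Seeding rvec/tvec with a rough "facing the camera" pose and passing
//useExtrinsicGuess = true is what makes the estimate usable.
solvePnP(modelPoints3d, imagePoints2d, camMatrix, distCoeffs,
         rvec, tvec, true /* useExtrinsicGuess */);

//Without the guess (rvec/tvec ignored on input), the pose usually comes out dreadful:
//solvePnP(modelPoints3d, imagePoints2d, camMatrix, distCoeffs, rvec, tvec, false);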
So I needed to get some 3D points on a human head. I downloaded a free model of a human head from the net, and used MeshLab to mark some points on the model:
- Left ear
- Right ear
- Left eye
- Right eye
- Nose tip
- Left mouth corner
- Right mouth corner
Then I headed to the LFW database to get some pictures of celebrity heads. By mere accident I stumbled upon Angelina Jolie. The next step was to mark some points on Angelina’s pictures, matching the selected features. In places where the head hides an ear, I put a point at the estimated location of the ear.
Time to Code
First I initialize the 3D points vector, and a dummy camera matrix:
//note: op, rvec, tvec, camMatrix, rv and tv are declared elsewhere in the original code
vector<Point3f> modelPoints;
modelPoints.push_back(Point3f(-36.9522f,39.3518f,47.1217f));   //l eye
modelPoints.push_back(Point3f(35.446f,38.4345f,47.6468f));     //r eye
modelPoints.push_back(Point3f(-0.0697709f,18.6015f,87.9695f)); //nose
modelPoints.push_back(Point3f(-27.6439f,-29.6388f,73.8551f));  //l mouth
modelPoints.push_back(Point3f(28.7793f,-29.2935f,72.7329f));   //r mouth
modelPoints.push_back(Point3f(-87.2155f,15.5829f,-45.1352f));  //l ear
modelPoints.push_back(Point3f(85.8383f,14.9023f,-46.3169f));   //r ear

op = Mat(modelPoints);
op = op / 35; //just a little normalization...

rvec = Mat(rv);
double _d[9] = {1, 0, 0,
                0,-1, 0,
                0, 0,-1}; //rotation: looking at -z axis
Rodrigues(Mat(3,3,CV_64FC1,_d),rvec);

tv[0]=0; tv[1]=0; tv[2]=1;
tvec = Mat(tv);

double _cm[9] = { 20, 0, 160,
                  0, 20, 120,
                  0,  0,   1 }; //"calibration matrix": center point at center of picture, focal length of 20
camMatrix = Mat(3,3,CV_64FC1,_cm);
Even though the “calibration” parameters are totally bogus, they work pretty well.
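If you want something a bit more plausible than my bogus matrix (which assumes a 320×240 image, with the principal point at 160,120), a common rough guess is a focal length of about the image width in pixels and the principal point at the image center. This is my own sketch, not part of the original code:

//A rough, uncalibrated intrinsics guess (an assumption, not a real calibration):
//focal length ~ image width in pixels, principal point at the image center.
Mat img = imread("image.png");
double f_guess = (double)img.cols;
double _cm_guess[9] = { f_guess, 0,       img.cols / 2.0,
                        0,       f_guess, img.rows / 2.0,
                        0,       0,       1 };
Mat camMatrixGuess = Mat(3,3,CV_64FC1,_cm_guess).clone();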
Now, we’re all ready to start estimating some poses. So let’s use solvePnP:
//note: rot is a 9-element double array declared elsewhere, shared with the OpenGL code below
vector<Point2f> imagePoints;

//read 2D points from file...
FILE* f;
fopen_s(&f,"points.txt","r");
for(int i=0;i<7;i++) {
    int x,y;
    fscanf_s(f,"%d",&x);
    fscanf_s(f,"%d",&y);
    imagePoints.push_back(Point2f((float)x,(float)y));
}
fclose(f);

//make a Mat of the vector<>
Mat ip(imagePoints);

//display points on image
Mat img = imread("image.png");
for(unsigned int i=0;i<imagePoints.size();i++)
    circle(img,imagePoints[i],2,Scalar(255,0,255),CV_FILLED);

//"distortion coefficients"... hah!
double _dc[] = {0,0,0,0};

//here's where the magic happens
solvePnP(op,ip,camMatrix,Mat(1,4,CV_64FC1,_dc),rvec,tvec,true);

//decompose the response to something OpenGL would understand.
//translation vector is irrelevant, only rotation vector is important
Mat rotM(3,3,CV_64FC1,rot);
Rodrigues(rvec,rotM);
double* _r = rotM.ptr<double>();
printf("rotation mat: \n %.3f %.3f %.3f\n%.3f %.3f %.3f\n%.3f %.3f %.3f\n",
    _r[0],_r[1],_r[2],_r[3],_r[4],_r[5],_r[6],_r[7],_r[8]);
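A quick sanity check I find useful (not part of the original code) is to reproject the 3D model points with the recovered pose using projectPoints and overlay them on the image; if the estimate is any good, they should land close to the points you marked:

//Optional sanity check: reproject the model points with the recovered pose
//and draw them; good poses put them near the manually marked points.
vector<Point2f> reprojected;
projectPoints(op, rvec, tvec, camMatrix, Mat(1,4,CV_64FC1,_dc), reprojected);
for(unsigned int i=0;i<reprojected.size();i++)
    circle(img, reprojected[i], 3, Scalar(0,255,0), 1); //green = reprojection
imshow("pose check", img);
waitKey(0);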
Alright, all done on the vision side, so now I draw some 3D. As usual, I use a very simple GLUT program to display 3D in a hurry. The initialization is nothing special; the only thing worth noting is using glutSolidCylinder and glutSolidTetrahedron to draw the axes:
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
gluLookAt(0,0,0,0,0,1,0,1,0); //cam looking at +z axis

glPushMatrix();
glTranslated(0,0,5); //go a bit back to where I want to draw the axes

glPushMatrix();
//this is the rotation matrix I got from solvePnP, so I will rotate accordingly to align with the face
double _d[16] = { rot[0],rot[1],rot[2],0,
                  rot[3],rot[4],rot[5],0,
                  rot[6],rot[7],rot[8],0,
                  0,     0,     0,     1};
glMultMatrixd(_d);
glRotated(180,1,0,0); //rotate around to face the camera

//----------- Draw Axes --------------
//Z = red
glPushMatrix();
glRotated(180,0,1,0);
glColor3d(1,0,0);
glutSolidCylinder(0.05,1,15,20);
glTranslated(0,0,1);
glScaled(.1,.1,.1);
glutSolidTetrahedron();
glPopMatrix();

//Y = green
glPushMatrix();
glRotated(-90,1,0,0);
glColor3d(0,1,0);
glutSolidCylinder(0.05,1,15,20);
glTranslated(0,0,1);
glScaled(.1,.1,.1);
glutSolidTetrahedron();
glPopMatrix();

//X = blue
glPushMatrix();
glRotated(-90,0,1,0);
glColor3d(0,0,1);
glutSolidCylinder(0.05,1,15,20);
glTranslated(0,0,1);
glScaled(.1,.1,.1);
glutSolidTetrahedron();
glPopMatrix();

glPopMatrix();
glPopMatrix();
//----------End axes --------------
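For completeness, here is roughly how that drawing code can sit inside a GLUT program. This is a minimal sketch of my own, with an assumed window size and callback layout, not the exact structure of the code in the repo. Note that glutSolidCylinder is a freeglut extension, so you need freeglut rather than classic GLUT:

#include <GL/freeglut.h>   //freeglut: provides glutSolidCylinder

void reshape(int w, int h) {
    glViewport(0, 0, w, h);
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    gluPerspective(45.0, (double)w / (double)h, 0.1, 100.0);
}

void display() {
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

    //... the modelview setup and axis drawing shown above go here ...

    glutSwapBuffers();
}

int main(int argc, char** argv) {
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB | GLUT_DEPTH);
    glutInitWindowSize(320, 240); //assumed window size
    glutCreateWindow("head pose");
    glEnable(GL_DEPTH_TEST);
    glutDisplayFunc(display);
    glutReshapeFunc(reshape);
    glutMainLoop();
    return 0;
}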
That wasn’t too hard, huh? Awesome.
So…. Results
Code
You can grab the code from the SVN repo:
svn checkout http://morethantechnical.googlecode.com/svn/trunk/HeadPose
Enjoy!
Roy.