
20-lines AR in OpenCV [w/code]

By Roy
Posted November 10, 2010

Hi,
Just wanted to share a bit of code that uses OpenCV’s camera extrinsic parameter recovery (camera position and rotation): solvePnP, or its C counterpart cvFindExtrinsicCameraParams2. I wanted a simple planar object surface recovery for augmented reality without using any of the AR libraries, digging into some OpenCV and OpenGL code instead.
This can serve as a primer, or tutorial on how to use OpenCV with OpenGL for AR.
Update 2/16/2015: I wrote another post on OpenCV-OpenGL AR, this time using the fine QGLViewer – a very convenient Qt OpenGL widget.
The program is just straightforward optical-flow-based tracking, fed manually with four points that are the planar object’s corners, and solving for the camera pose every frame. Plain vanilla AR.
Well, the whole cpp file is ~350 lines, but there are only 20 or fewer interesting lines… Actually far fewer. Let’s see what’s up.

I wanna run you through the code really quickly without going into much detail, to keep things simple. First of all, we should have two separate threads: Vision and Graphics. The vision thread tracks and solves, and the graphics thread displays.

Initialize

int main(int argc, char** argv) {
	initGL(argc,argv);
	initOCV(NULL);
	pthread_t tId;
	pthread_attr_t tAttr;
	pthread_attr_init(&tAttr);
	pthread_create(&tId, &tAttr, startOCV, NULL);
	startGL(NULL);
}

The initGL and initOCV functions just initialize stuff that can’t be initialized statically, like GLUT window definitions, some starting values for the cam-pose estimation and other boring stuff.
GLUT runs on the main thread; putting it on its own thread seems to make it unhappy, and it stops working.
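
Since the vision thread writes the pose while the graphics thread reads it, the hand-off between the two is worth a thought. Here is a minimal sketch of one way to do it, using only the C++11 standard library. The names Pose, SharedPose and identityPose are hypothetical and not part of the original code, which shares its globals directly.

```cpp
#include <array>
#include <mutex>

// Hypothetical container for the camera pose (not from the original code).
struct Pose {
    std::array<double, 9> rot;   // row-major 3x3 rotation, like rotM
    std::array<double, 3> trans; // translation, like tvec
};

// Identity rotation and zero translation as a starting value.
Pose identityPose() {
    Pose p;
    p.rot.fill(0.0);
    p.rot[0] = p.rot[4] = p.rot[8] = 1.0;
    p.trans.fill(0.0);
    return p;
}

// A pose slot guarded by a mutex: one writer (vision), one reader (graphics).
class SharedPose {
public:
    SharedPose() : pose_(identityPose()) {}

    // Store a new pose; the lock keeps readers from seeing a half-written one.
    void set(const Pose& p) {
        std::lock_guard<std::mutex> lock(mutex_);
        pose_ = p;
    }

    // Return a consistent copy of the most recent pose.
    Pose get() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return pose_;
    }

private:
    mutable std::mutex mutex_;
    Pose pose_;
};
```

In this sketch the vision thread would call set() right after solving the pose, and display() would call get() before building its matrix.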

Tracking

I’m using the simplest form of optical flow in OpenCV (pyramidal Lucas-Kanade), and the code is equally minimal.

void* startOCV(void* arg) {
	while (1) {
		cvtColor(img, prev, CV_BGR2GRAY);
		//get frame off camera
		cap >> frame;
		if(frame.data == NULL) break;
		frame.copyTo(img);
		cvtColor(img, next, CV_BGR2GRAY);
		//calc optical flow
		calcOpticalFlowPyrLK(prev, next, points1, points2, status, err, Size(30,30));
		cvtPtoKpts(imgPointsOnPlane, points2);
		//switch points vectors (next becomes previous)
		points1.clear();
		points1 = points2;
		//calculate camera pose
		getPlanarSurface(points1);
		//refresh 3D scene
		glutPostWindowRedisplay(glutwin);
		//show tracked points on scene
		drawKeypoints(next, imgPointsOnPlane, img_to_show, Scalar(255));
		imshow("main2", img_to_show);
		int c = waitKey(30);
		if (c == ' ') {
			waitKey(0);
		}
	}
	return NULL;
}

To use OpenCV’s ‘drawKeypoints’, which makes drawing key points much easier, we must use vector<KeyPoint>. So I created these 2 very simple converter funcs: cvtKeyPtoP and cvtPtoKpts.
You think ‘getPlanarSurface’ is complicated? Think again! 3 lines:

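void getPlanarSurface(vector<Point2f>& imgP) {
	//turn last frame's rotation matrix into a rotation vector, to serve as the initial guess
	Rodrigues(rotM,rvec);
	//last argument (useExtrinsicGuess = true): start from the current rvec and tvec
	solvePnP(objPM, Mat(imgP), camera_matrix, distortion_coefficients, rvec, tvec, true);
	//back to a 3x3 rotation matrix for the graphics side
	Rodrigues(rvec,rotM);
}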
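
For the curious, here is a minimal sketch of what the Rodrigues conversion from a rotation vector to a rotation matrix computes, in plain C++ without OpenCV. The function name rodriguesToMatrix and the Vec3/Mat3 aliases are made up for this illustration; in the real program cv::Rodrigues does this job, in both directions.

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<double, 9>; // row-major 3x3

// Rotation vector (unit axis k scaled by the angle t in radians) to matrix:
// R = cos(t) I + (1 - cos(t)) k k^T + sin(t) [k]x
Mat3 rodriguesToMatrix(const Vec3& rvec) {
    const double theta = std::sqrt(rvec[0] * rvec[0] +
                                   rvec[1] * rvec[1] +
                                   rvec[2] * rvec[2]);
    if (theta < 1e-12) {
        // No rotation: return the identity.
        return Mat3{{1.0, 0.0, 0.0,  0.0, 1.0, 0.0,  0.0, 0.0, 1.0}};
    }
    const double kx = rvec[0] / theta;
    const double ky = rvec[1] / theta;
    const double kz = rvec[2] / theta;
    const double c = std::cos(theta);
    const double s = std::sin(theta);
    const double v = 1.0 - c;
    return Mat3{{
        c + v * kx * kx,      v * kx * ky - s * kz, v * kx * kz + s * ky,
        v * ky * kx + s * kz, c + v * ky * ky,      v * ky * kz - s * kx,
        v * kz * kx - s * ky, v * kz * ky + s * kx, c + v * kz * kz
    }};
}
```

A rotation vector of (0, 0, pi/2), for instance, comes out as a quarter turn around the z axis.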

Booya! Vision stuff is done.

3D Graphics

A little 3D never hurt any AR system… But drawing it is very simple still:

void display(void)
{
	glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
	//Make sure we have a background image buffer
	if(img_to_show.data != NULL) {
		Mat tmp;
		//Switch to Ortho for drawing background
		glMatrixMode(GL_PROJECTION);
		glPushMatrix();
		glLoadIdentity();
		//gluOrtho2D takes (left, right, bottom, top)
		gluOrtho2D(0.0, 640.0, 0.0, 480.0);
		glMatrixMode(GL_MODELVIEW);
		//Textures can only have power-of-two dimensions, so closest to 640x480 is 1024x512
		tmp = Mat(Size(1024,512),CV_8UC3);
		//However we are going to use only a portion, so create an ROI
		Mat ttmp = tmp(Range(0,img_to_show.rows),Range(0,img_to_show.cols));
		//Some frames could be 8bit grayscale, so make sure on the output we always get 24bit RGB.
		if(img_to_show.step == img_to_show.cols)
			cvtColor(img_to_show, ttmp, CV_GRAY2RGB);
		else if(img_to_show.step == img_to_show.cols * 3)
			cvtColor(img_to_show, ttmp, CV_BGR2RGB);
		flip(ttmp,ttmp,0);
		glEnable(GL_TEXTURE_2D);
		glTexImage2D(GL_TEXTURE_2D, 0, 3, 1024, 512, 0, GL_RGB, GL_UNSIGNED_BYTE, tmp.data);
		//Only the ROI holds image data, so the texture coords stop at its edges
		float s = img_to_show.cols / 1024.0f, t = img_to_show.rows / 512.0f;
		//Finally, draw the texture using a simple quad with texture coords in corners.
		glPushMatrix();
		glLoadIdentity();
		glColor3ub(255, 255, 255);//don't tint the texture
		glDepthMask(GL_FALSE);//keep the background behind the 3D scene
		glBegin(GL_QUADS);
		glTexCoord2f(0, 0); glVertex2i(0, 0);
		glTexCoord2f(s, 0); glVertex2i(640, 0);
		glTexCoord2f(s, t); glVertex2i(640, 480);
		glTexCoord2f(0, t); glVertex2i(0, 480);
		glEnd();
		glDepthMask(GL_TRUE);
		glPopMatrix();
		glMatrixMode(GL_PROJECTION);
		glPopMatrix();
		glMatrixMode(GL_MODELVIEW);
	}
	glPushMatrix();
	double m[16] = {	_d[0],-_d[3],-_d[6],0,
						_d[1],-_d[4],-_d[7],0,
						_d[2],-_d[5],-_d[8],0,
						tv[0],-tv[1],-tv[2],1	};
	//Rotate and translate according to result from solvePnP
	glLoadMatrixd(m);
	//Draw a basic cube
	glDisable(GL_TEXTURE_2D);
	glColor3ub(255, 0, 0);//unsigned bytes: glColor3b takes signed bytes, so 255 would overflow
	glutSolidCube(1);
	glPopMatrix();
	glutSwapBuffers();
}

Not so horrific, huh? Most of it is drawing the background texture, and that’s only trying to avoid using glDrawPixels… The only interesting thing is loading the rotation and translation matrix.
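
About that background: gluOrtho2D takes its arguments in the order (left, right, bottom, top), and per the OpenGL reference a call with left equal to right fails with GL_INVALID_VALUE and leaves the projection untouched. As a rough sketch of what a valid call sets up, here is the same matrix built by hand in plain C++. The helpers ortho2D and toNdc are hypothetical names, not OpenGL functions.

```cpp
#include <array>
#include <cassert>

// Column-major 4x4 matrix equivalent to gluOrtho2D(left, right, bottom, top),
// which is glOrtho with near = -1 and far = 1.
std::array<double, 16> ortho2D(double left, double right,
                               double bottom, double top) {
    // OpenGL rejects these cases with GL_INVALID_VALUE.
    assert(left != right && bottom != top);
    std::array<double, 16> m{}; // all zeros
    m[0]  = 2.0 / (right - left);
    m[5]  = 2.0 / (top - bottom);
    m[10] = -1.0;                              // -2 / (far - near)
    m[12] = -(right + left) / (right - left);
    m[13] = -(top + bottom) / (top - bottom);
    m[15] = 1.0;                               // m[14] stays 0 for near = -1, far = 1
    return m;
}

// Map a point (x, y, 0) to normalized device coordinates (x and y only).
std::array<double, 2> toNdc(const std::array<double, 16>& m,
                            double x, double y) {
    return {{m[0] * x + m[12], m[5] * y + m[13]}};
}
```

With left = 0, right = 640, bottom = 0 and top = 480, the corners (0, 0) and (640, 480) land on the corners of the normalized view, (-1, -1) and (1, 1), so a 640x480 quad fills the window without any extra translation.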
Notice that tv[0] (the x component of the translation) doesn’t get a minus sign, while tv[1] and tv[2] do. That’s because OpenCV’s camera looks down the +z axis with +y pointing down in the image, while OpenGL’s camera looks down the -z axis with +y pointing up, so a 180-degree rotation around the x axis is needed, and that negates the y and z components. The same goes for _d[0], _d[1] and _d[2], the first row of the rotation matrix.
In initGL I initialized OpenGL to look “normally” down the -z axis, where +x goes right and +y goes up; this is also the view an identity modelview matrix gives you.
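
The sign flips in that matrix can be wrapped in a small helper. This is only a sketch in plain C++: cvPoseToGlModelView is a hypothetical name, R stands in for the row-major rotation matrix data (_d) and t for the translation (tv).

```cpp
#include <array>

// Build a column-major OpenGL modelview matrix from an OpenCV-style pose.
// R is a row-major 3x3 rotation, t is the translation.
// Rows 1 and 2 (y and z) are negated: a 180-degree turn around the x axis.
std::array<double, 16> cvPoseToGlModelView(const std::array<double, 9>& R,
                                           const std::array<double, 3>& t) {
    std::array<double, 16> m{}; // all zeros
    for (int row = 0; row < 3; ++row) {
        const double sign = (row == 0) ? 1.0 : -1.0;
        for (int col = 0; col < 3; ++col) {
            // Column-major storage: element (row, col) lives at col * 4 + row.
            m[col * 4 + row] = sign * R[row * 3 + col];
        }
        m[12 + row] = sign * t[row];
    }
    m[15] = 1.0;
    return m;
}
```

The result can be handed to glLoadMatrixd via its data() pointer, the same way the hand-written array m is used in display().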

Proof time

Not that you need it.. 🙂 But here’s a video of it working.

BTW: If anyone can solve the problem of the slight misalignment of the 3D and image – let me know.
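
One thing that may be worth checking, offered as an assumption since initGL isn’t shown here: the field of view given to OpenGL’s perspective projection should agree with the focal length in camera_matrix, otherwise the cube and the video are effectively rendered by two different cameras. Below is a minimal sketch of that relation for a pinhole camera whose principal point sits at the image center. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Vertical field of view, in degrees, of a pinhole camera with focal
// length fy (in pixels) and the given image height (in pixels).
double verticalFovDegrees(double fy, double imageHeight) {
    assert(fy > 0.0 && imageHeight > 0.0);
    const double pi = std::acos(-1.0);
    return 2.0 * std::atan(imageHeight / (2.0 * fy)) * 180.0 / pi;
}

// Distance at which a quad of the given height exactly fills the view
// vertically, for a perspective projection with the given field of view.
double fillDistance(double quadHeight, double fovyDegrees) {
    assert(quadHeight > 0.0 && fovyDegrees > 0.0 && fovyDegrees < 180.0);
    const double pi = std::acos(-1.0);
    return (quadHeight / 2.0) / std::tan(fovyDegrees * pi / 360.0);
}
```

For example, a 480-pixel-high image with fy = 240 pixels gives a 90-degree vertical field of view. Going the other way, a 480-unit-high quad placed 500 units away fills the view only at roughly 51 degrees, which corresponds to fy = 500 pixels.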

Code and Salutations

Code can be downloaded from the blog’s SVN:

svn checkout http://morethantechnical.googlecode.com/svn/trunk/OpenCVAR morethantechnical-OpenCVAR

Now let your imagination run wild!
Farewell,
Roy.

Tags augmented reality, code, computer vision, opencv, opengl

13 replies on “20-lines AR in OpenCV [w/code]”

Hi,
I strongly suspect you get the slight misalignment because you do not calibrate the camera for lens distortion (e.g. intrinsic parameters). You’d have to use a rectified image to not get misalignment. Monocular camera calibration is the keyword here.
regards,
Stefan

That doesn’t look like a problem with camera calibration; I think it’s more likely to come from having a wrong rotation axis. Note that there are only 2 possible rotations for a 2D object (+ or -), but 24 possible rotations for a 3D object, so it’s much harder to debug! But still a nice demo!

Hello, thanks for this tutorial!
Did someone fix the alignment bug? And why is the cube moving? It should be fixed in the center of the chessboard!
regards
Red

Very useful tutorial.
I have a question:
Can you use something else for tracking, like an image or a map, with this code?

To me it just looks like straightforward misalignment of the object. From what I have seen in other examples, only a small misplacement of the object will cause this to happen. Just play with tweaking a few x, y and z values of the position, or the object’s centre of rotation (is it in the base of the cube or the middle? It should be the middle of the base, right?) etc., and it should snap in.

Interested to speak with anyone here who is actively coding AR systems. We have a startup looking for a specific-use solution that COMBINES location-based and image-based marker tracking. Significant compensation possible. Feel free to contact me: [email protected]

Will this work with multiple markers? If not, how would one implement that?
thanks

I am new to OpenGL and I want to combine a 3D object with OpenCV 2.3 in VC2010, in C.
I think your code can be a good sample for me to learn from.
I am not good at coding. I compiled it and it says
‘pthread.h’: No such file or directory
Where can I find it, or what should I do?
thanks

SVN Checkout link is not working. Please fix that
Regards,

Hello. I’m also doing some AR work.
I have had a problem similar to yours.
Have you tried using a perspective projection instead of the orthogonal projection?
You should also calibrate the camera’s field-of-view angle if you are using a perspective projection.
If that doesn’t work, you can run some experiments to measure the error of the transformation matrix.
Hope this helps.

gluOrtho2D(0.0, 0.0, 640.0, 480.0);????

Hi, I am looking through the code. Do you have the mov4.mp4 that it uses?
line 323: cap.open("../../mov4.mp4");

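[…] Custom templates: the principle is roughly the same as fixed-template AR; the difference lies in how the template is detected. It mainly works by detecting feature points and then deriving the [R|t] matrix from how the positions of corresponding feature points change between frames. BazAR is of this type. Others, like the link below, implement a demo of this type of AR with very little code: https://www.morethantechnical.com … ar-in-opencv-wcode/ . That code tracks the feature points with optical flow, while the code at the following link uses SURF features: http://morethantechnical.googlecode.com/svn/trunk/opencv_ar/ . In any case, apart from the different features, the principle of the coordinate-system transformation is similar to the previous article. OpenCV provides functions that make the coordinate-system transformation easy, such as solvePnP and cvFindHomography. […]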
