Augmented Reality with NyARToolkit, OpenCV & OpenGL

Hi

I have been playing around with NyARToolkit's CPP implementation in the last week, and I got some nice results. I tried to keep it as "casual" as I could and not get into the crevices of every library; I wanted to get results, and fast.

First, NyARToolkit is a derivative of the wonderful ARToolkit by the talented people @ HIT Lab NZ & HIT Lab Uni of Washington. NyARToolkit, however, has been ported to many other platforms, like Java, C# and even Flash (Papervision3D?), and in the process was made object oriented, instead of ARToolkit's procedural approach. The NyARToolkit people have done a great job, so I decided to build from there.

NyART doesn't provide any video capturing or 3D rendering in its CPP implementation (the other ports do), so I set out to build those on my own. OpenCV is like a second language to me, so I decided to use its video-grabbing wrapper for Win32. For 3D rendering I used the straightforward GLUT library, which does an excellent job of ridding the programmer of all the Win#@$#@ API mumbo-jumbo-CreateWindowEx crap.

So let's dive in....

A couple of steps need to be done:

  1. [Preprocessing - calibrating camera...]
  2. Initialize stuff & grab video.
  3. Use NyART to detect marker and get extrinsic camera parameters.
  4. "Calibrate" 3D world to fit camera.
  5. Draw 3D scene on-top of video input.

Initialization: easy...

I used CvCapture & cvQueryFrame, which is extremely easy. However, frames in a Win environment are always grabbed upside down, so we need to take care of image->origin.

capture = cvCaptureFromCAM(0);	// camera index 0, not the character '0' (which is 48)
cvNamedWindow("input");

	frame = cvQueryFrame( capture );
	if( frame ) {
		if(ra == NULL) {
			ra = new NyARRgbRaster_BGRA(frame->width, frame->height);
			ap.changeScreenSize(frame->width, frame->height);
			ar = new NyARSingleDetectMarker(ap, code, 80.0);
			code=NULL;
			ar->setContinueMode(false);

			arglCameraFrustumRH(ap,1.0,100.0,camera_proj);
		}

		if( !image )
		{
			CvSize s = cvGetSize(frame);
			image = cvCreateImage(s , 8, 3 );
			image->origin = frame->origin;
			bgra_img = cvCreateImage(s,8,4);
			bgra_img->origin = frame->origin;
			gray = cvCreateImage(s,8,1);
			gray->origin = frame->origin;
			flipped = cvCreateImage(s,8,3);
			ra->setBuffer((NyARToolkitCPP::NyAR_BYTE_t*)bgra_img->imageData);
		}
		win_w = frame->width;
		win_h = frame->height;
	}

I use 4 IplImages:

  1. frame buffer (3 channels),
  2. flipped (3 channels), which is the properly aligned frame to be used as the background for the OpenGL scene
  3. grayscale (1 channel), which provides better results for marker detection,
  4. And - bgra image (4 channels), to align with NyART's BGRA raster

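Note that I can only initialize NyARRgbRaster_BGRA & NyARSingleDetectMarker after grabbing the first frame, as I need the width & height of the video stream. You can bypass this by querying the CvCapture object directly, e.g. with cvGetCaptureProperty and CV_CAP_PROP_FRAME_WIDTH / CV_CAP_PROP_FRAME_HEIGHT (assuming your capture backend reports them).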

Also, note the shared buffer between the BGRA raster and bgra_img. Shared buffers are very helpful, sparing all the useless bit shuffling... you should try to use them if you can.
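To make the shared-buffer idea concrete, here is a minimal sketch in plain C++. PixelImage and RasterView are hypothetical stand-ins for the IplImage and the NyART raster (they are not the real classes); the point is only that the raster keeps a pointer and never copies pixels.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an IplImage: owns the pixel memory.
struct PixelImage {
	int width, height, channels;
	std::vector<std::uint8_t> data;
	PixelImage(int w, int h, int c)
		: width(w), height(h), channels(c),
		  data(static_cast<std::size_t>(w) * h * c, 0) {}
};

// Hypothetical stand-in for the BGRA raster: it only borrows a buffer.
class RasterView {
public:
	RasterView(int w, int h) : width_(w), height_(h), buf_(nullptr) {}
	// Same idea as ra->setBuffer(...): remember the pointer, copy nothing.
	void setBuffer(std::uint8_t* buf) { buf_ = buf; }
	// Read one byte of the pixel at (x, y); channel is 0..3 for B, G, R, A.
	std::uint8_t at(int x, int y, int channel) const {
		assert(buf_ != nullptr);
		assert(x >= 0 && x < width_ && y >= 0 && y < height_);
		return buf_[(static_cast<std::size_t>(y) * width_ + x) * 4 + channel];
	}
private:
	int width_, height_;
	std::uint8_t* buf_; // not owned, must outlive the view
};
```

Anything written into the image is immediately visible through the view, which is why the detection code never has to copy pixels into the raster. The catch is lifetime: the image has to stay alive for as long as the raster points at it.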

Another thing to notice is the loading of the camera matrix by using ARToolkit's arglCameraFrustumRH function. NyART also has this function, but I used the more native version by ARTk as it uses float[] and not float*, which is easier to handle and debug.
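For the curious, this is roughly what a projection matrix built from camera intrinsics looks like. It is a sketch under simplifying assumptions, not ARToolkit's actual arglCameraFrustumRH code: a pinhole camera with no skew or distortion, pixel origin at the top-left, and hypothetical names (Intrinsics, frustumFromIntrinsics). The output is column-major, which is what glLoadMatrixd expects.

```cpp
#include <cassert>

// Hypothetical pinhole intrinsics, in pixels (no skew, no lens distortion).
struct Intrinsics {
	double fx, fy;     // focal lengths
	double cx, cy;     // principal point, measured from the top-left corner
	int width, height; // image size
};

// Fill a column-major 4x4 OpenGL projection matrix (right-handed, camera
// looking down -Z) so that the 3D scene projects like the real camera.
void frustumFromIntrinsics(const Intrinsics& k, double zNear, double zFar,
                           double out[16]) {
	assert(zNear > 0.0 && zFar > zNear);
	for (int i = 0; i < 16; ++i) out[i] = 0.0;
	out[0]  = 2.0 * k.fx / k.width;        // x scale
	out[5]  = 2.0 * k.fy / k.height;       // y scale
	out[8]  = 1.0 - 2.0 * k.cx / k.width;  // x offset of the principal point
	out[9]  = 2.0 * k.cy / k.height - 1.0; // y offset (image y points down)
	out[10] = -(zFar + zNear) / (zFar - zNear);
	out[11] = -1.0;                        // clip w = -z
	out[14] = -2.0 * zFar * zNear / (zFar - zNear);
}
```

A quick sanity check for a matrix like this: a point straight ahead on the optical axis should land exactly on the principal point (cx, cy) in pixel coordinates.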

Finally, some OpenGL initialization code:

	mesh = loadFromFile("c:/downloads/apple.obj");

	glShadeModel(GL_FLAT);
	glClearColor(0.0f, 0.0f, 0.0f, 0.5f);
	glClearDepth(1.0f);
	glDepthFunc(GL_LEQUAL);
	glHint(GL_PERSPECTIVE_CORRECTION_HINT, GL_NICEST);

This came flying out of NeHe's number 2 lesson for GLUT. Except for loadFromFile, which is a small framework I wrote to load .obj files - I'll speak of it later as well.

OK, done with initialization, on to...

Detecting the marker:

Well, NyARTk are taking care of all that:

	bool doMult = false;

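		frame = cvQueryFrame( capture );
		if( frame ) {
			cvCopy( frame, image);
			// OpenCV captures in BGR order, so use the BGR variant
			cvCvtColor(image,gray,CV_BGR2GRAY);
			cvCvtColor(gray,bgra_img,CV_GRAY2BGRA);
			// glDrawPixels wants RGB and reads the buffer bottom row first
			cvCvtColor(frame,flipped,CV_BGR2RGB);
			cvFlip(flipped);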

			cvShowImage("input",gray);

			if(ar->detectMarkerLite(*ra, 100)) {
				ar->getTransmationMatrix(result_mat);
			//	printf("Marker confidence\n cf=%f,direction=%d\n",ar->getConfidence(),ar->getDirection());
				//printf("Transform Matrix\n");
				//printf(
				//	"% 4.8f,% 4.8f,% 4.8f,% 4.8f\n"
				//	"% 4.8f,% 4.8f,% 4.8f,% 4.8f\n"
				//	"% 4.8f,% 4.8f,% 4.8f,% 4.8f\n",
				//	result_mat.m00,result_mat.m01,result_mat.m02,result_mat.m03,
				//	result_mat.m10,result_mat.m11,result_mat.m12,result_mat.m13,
				//	result_mat.m20,result_mat.m21,result_mat.m22,result_mat.m23);
				doMult = true;
			}
		}

This is pretty straightforward... I grab the frame, convert it to grayscale and then to BGRA (and via the shared buffer into the NyART BGRA raster), and use detectMarkerLite on the raster.
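In case the two grayscale cvCvtColor calls look like magic, here is what they amount to, written out in plain C++. This is only a sketch: bgrToGrayBgra is a hypothetical helper, it uses the usual 0.299/0.587/0.114 luma weights (OpenCV's own rounding may differ slightly), and it assumes tightly packed rows, while a real IplImage may pad each row (see widthStep).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert a packed BGR image to a gray image stored as BGRA, i.e. the same
// luminance repeated in B, G and R, with alpha set to 255.
std::vector<std::uint8_t> bgrToGrayBgra(const std::vector<std::uint8_t>& bgr,
                                        int width, int height) {
	const std::size_t pixels = static_cast<std::size_t>(width) * height;
	assert(bgr.size() == pixels * 3);
	std::vector<std::uint8_t> out(pixels * 4);
	for (std::size_t i = 0; i < pixels; ++i) {
		const double b = bgr[i * 3 + 0];
		const double g = bgr[i * 3 + 1];
		const double r = bgr[i * 3 + 2];
		// Standard luma weights, rounded to the nearest integer.
		const std::uint8_t y =
			static_cast<std::uint8_t>(0.299 * r + 0.587 * g + 0.114 * b + 0.5);
		out[i * 4 + 0] = y;
		out[i * 4 + 1] = y;
		out[i * 4 + 2] = y;
		out[i * 4 + 3] = 255;
	}
	return out;
}
```

The detector still gets the 4-channel layout its BGRA raster expects, but all the colour information has been squashed into plain brightness.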

Remember, marker detection gives us the extrinsic camera parameters, meaning the camera's location and rotation in the world. These now have to be fed into the OpenGL scene rendering, so that the virtual camera matches the real-life camera.

Draw the feed from the camera as the background

But first, we need to draw the video frame as the background for the 3D scene:

	glMatrixMode(GL_PROJECTION);
	glPushMatrix();
	glLoadIdentity();
	gluOrtho2D(0.0,win_w, 0.0,win_h);
	glMatrixMode(GL_MODELVIEW);
	glPushMatrix();
	glLoadIdentity();

	glDrawPixels(win_w,win_h,GL_RGB,GL_UNSIGNED_BYTE,flipped->imageData);

	glMatrixMode(GL_PROJECTION);
	glPopMatrix();
	glMatrixMode(GL_MODELVIEW);
	glPopMatrix();

	glEnable(GL_DEPTH_TEST);
	glDepthFunc(GL_LEQUAL);

This is done by going to orthographic mode using gluOrtho2D, and drawing the pixels using glDrawPixels with flipped's buffer.
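A note on the flipped buffer: glDrawPixels treats the first row of the buffer as the bottom row of the picture, so a frame stored top row first has to be flipped before drawing. Presumably that is what the cvFlip call takes care of here. A minimal sketch of such a vertical flip in plain C++ follows; flipRows and its stride parameter are hypothetical names used for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Flip an image top-to-bottom in place. `stride` is the number of bytes per
// row, which may be larger than width * channels when rows are padded.
void flipRows(std::vector<std::uint8_t>& pixels, int height, std::size_t stride) {
	assert(height >= 0);
	assert(pixels.size() == static_cast<std::size_t>(height) * stride);
	for (int top = 0, bottom = height - 1; top < bottom; ++top, --bottom) {
		std::uint8_t* a = pixels.data() + static_cast<std::size_t>(top) * stride;
		std::uint8_t* b = pixels.data() + static_cast<std::size_t>(bottom) * stride;
		std::swap_ranges(a, a + stride, b); // swap two whole rows
	}
}
```

Working with whole rows (stride bytes at a time) keeps any row padding where it was, so the result can be handed to glDrawPixels with the same alignment as the source.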

Set the OpenGL virtual camera to align with the real life camera

Next, we need to alter the camera to match the scene:

	if(doMult) {
		glMatrixMode(GL_PROJECTION);
		glLoadMatrixd(camera_proj);
		glMatrixMode(GL_MODELVIEW);
		glLoadIdentity();
		float m[16] = {0.0f}; m[0] = 1.0f; m[5] = 1.0f; m[10] = 1.0f; m[15] = 1.0f;
		toCameraViewRH(result_mat,m);
		glLoadMatrixf(m);
	} else {
		glMatrixMode(GL_MODELVIEW);
		glLoadIdentity();
	}

The result from NyARTk is in the global result_mat, and the function toCameraViewRH simply copies it into the float[] matrix m.
Note that I load the camera projection matrix just before that. This can probably be done once in init() instead, as this matrix doesn't change.
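For reference, a function like toCameraViewRH boils down to something like the following. This is a sketch of the idea rather than NyARToolkit's actual source: Pose3x4 and poseToGlModelView are hypothetical names, and it assumes the pose comes in vision-style camera axes (y down, z into the scene), so the y and z rows are negated to match OpenGL's camera (y up, looking down -z). Check the signs against your own setup.

```cpp
#include <cassert>

// Hypothetical 3x4 pose [R | t], row-major: rows are the camera x, y, z axes.
struct Pose3x4 {
	double m[3][4];
};

// Write the pose into a column-major 4x4 matrix for glLoadMatrixf.
// Rows 1 and 2 are negated to go from vision-style axes (y down, z forward)
// to OpenGL camera axes (y up, z backward). This sign flip is an assumption.
void poseToGlModelView(const Pose3x4& pose, float out[16]) {
	assert(out != nullptr);
	for (int col = 0; col < 4; ++col) {
		out[col * 4 + 0] = static_cast<float>( pose.m[0][col]);
		out[col * 4 + 1] = static_cast<float>(-pose.m[1][col]);
		out[col * 4 + 2] = static_cast<float>(-pose.m[2][col]);
		out[col * 4 + 3] = (col == 3) ? 1.0f : 0.0f; // bottom row: 0 0 0 1
	}
}
```

With this convention, a marker sitting 300 units in front of the camera ends up at z = -300 in OpenGL, which is where a camera looking down the negative z axis expects to find it.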

Some 3D rendering...

Then I do some OpenGL drawing...

	glPushMatrix();
	glScalef(.02f,.02f,.02f);
	glRotatef(xrot,0.0f,0.0f,1.0f);
	xrot+=5.0f;
	glRotated(90,1,0,0);

	glutSolidCube(50.0);

	glPopMatrix();

A simple rotating GLUT cube...

It's about time for a video

Framework to load .obj files

The .obj format is a very simple, text-based format of saving 3D models. You can see the spec here. The implementation is included in the code, in objloader.cpp.
To be completely honest, I ripped a small portion of the code from an online forum, and I can't remember or make out where I took it from. But since then I have completely reworked the code and added pretty much everything you see there.
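To give an idea of how little is needed for the basics, here is a minimal sketch of an .obj reader. It is not the code from objloader.cpp: ObjMesh and parseObj are hypothetical names, and it only handles "v" lines and triangular "f" lines (using just the vertex index of each v/vt/vn group), ignoring normals, texture coordinates, materials and negative indices.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct ObjMesh {
	std::vector<float> vertices; // x, y, z per vertex
	std::vector<int> triangles;  // three 0-based vertex indices per face
};

// Parse "v x y z" and "f a b c" lines; everything else is skipped.
ObjMesh parseObj(std::istream& in) {
	ObjMesh mesh;
	std::string line;
	while (std::getline(in, line)) {
		std::istringstream ls(line);
		std::string tag;
		ls >> tag;
		if (tag == "v") {
			float x, y, z;
			if (ls >> x >> y >> z) {
				mesh.vertices.push_back(x);
				mesh.vertices.push_back(y);
				mesh.vertices.push_back(z);
			}
		} else if (tag == "f") {
			std::string group;
			int idx[3];
			int n = 0;
			while (n < 3 && ls >> group) {
				// "7/2/5" -> 7; std::stoi stops at the first '/'.
				idx[n] = std::stoi(group) - 1; // .obj indices are 1-based
				assert(idx[n] >= 0); // relative (negative) indices unsupported
				++n;
			}
			if (n == 3) mesh.triangles.insert(mesh.triangles.end(), idx, idx + 3);
		}
	}
	return mesh;
}
```

Reading from a std::istream means the same function works on a std::ifstream for a file on disk or on a std::istringstream for a quick test. The resulting arrays are laid out so they could be walked with glBegin(GL_TRIANGLES) or handed to vertex arrays.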

The code

Check out the source from Google Code: http://code.google.com/p/morethantechnical/source/checkout
Under trunk/NyARToolkit-CPP

Enjoy!
Roy.
