
Aug 09

Near realtime face detection on the iPhone w/ OpenCV port [w/code,video]

iphone + opencv = win

Hi! OpenCV is by far my favorite CV/image processing library. When I found an OpenCV port to the iPhone, and saw that someone had even tried to get it to do face detection, I just had to try it for myself.

In this post I'll run through the steps I took to get OpenCV running on the iPhone, and then show how to get OpenCV's face detection to play nice with iPhoneOS's image buffers and video feed (not yet OS 3.0!). Then I'll talk a little about optimization.

Update: Apple officially supports camera video pixel buffers in iOS 4.x using AVFoundation, here's sample code from Apple developer.
Update: I do not have the xcodeproj file for this project, please don't ask for it. Please see here for compiling OpenCV for the iPhone SDK 4.3.

Let's begin

Cross compiling OpenCV on iPhoneOS

The good people @ computer-vision-software.com have posted a guideline on how to compile OpenCV for the iPhone and link it as static libraries, and I followed it. I did have to recompile with one change: OpenCV needed zlib linkage, and the OpenCV configure script wasn't able to set up the makefiles to compile zlib as well. So I downloaded zlib from the net and just added all its files to the XCode project to compile and link. If you're trying to recreate this, remember to configure/build zlib before adding the files to XCode, so that you get a zconf.h file. After that, OpenCV linked perfectly.
All in all it was really not a big deal to compile OpenCV for iPhoneOS. I imagined it would be much harder...

OK moving on to

Plain vanilla face detection

So the first step is just getting OpenCV to detect a single face in a single image. But let's make it harder and use UIImage.
First, I took OpenCV's facedetect.c example and added it to the project as-is. Then I added two peripheral functions to set up and tear down the structs and statically allocated memory (things the sample does in its main function):

static CvHaarClassifierCascade* cascade = 0;
static CvMemStorage* storage = 0;
static IplImage *gray = 0, *small_img = 0;

void init_detection(char* cascade_location) {
    cascade = (CvHaarClassifierCascade*)cvLoad( cascade_location, 0, 0, 0 );
    storage = cvCreateMemStorage(0);
}

void release_detection() {
    if (storage)
    {
        cvReleaseMemStorage(&storage);
    }
    if (cascade)
    {
        cvReleaseHaarClassifierCascade(&cascade);
    }
    cvReleaseImage(&gray);
    cvReleaseImage(&small_img);
}

The detect_and_draw function remains exactly the same at this point. I just take the XML files of the haarcascades and add them to the project's resources.
Now I initialize the detection structs from the UIView or UIViewController that will do the detection. The main NSBundle will find the path to the XML file:

NSString* myImage = [[NSBundle mainBundle] pathForResource:@"haarcascade_frontalface_alt" ofType:@"xml"];
char* chars = (char*)malloc(512);
[myImage getCString:chars maxLength:512 encoding:NSUTF8StringEncoding];
init_detection(chars);
free(chars); // cvLoad has read the file by now, so the path string can go

Awesome, now let's face-detect already! All we need is to add a picture of someone to the project's resources, load it, convert it to an IplImage* and hand it over to detect_and_draw - simple.
I used a couple of helper functions from the informative post I mentioned earlier:

- (void)manipulateOpenCVImagePixelDataWithCGImage:(CGImageRef)inImage openCVimage:(IplImage *)openCVimage;
- (CGContextRef)createARGBBitmapContext:(CGImageRef)inImage;
- (IplImage *)getCVImageFromCGImage:(CGImageRef)cgImage;
-(CGImageRef)getCGImageFromCVImage:(IplImage*)cvImage;

Now it's only putting it together:

IplImage* im = [self getCVImageFromCGImage:[UIImage imageNamed:@"a_picture.jpg"].CGImage];
detect_and_draw(im);
UIImage* result = [UIImage imageWithCGImage:[self getCGImageFromCVImage:im]];

UIImageView* imv = [[UIImageView alloc] initWithImage:result];
[self addSubview:imv];
[imv release];

Just remember those externs, if you don't use a header file:

extern "C" void detect_and_draw( IplImage* img, CvRect* found_face );
extern "C" void init_detection(char* cascade_location);
extern "C" void release_detection();

Sweet. But detecting a face in a single photo is not so difficult - we want video and real-time face detection! So let's do that.

Tying it up with video feed from the iPhone camera (no OS 3.0 yet)

This step was so amazingly simple it was borderline funny. I used the well-known camera frame grabbing code from Norio Nomura. Of course, to align it with OS 3.0 you should plug into the API Apple provides, and not this wily hack, but it's really a plug-and-play situation. I use it in many of my projects that use the iPhone camera, until video support in OS 3.0 is finalized.
So all I needed was to set everything up, make a timer fire every so-and-so milliseconds, and send each frame to detection:

- (id)initWithNibName:(NSString *)nibNameOrNil bundle:(NSBundle *)nibBundleOrNil {
    if (self = [super initWithNibName:nibNameOrNil bundle:nibBundleOrNil]) {
        // Initialization code
        ctad = [[CameraTestAppDelegate alloc] init];
        [ctad doInit];

        NSString* myImage = [[NSBundle mainBundle] pathForResource:@"haarcascade_frontalface_alt" ofType:@"xml"];
        char* chars = (char*)malloc(512);
        [myImage getCString:chars maxLength:512 encoding:NSUTF8StringEncoding];
        init_detection(chars);
        free(chars);

        [self.view addSubview:[ctad getPreviewView]];
        [self.view sendSubviewToBack:[ctad getPreviewView]];

        // fire roughly 11 times a second
        repeatingTimer = [NSTimer scheduledTimerWithTimeInterval:0.0909 target:self selector:@selector(doDetection:) userInfo:nil repeats:YES];
    }
    return self;
}

-(void)doDetection:(NSTimer*)timer {
    if([ctad getPixelData]) {
        if(!im) {
            // create the IplImage header only once; only the data pointer changes per frame
            im = cvCreateImageHeader(cvSize([ctad getVideoSize].width,[ctad getVideoSize].height), 8, 4);
        }
        cvSetData(im, [ctad getPixelData], [ctad getBytesPerRow]);
        CvRect r;
        detect_and_draw(im, &r);
        if(r.width > 0 && r.height > 0) {
            NSLog(@"Face: %d,%d,%d,%d", r.x, r.y, r.width, r.height);
        }
    }
}

Note that for optimization's sake I create the IplImage header only once (the if branch is taken only on the first frame); on every frame after that I just point the IplImage data at the buffer I got from the camera. This way the IplImage shares the camera's buffer, so there is a little memory optimization there as well.
From that point on you can take it anywhere you like. Add stuff to faces, mark the face in the image, etc.

But... there's the issue of performance. This method gives you very, very bad timings - in the area of 5-15 seconds (!!) for a single frame, which is horrendous. And I promised near real time performance. So without further ado,

Optimizing the hell out of the detection algorithm

Well, the guys at computer-vision-software.com have done some work on optimizing OpenCV's Haar-based detection, but never released code. Their method was based on the fact that the iPhone's CPU handles integers far better than floating-point, so they set out to change the algorithm to use integers. I did that too, and found that it only shaves a few milliseconds off the total time. The far more influential factors are the window size of the feature scan, the scaling factor of the window size, and the derived number of passes.

Let me explain a little how the detection works in OpenCV. First you set the minimal size of the scan window. Then you specify a scale factor. OpenCV uses this scale factor to do multiple passes over the image to scan for feature-hits. It takes the window size, say 30x30, and the factor, say 1.1, and keeps multiplying the window size by the factor until it reaches the size of the image. So for a 256x256 image you get: a 30x30 scan, 33x33, 36x36, 39x39, 43x43... up to 244x244 - a total of 23 passes, for one frame! This is way too much... It's done to get better and finer results, and that may be fine for resource-abundant systems, but not in our case.

So the first thing I did was slash down those scans. There is, as expected, a very strong impact on the quality of the results, but the timings get close to acceptable. After all my optimizations I got the timing down to ~120ms per frame.
I optimized a few things:

  • The size of the input image, originally ~300x400, was scaled down by a factor of 1.5
  • The scale factor for cvHaarDetectObjects: I played with values ranging from 1.2 to 1.5, with pleasing timings
  • The ROI (region of interest) of the IplImage to scan is set every frame from the previous frame's detection - the location of the face, plus some padding on the sides to allow the face to move frame-to-frame. This shrinks the scanned area from the whole image to just the small portion that contains the known face. Of course, if a face was not found the ROI is reset.
  • I changed the internal workings of the cvHaarDetectObjects algorithm to do far fewer floating-point multiplications, turning them into integer multiplications.
  • It dawned on me just the other day that I can also optimize the size of the search window, rather than keeping it constant from frame to frame (30x30). If the last frame found a 36x36 face, the next detection should also try for a 36x36 object. I haven't tried it yet.
  • Memory optimization: don't alloc buffers every frame, share buffers, etc.

So first, the most influential change is in the detection phase:

// 'scale' (the image down-scaling factor) and PAD_FACE/PAD_FACE_2
// (the ROI padding constants) are defined elsewhere in the project.
void detect_and_draw( IplImage* img, CvRect* found_face )
{
    static CvRect prev;

    if(!gray) {
        gray = cvCreateImage( cvSize(img->width,img->height), 8, 1 );
        small_img = cvCreateImage( cvSize( cvRound (img->width/scale),
                                           cvRound (img->height/scale)), 8, 1 );
    }

    if(prev.width > 0 && prev.height > 0) {
        cvSetImageROI(small_img, prev);

        CvRect tPrev = cvRect(prev.x * scale, prev.y * scale, prev.width * scale, prev.height * scale);
        cvSetImageROI(img, tPrev);
        cvSetImageROI(gray, tPrev);
    } else {
        cvResetImageROI(img);
        cvResetImageROI(small_img);
        cvResetImageROI(gray);
    }

    cvCvtColor( img, gray, CV_BGR2GRAY );
    cvResize( gray, small_img, CV_INTER_LINEAR );
    cvEqualizeHist( small_img, small_img );
    cvClearMemStorage( storage );

    CvSeq* faces = mycvHaarDetectObjects( small_img, cascade, storage,
                                          1.2, 0, 0
                                          |CV_HAAR_FIND_BIGGEST_OBJECT
                                          |CV_HAAR_DO_ROUGH_SEARCH
                                          //|CV_HAAR_DO_CANNY_PRUNING
                                          //|CV_HAAR_SCALE_IMAGE
                                          ,
                                          cvSize(30, 30) );

    if(faces->total > 0) {
        CvRect* r = (CvRect*)cvGetSeqElem( faces, 0 );
        int startX,startY;
        if(prev.width > 0 && prev.height > 0) {
            // detection ran inside the ROI - translate back to small_img coords
            r->x += prev.x;
            r->y += prev.y;
        }
        // pad the face rect, trimming the padding at the image borders
        startX = MAX(r->x - PAD_FACE,0);
        startY = MAX(r->y - PAD_FACE,0);
        int w = small_img->width - startX - r->width - PAD_FACE_2;
        int h = small_img->height - startY - r->height - PAD_FACE_2;
        int sw = r->x - PAD_FACE, sh = r->y - PAD_FACE;
        prev = cvRect(startX, startY,
                      r->width + PAD_FACE_2 + ((w < 0) ? w : 0) + ((sw < 0) ? sw : 0),
                      r->height + PAD_FACE_2 + ((h < 0) ? h : 0) + ((sh < 0) ? sh : 0));
        printf("found face (%d,%d,%d,%d) setting ROI to (%d,%d,%d,%d)\n",r->x,r->y,r->width,r->height,prev.x,prev.y,prev.width,prev.height);
        // scale the result back up to the full-size input image
        found_face->x = (int)((double)r->x * scale);
        found_face->y = (int)((double)r->y * scale);
        found_face->width = (int)((double)r->width * scale);
        found_face->height = (int)((double)r->height * scale);
    } else {
        prev.width = prev.height = found_face->width = found_face->height = 0;
    }
}

As you can see, I keep the previous face in prev and use it to set the ROI of the images for the next frame. Note that small_img is a scaled-down version of the input image, so the detection results must be scaled back up to match the real size of the input.

Now, I could bore you with the details of how I changed cvHaarDetectObjects to use more integers, but I won't. It's all in the code, which is freely available, so you can diff it against OpenCV's cvhaar.cpp and see the changes. In short, what I did was:

  • Comment out image scaling and Canny pruning.
  • In cvSetImagesForHaarClassifierCascade, which fires many times for each frame and governs the scaling/shifting/rotating of the Haar classifiers to get better detection, I changed the weights and sizes to be integers rather than floats.
  • In cvRunHaarClassifierCascade, which calculates the score for a single Haar feature-hit, I changed the results calculation to integers instead of floats.
  • I played around with integer-oriented calculations of the sqrt function, which cvRunHaarClassifierCascade uses (and which fires many, many times each frame), but that actually caused a slow-down on the device. Turns out the standard library (math.h) implementation is best.

Well guys, that's pretty much all of my discoveries in the field. Please keep working on it; I'm anxious to see true real-time face detection on the iPhone.

Time for a video proof? You bet.

Here's proof that everything I wrote here is not total BS:

Code

Code is, as usual, available in the Google Code SVN repo:
http://code.google.com/p/morethantechnical/source/browse/#svn/trunk/FaceDetector-iPhone

OK, 'Till next time, enjoy
Roy.

  • chiris

    hi, +1
    could u share a working xcode project....?
    thx

  • Paolo

    Hi Roy,

    I am a PhD student from the university of Modena and Reggio Emilia, Italy. I have cross-compiled opencv-2.0.0 on a smart camera board equipped with embedded linux. This board is based on the ARM PXA270 processor. I successfully tested the original opencv face detector. Now I would like use your improved face detector. I have cross compiled your code (mycvHaarDetectObjects.cpp) and statically linked with my main program and with opencv libs. My code calls your function mycvHaarDetectObjects(). The cross-compilation is successful but at run time the program aborts with this error:

    "OpenCV Error: Unspecified error (The node does not represent a user object (unknown type?)) in cvRead, file ../../opencv-2.0.0.int/src/cxcore/cxpersistence.cpp, line 4722
    terminate called after throwing an instance of cv::Exception'
    Aborted"

    This run time error happen when the main code try to load the file haarcascade_frontalface_alt2.xml. This is the statement : cascade = (CvHaarClassifierCascade*)cvLoad( cascade_name, 0, 0, 0 ); where cascade_name point to the filename string.

    The function call at the cvLoad() hangs because the system doesn't know the type related with the format of the file haarcascade_frontalface_alt2.xml.

    I have tried to fix this problem adding at your source file the code to register a new type : CvType haar_type() and related functions. I have picked up this code from the original cvhaar.cpp source file.

    Now the run time error seems to be fixed but the program doesn't detect any face.

    Please could you give me some tips about the workaround of this problem.

    Thanks in advance.

    Paolo


  • Starter

    Hi,
    First, thank you for your page!

    Where can we find your optimized version of cvHarr.cpp, beacause it's not in your svn in FaceDetector-iPhone?

    Thank you

  • http://missionbird.com Kelly

    Hi Roy,
    Your work and tutorial save a lot of our development time. Do you think we can purchase your code with the optimization technique?

    I could arrange payment by freelancer.com: I know it is not about money; however, your great work would really help us a lot.

    https://www.freelancer.com/projects/Mobile-Phone-iPhone/iphone-app-face-recognition.html

    Thanks again..
    kelly

  • http://www.morethantechnical.com Roy

    @Kelly, I can't recover the xcodeproj for this old project anymore... however all the code including optimization is in the repository.
    It's only a matter of setting up a new iOS project and adding all the files into it.

  • jaspreet

    help me sir is there any possibility to store the detected images and this detected image is to be reconginise

  • http://www.morethantechnical.com Roy

    @jaspreet
    Of course, you can simply invoke the "imwrite" function, or use iOS APIs for saving images...

  • Sunny

    Hi there!

    First of all, great job on doing face detection with openCV on iOS.

    I'm trying to accomplish face "recognition" in my iphone app. I know there are APIs available such as the face.com API. Do you know what the openCV support is for face recognition? Thanks

    Sunny

  • http://www.morethantechnical.com Roy

    @Sunny

    You can implement an EigenFaces method pretty easily with OpenCV (using the PCA functions), but I've recently seen a FischerFaces code w/ OpenCV as well (see the willowgarage page: http://opencv.willowgarage.com/wiki/FaceRecognition)

    You can also use some higher order machine learning tools in OpenCV like SVMs and decision trees...
