Just wanted to report on a breakthrough in my iPhone-CV digging. I found a true realtime frame grabber for the iPhone camera preview (15fps of ~400×300 video), and successfully integrated this video feed with a pure C++ implementation of the MeanShift tracking algorithm. The whole setup runs in real time, under a few constraints of course, and gives nice results.
Update: Apple officially supports camera video pixel buffers in iOS 4.x using AVFoundation, here’s sample code from Apple developer.
So let's dig in…
First up is the frame grabber.
I was truly amazed by how far people have gone to get at the video stream of the iPhone camera. NDAs were tossed out the window, private frameworks are hardly private anymore, and so on. The implementation I found hooks onto the private camera callback of PLCameraController, which was probably meant to stay hidden deep down in the PhotoLibrary framework.
The gifted hacker, Norio Nomura, published his code for doing this here. I found it on a long thread of rants about how the only way to grab frames off the camera preview was UIGraphics's UIGetScreenImage(). He sure showed them…
Anyway, his code needed a few tweaks before I could process the frame data in real time. What he does is update a static buffer pointer, readpixels, to point at the CoreSurface Surface object's buffer, so you always have a pointer to the latest pixel data. Sweet. But now we need to process this data somehow, and for that we need to know the width, height, bytes per pixel and bytes per row.
So I created a few more static variables, and added a few functions in CameraTestAppDelegate to return these values. Now I can access the pixels correctly.
On to the MeanShift.
The algorithm is very simple:
- Calculate the histogram of the object (you assume the user had marked it in the first frame)
- In the next frame, for which you don’t know where the object is, calculate the average point of all the pixels that “conform best” with your histogram, in a certain area around the known center.
- Move iteratively towards your average point, until convergence.
I found a pure C++ implementation of a version of MeanShift by a Turkish student at Bilkent University, Halil Cuce. It works perfectly for my needs, and compiles out-of-the-box as it has no external library dependencies.
It has room for a few fixes though:
- CObjectTracker::CheckEdgeExistance needs to use "GetBValue" instead of "GetGValue" in the last terms of the sum.
- CObjectTracker::GetPixelValues should not use m_nImageWidth as the “number of bytes in a row”, a new member should be introduced which represents the row-stepping.
- Many optimizations can be done, for example in the CObjectTracker::FindHistogram function, the for loops can be optimized by pre-calculating the start and end values of Y and X iterators.
But all-in-all it gives nice results. The main problem is high CPU load when tracking large objects, since the per-iteration cost grows with the search window area. I found that 40×40 objects can be tracked at ~15fps, but anything larger severely hurts the frame rate.
OK, time for a video:
As usual… I cannot release my portions of the code here, as they are protected by my company; I can only refer you to the online sources above. Contact me by comment or mail if you want code.