
iPhone camera frame grabbing and a real-time MeanShift tracker

Hi
Just wanted to report on a breakthrough in my iPhone-CV digging. I found a true real-time frame grabber for the iPhone preview frame (15fps of ~400×300 video), and successfully integrated this video feed with a pure C++ implementation of the MeanShift tracking algorithm. The whole setup runs in real time, under a few constraints of course, and gives nice results.
Update: Apple officially supports camera video pixel buffers in iOS 4.x using AVFoundation; there’s sample code on the Apple developer site.
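For reference, here’s a minimal sketch of that AVFoundation route. This is not Apple’s sample code; the FrameGrabber class name, the preset and the queue label are placeholders of my own.

```objc
// Minimal sketch (iOS 4.x+): grab BGRA preview frames via AVFoundation.
#import <AVFoundation/AVFoundation.h>
#import <CoreMedia/CoreMedia.h>
#import <CoreVideo/CoreVideo.h>

@interface FrameGrabber : NSObject <AVCaptureVideoDataOutputSampleBufferDelegate>
- (void)start;
@end

@implementation FrameGrabber

- (void)start {
    AVCaptureSession *session = [[AVCaptureSession alloc] init];
    session.sessionPreset = AVCaptureSessionPresetMedium;

    AVCaptureDevice *camera =
        [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeVideo];
    [session addInput:[AVCaptureDeviceInput deviceInputWithDevice:camera
                                                            error:NULL]];

    // Ask for BGRA so the buffer layout matches what the tracker expects.
    AVCaptureVideoDataOutput *output = [[AVCaptureVideoDataOutput alloc] init];
    output.videoSettings = [NSDictionary
        dictionaryWithObject:[NSNumber numberWithUnsignedInt:kCVPixelFormatType_32BGRA]
                      forKey:(id)kCVPixelBufferPixelFormatTypeKey];
    [output setSampleBufferDelegate:self
                              queue:dispatch_queue_create("preview_frames", NULL)];
    [session addOutput:output];
    [session startRunning];
}

// Called for every preview frame; lock the pixel buffer and read it.
- (void)captureOutput:(AVCaptureOutput *)captureOutput
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
       fromConnection:(AVCaptureConnection *)connection {
    CVImageBufferRef pixels = CMSampleBufferGetImageBuffer(sampleBuffer);
    CVPixelBufferLockBaseAddress(pixels, 0);
    uint8_t *base   = (uint8_t *)CVPixelBufferGetBaseAddress(pixels);
    size_t width    = CVPixelBufferGetWidth(pixels);
    size_t height   = CVPixelBufferGetHeight(pixels);
    size_t rowBytes = CVPixelBufferGetBytesPerRow(pixels);
    // ...hand base/width/height/rowBytes to the tracker here...
    CVPixelBufferUnlockBaseAddress(pixels, 0);
}

@end
```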
So let’s dig in…
First up is the frame grabber.
I was truly amazed by how far people have gone to get at the video stream of the iPhone camera. NDAs were tossed out the window, Private Frameworks are hardly private anymore, and people keep pushing further. The implementation I found hooks onto the private camera callback of PLCameraController which, mind you, was probably hidden deep down in the PhotoLibrary framework.
The gifted hacker, Norio Nomura, published his code for doing this here. I found it on a long thread of rants about how the only way to grab frames off the camera preview was UIGraphics’s UIGetScreenImage(). He sure showed them…
Anyway, his code needed a few tweaks to process the frame data in real time. What he does is update a static buffer pointer, readpixels, to point at the CoreSurface Surface object’s buffer, so you always have a pointer to the pixel data. Sweet. But now we need to process this data somehow, and for that we also need to know the width, height, bytes per pixel and bytes per row.
So I created a few more static variables and added a few functions in CameraTestAppDelegate to return these values, roughly as sketched below. Now I can access the pixels correctly.
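Schematically, the additions look something like this. The accessor names are illustrative, mine rather than Nomura’s; only readpixels comes from his code.

```objc
#include <stdint.h>

// Statics filled in by the camera callback, plus plain C accessors so
// the tracker can read the frame geometry.
static uint8_t *readpixels      = NULL; // points at the CoreSurface buffer
static int      surfaceWidth    = 0;
static int      surfaceHeight   = 0;
static int      surfaceBPP      = 0;    // bytes per pixel
static int      surfaceRowBytes = 0;    // bytes per row (the stride)

uint8_t *getFramePixels(void)   { return readpixels; }
int getFrameWidth(void)         { return surfaceWidth; }
int getFrameHeight(void)        { return surfaceHeight; }
int getFrameBytesPerPixel(void) { return surfaceBPP; }
int getFrameBytesPerRow(void)   { return surfaceRowBytes; }
```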
On to the MeanShift.
The algorithm is very simple (a minimal sketch follows the list):

  1. Calculate the histogram of the object (assume the user marked it in the first frame).
  2. In the next frame, where you don’t know where the object is, calculate the average position of all the pixels that best “conform” to your histogram, within a certain area around the last known center.
  3. Move iteratively toward that average point, until convergence.
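To make those steps concrete, here is a bare-bones, single-channel sketch of the idea. This is not Cuce’s implementation (his code is linked below); Window, makeHistogram and meanShift are names I made up, and it assumes an 8-bit grayscale frame with the window kept inside the frame bounds (bounds checks omitted for brevity).

```cpp
#include <cstdint>
#include <vector>

struct Window { int cx, cy, w, h; }; // center and size of the tracked box

// Step 1: normalized histogram of the marked object region.
std::vector<float> makeHistogram(const uint8_t* img, int rowBytes, Window win)
{
    std::vector<float> hist(256, 0.0f);
    for (int y = win.cy - win.h / 2; y < win.cy + win.h / 2; ++y)
        for (int x = win.cx - win.w / 2; x < win.cx + win.w / 2; ++x)
            hist[img[y * rowBytes + x]] += 1.0f;
    for (size_t i = 0; i < hist.size(); ++i)
        hist[i] /= (float)(win.w * win.h);
    return hist;
}

// Steps 2+3: shift the window to the histogram-weighted centroid of the
// pixels inside it, and repeat until it stops moving (or maxIter).
Window meanShift(const uint8_t* img, int rowBytes,
                 const std::vector<float>& model, Window win, int maxIter)
{
    for (int iter = 0; iter < maxIter; ++iter) {
        float sumW = 0.0f, sumX = 0.0f, sumY = 0.0f;
        for (int y = win.cy - win.h / 2; y < win.cy + win.h / 2; ++y)
            for (int x = win.cx - win.w / 2; x < win.cx + win.w / 2; ++x) {
                float w = model[img[y * rowBytes + x]]; // "conformity" weight
                sumW += w; sumX += w * x; sumY += w * y;
            }
        if (sumW <= 0.0f) break;                 // lost the object
        int nx = (int)(sumX / sumW + 0.5f);
        int ny = (int)(sumY / sumW + 0.5f);
        if (nx == win.cx && ny == win.cy) break; // converged
        win.cx = nx; win.cy = ny;
    }
    return win;
}
```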

I found a pure C++ implementation of a version of MeanShift by Halil Cuce, a Turkish student at Bilkent University. It works perfectly for my needs, and compiles out-of-the-box as it has no external library dependencies.
It has room for a few fixes though:

  • CObjectTracker::CheckEdgeExistance needs to use “GetBValue” instead of “GetGValue” in the last terms.
  • CObjectTracker::GetPixelValues should not use m_nImageWidth as the “number of bytes in a row”; a new member should be introduced to represent the row stride (sketched after this list).
  • Many optimizations are possible; for example, the for loops in CObjectTracker::FindHistogram can be sped up by pre-calculating the start and end values of the Y and X iterators.
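Here is roughly what the stride fix from the second bullet looks like. The class stub, the signature, and the members m_nRowBytes and m_nBytesPerPixel are hypothetical stand-ins of mine, not Cuce’s actual code.

```cpp
#include <cstdint>

// Minimal stub of the tracker class, just enough to show the fix.
class CObjectTracker {
public:
    int m_nImageWidth;    // width in pixels (original member)
    int m_nRowBytes;      // NEW: bytes per row, i.e. the stride
    int m_nBytesPerPixel; // NEW: bytes per pixel (4 for BGRA)
    void GetPixelValues(const uint8_t* frame, int x, int y,
                        uint8_t& r, uint8_t& g, uint8_t& b) const;
};

void CObjectTracker::GetPixelValues(const uint8_t* frame, int x, int y,
                                    uint8_t& r, uint8_t& g, uint8_t& b) const
{
    // Before: rows were stepped by m_nImageWidth, which breaks when the
    // surface pads each row. Step by the true byte stride instead:
    const uint8_t* px = frame + y * m_nRowBytes + x * m_nBytesPerPixel;
    r = px[0];
    g = px[1];
    b = px[2];
}
```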

But all-in-all it gives nice results. The main problem is high CPU load when tracking large objects: I found that 40×40 objects can be tracked at ~15fps, but anything larger than that will severely hurt the frame rate.
OK, time for a video:

As usual, I cannot release my portions of the code as-is here, as it is protected by my company; I can only refer you to the online sources available. Contact me by comment or mail if you want code.
Enjoy,
Roy.

27 replies on “iPhone camera frame grabbing and a real-time MeanShift tracker”

Hi. I also work with Nomura’s code to get preview video data from the iPhone’s camera. Could you answer my questions?
1. Accessing the raw data
Converting the raw data to RGBA (with the capturePreviewWithInstalledHook function in his code) causes a lot of delay in my code, and it is quite bad for real-time applications.
I think the variable ‘readblePixels’ contains the raw pixel format, which is copied in the callback __camera_callbackHook. Using the raw data may be better for performance.
But I’m not sure what the raw data format is.
2. The size of the preview video: you mentioned that the video size is 300×400. But I found that the variables width and height in the __camera_callbackHook function become width = 304 and height = 400 when I debugged the code.
Do you know why the width becomes 304?

Hi Roy,
I’m reading your article, and I’m interested in your code…
Can you show it or send it to me (even some portions would be nice)? If you can’t, I won’t ask again…
Thanks,
Daniele

Hi Daniele,
Thanks for the interest in the code.
You can get the iPhone camera frame-grabbing code from my latest Augmented Reality project. Check it out from the Google Code project, under the directory NyARToolkit-iPhone. It has a working version of the frame grabber, including all code.
For MeanShift, you can take the code from the work I cited in the article; it compiles out-of-the-box on iPhone OS (it’s a pure C++ implementation with no dependencies).
Good luck
Roy

Thanks very much for this info.
I downloaded the projects from Norio Nomura and tried them, but none of them run correctly.
When I try to use the undocumented methods, the application crashes.
Do the apps still work on iPhone OS 3.1?
Thanks in advance

Hi Ignacio
The code will not work on OS 3.X, probably because of internal API changes.
The code relies on pointers hidden inside internal structs, so if the structs change even a bit, the code cannot get the required pointers (have a look at CameraTestAppDelegate.m).
I am planning to work this issue out when I have the time. But anyway, this method of meddling with private APIs and frameworks is not approved by Apple, so any application using it will probably not be accepted into the App Store.
Keep checking out the blog to see if anything changes..
Roy.

For now I don’t mind if my app is rejected; this is actually part of a research project.
I would appreciate a post when you accomplish it.
I am checking the blog very frequently!
Thanks!

Hi there. I’ve been looking at the AR code for a little while, and yes, OS 3 is causing a lot of headaches with grabbing the video. I’ve been looking into it, but I can’t even register the callback for the video loop, let alone get actual video to the AR library. I’m not sure what has changed, but something definitely has.

Roy, are you able to share your code? I am doing some object recognition work for my master’s thesis and would be interested in seeing how you got everything working. Thanks a bunch!

Hi Roy,
looks very good. I’m currently working on the same topic (object tracking), but I got stuck at the very beginning.
UIGetScreenImage works fine, but I can’t get rid of the cameraOverlayView (which shows UI elements I need).
I don’t need any code; a hint would be great.
You are drawing a red rectangle onto the screen. How do you prevent it from being snapshotted? (Which would mess up the image analysis, I suspect.)
I tried using setHidden:YES or setAlpha:0, or even positioning the view out of the bounds of the screen while calling UIGetScreenImage. But it seems to be too slow.
thanks a lot
Peter

If only I knew how… you bet I would’ve told the world!
But alas, I don’t. And my tinkering with the 3.X API has not yielded any results, so I’m kind of waiting around to see who gets it done, so I can just point to that guy.
I am confident it’s possible; I mean, you’ve got the Qik app that does it and it’s Apple-approved, so it’s possible, and possibly even legal.
So good luck finding the solution, and don’t forget to drop a line when you find it! I would be very happy to hear it
Roy.

Going at it with UIGetScreenImage is probably not the solution for real time… maybe for post-processing, where you capture a clip, process it and then display it.
I did have a thought about modulation, where you take a frame, analyze it, display the result without taking another frame, and only then take another frame. But it seems dumb.
Another possibility is to capture only part of the screen, and use the other part for displaying results. This way the results area can be disregarded, even though it was captured… (you know exactly where your results display is). A rough sketch of this idea follows.
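Schematically, something like this; all the names here are made up for the sketch.

```cpp
#include <cstdint>

// Scan only the top part of the UIGetScreenImage buffer and leave the
// bottom strip for drawing results, so the overlay never pollutes the
// analysis even though it was captured.
void processTopRegion(const uint8_t* screen, int height, int rowBytes,
                      int resultsStripHeight)
{
    const int analysisHeight = height - resultsStripHeight;
    for (int y = 0; y < analysisHeight; ++y) {
        const uint8_t* row = screen + y * rowBytes;
        // ...feed 'row' to the tracker; rows below analysisHeight are
        // the results overlay and are simply never read...
        (void)row;
    }
}
```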
Anyway, the best thing to do is get the frames right off the camera buffers. I’m still waiting for code that does it on OS 3.X.
Roy.

Hi Roy,
Very nice post.
I remember having struggled for quite some time on a similar issue on Windows Mobile 5.0, reimplementing a class missing in the .NET Compact Framework.
Can I have your source code?
By the way: your link ‘http://www.google.co.il/url?q=http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/mean_shift/mean_shift.ppt&ei=i20ASsjMIMyxsgabxZCYBg&sa=X&oi=spellmeleon_result&resnum=1&ct=result&usg=AFQjCNG6_g7Cw5-rGAnO13MuuAVBgejtCQ’ does not seem to be valid anymore.
Jerome.

Hi Roy,
I am new to iPhone development and doing a POC for video streaming. I can’t figure out the logic for how to go about streaming the video. I somehow reached the level of getting the screenshots by using UIGetScreenImage, but after that I am not able to proceed at all.
Please help. email:[email protected]
Thanks,
Susanta

Hi Roy
I am very glad to see your blog.
I am now developing an iPhone app for tracking with the camera.

Hello!
I’m glad that I read your blog, as I’m doing my Final Year Project at school.
I would like to make a real-time capture application. There is a part related to your project.
I wish to read your program’s code. Thank you so much!!!
Best Regards,
Godfrey

Hey, I stumbled across this and would really like to take a look at the source code for help with an app I am playing with. Is there any way to still get the source code from you?

Hi.
I’m studying object tracking.
I’m a student.
I would like the source code for this.
Thanks.

Hi.
I’m studying object tracking.
I’m a student. Now I am developing an iPhone app for tracking with the camera.
I would like the source code (the online source code) for this.
Thanks.

Hi Roy,
I am now developing an iPhone app for object tracking.
I would like the source code; could you email it to me?
Thanks very much
