iPhone camera frame grabbing and a real-time MeanShift tracker


Just wanted to report on a breakthrough in my iPhone-CV digging. I found a true real-time frame grabber for the iPhone preview frame (15fps of ~400x300 video), and successfully integrated this video feed with a pure C++ implementation of the MeanShift tracking algorithm. The whole setup runs in real time, under a few constraints of course, and gives nice results.

Update: Apple officially supports camera video pixel buffers in iOS 4.x using AVFoundation; here's sample code from the Apple developer site.

So let's dig in...

First up is the frame grabber.

I was truly amazed by how far people have gone to get at the video stream of the iPhone camera. NDAs were tossed out the window, Private Frameworks are hardly private anymore, and people keep pushing further. The implementation I found hooks into PLCameraController's private camera callback function, which, mind you, is buried deep inside the PhotoLibrary framework.

The gifted hacker, Norio Nomura, published his code for doing this here. I found it on a long thread of rants about how the only way to grab frames off the camera preview was using UIGraphics's UIGetScreenImage(). He sure showed them...

Anyway, his code led me to a few tweaks needed to process the frame data in real time. What he does is update his static buffer pointer, readpixels, to point at the CoreSurface Surface object's buffer, so you always have a pointer to the pixel data. Sweet. But now we need to process this data somehow, and moreover, we need to know the width, height, bytes per pixel and bytes per row.

So I created a few more static variables, and added a few functions in CameraTestAppDelegate to return these values. Now I can access the pixels correctly.
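As a rough sketch of what those accessors make possible (the struct and function names below are mine for illustration, not from Nomura's code), the essential point is that rows must be stepped by bytes-per-row, the stride, which can be larger than width times bytes-per-pixel:

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative bundle of the static values exposed from
// CameraTestAppDelegate (names are mine, not Nomura's).
struct FrameInfo {
    uint8_t* pixels;        // base address of the CoreSurface buffer
    int      width;         // visible pixels per row
    int      height;        // number of rows
    int      bytesPerPixel; // e.g. 4 for a BGRA surface
    int      bytesPerRow;   // row stride; may exceed width * bytesPerPixel
};

// Address of pixel (x, y): rows are stepped by bytesPerRow, NOT by
// width * bytesPerPixel -- the surface may pad each row.
inline uint8_t* pixelAt(const FrameInfo& f, int x, int y) {
    return f.pixels + (size_t)y * f.bytesPerRow + (size_t)x * f.bytesPerPixel;
}
```

This is also why indexing with the width alone corrupts frames whenever the surface pads its rows.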

On to the MeanShift.

The algorithm is very simple:

  1. Calculate the histogram of the object (assume the user has marked it in the first frame)
  2. In the next frame, where the object's location is unknown, calculate the weighted average position of all the pixels that best conform to your histogram, within a certain area around the last known center.
  3. Move iteratively towards that average point, until convergence.

I found a pure C++ implementation of a version of MeanShift by a Turkish student at Bilkent University, Halil Cuce. It works perfectly for my needs, and compiles out-of-the-box as it has no external library dependencies.

It has room for a few fixes though:

  • CObjectTracker::CheckEdgeExistance needs to use "GetBValue" instead of "GetGValue" in the last addends.
  • CObjectTracker::GetPixelValues should not use m_nImageWidth as the "number of bytes in a row"; a new member should be introduced to hold the actual row stride.
  • Many optimizations are possible; for example, in CObjectTracker::FindHistogram the for loops can be sped up by pre-calculating the start and end values of the Y and X iterators.
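The last two fixes combine naturally. Here is an illustrative histogram loop (not Halil Cuce's actual code; the names and the 4-bit-per-channel binning are my assumptions) that steps rows by the real stride and clamps its bounds once, up front, instead of testing them per pixel:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Build a normalized color histogram over a window centered at (cx, cy).
// hist must have 16*16*16 = 4096 bins.
void findHistogram(const uint8_t* pixels, int bytesPerRow, int bytesPerPixel,
                   int imgW, int imgH,
                   int cx, int cy, int halfW, int halfH,
                   std::vector<float>& hist) {
    std::fill(hist.begin(), hist.end(), 0.0f);
    // Pre-computed, clamped iteration bounds -- no per-pixel range checks.
    const int y0 = std::max(0, cy - halfH), y1 = std::min(imgH - 1, cy + halfH);
    const int x0 = std::max(0, cx - halfW), x1 = std::min(imgW - 1, cx + halfW);
    for (int y = y0; y <= y1; ++y) {
        const uint8_t* row = pixels + y * bytesPerRow;  // stride, not imgW
        for (int x = x0; x <= x1; ++x) {
            const uint8_t* p = row + x * bytesPerPixel;
            // Keep the top 4 bits of each channel: 16x16x16 bins.
            int bin = (p[0] >> 4) * 256 + (p[1] >> 4) * 16 + (p[2] >> 4);
            hist[bin] += 1.0f;
        }
    }
    // Normalize to a probability distribution.
    float total = float(y1 - y0 + 1) * float(x1 - x0 + 1);
    for (float& h : hist) h /= total;
}
```

Hoisting the clamps out of the inner loop matters here because this function runs at every MeanShift iteration, several times per frame.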

But all-in-all it gives nice results. The main problem is high CPU load when tracking large objects. I found that 40x40 objects can be tracked at ~15fps, but anything larger severely hurts the frame rate.

OK, time for a video:

As usual... I cannot release my portions of the code here as-is, since it is protected by my company; I can only refer you to the online sources available. Contact me by commenting or mail if you want code.



  • wlee

    Hi. I also work on Nomura's code to get preview video data from the iPhone's camera. Could you answer my questions?

    1. Accessing the raw data

    Converting the raw data to RGBA (with the capturePreviewWithInstalledHook function in his code) caused a lot of delay in my code, which is quite bad for real-time applications.
    I think the variable 'readblePixels' contains the raw pixel data, which is copied in the callback __camera_callbackHook. Using the raw data may be better for performance.
    But I'm not sure what the raw data format is.

    2. The size of the preview video: you mentioned that the video size is 300x400, but I found that the variables width and height in the __camera_callbackHook function become width = 304 and height = 400 when I debugged the code.
    Do you know why the width becomes 304?

  • Hi Roy,
    I'm reading your article, and I'm interested in your code...
    Can you show or send it to me (even some portions would be nice)? If you can't, I'll not ask again...


  • Roy

    Hi Daniele,

    Thanks for the interest in the code.

    The iPhone camera frame grabbing code you can get from my latest Augmented Reality project. Check it out from the google code project, under the directory NyARToolkit-iPhone. It has a working version of the frame grabber, including all code.
    For MeanShift, you can take the code from the work I cited in the article; it compiles out-of-the-box on iPhone OS (it's a pure C++ implementation with no dependencies).

    Good luck

  • Ignacio

    Thanks... very much for this info.
    I downloaded the projects from Norio Nomura and tried them, but none of them run correctly.
    When I try to use the undocumented methods the application crashes.
    Do the apps still work on iPhone OS 3.1?

    Thanks in advance

  • Roy

    Hi Ignacio
    The code will not work in OS 3.X, probably because of internal API changes.
    The code relies on pointers hidden inside internal structs, so if the structs change by a bit the code cannot get the required pointers (have a look in CameraTestAppDelegate.m).
    I am planning on working this issue out when I have the time. But anyway, this method of meddling with private APIs and frameworks is unapproved by Apple, so any application using this method will probably not be accepted into the App Store.

    Keep checking out the blog to see if anything changes..

  • Ignacio

    For now I don't mind if my app is rejected; this is actually part of a research project.
    I would appreciate a post when you accomplish it.
    I am checking the blog very frequently!

  • Ben

    Hi there. I've been looking at the AR code for a little while and yes, OS 3 is causing a lot of headaches with grabbing the video. I've been looking into it, but I can't even register the callback for the video loop, let alone get actual video to the AR library. I'm not sure what has changed, but something definitely has.

  • Jason

    Roy, are you able to share your code? I am doing some object recognition work for my master's thesis and would be interested in seeing how you got everything working. Thanks a bunch!

  • peter

    Hi Roy,
    looks very good. I'm currently working on the same topic (object tracking), but I got stuck at the very beginning.
    UIGetScreenImage works fine, but I can't get rid of the cameraOverlayView (which shows UI elements I need).

    I don't need any code; a hint would be great.

    You are drawing a red rectangle onto the screen. How do you prevent it from being snapshotted? (which would mess up the image analysis, I suspect)

    I tried to use setHidden:YES or setAlpha:0, or even positioning the view out of the bounds of the screen while using UIGetScreenImage. But it seems to be too slow.

    thanks a lot

  • prs

    Thanks for the great post!
    I am trying to grab the frames from the video feed, but it seems this code does not work on OS 3.x.
    What is the best option available for the latest iPhone OS? Any ideas will be appreciated. I just want to use it for research purposes.



  • Roy

    If only I knew how... you bet I would've told the world!
    But alas, I don't. And my tinkering with the 3.X API has not yielded any results, so I'm kind of waiting around to see who gets it done, so I can just point to that guy.
    I am confident it's possible; I mean, you've got the Qik app that does it and it's Apple-approved, so it's possible - and possibly even legal.

    So good luck finding the solution and don't forget to drop a line when you find it! I would be very happy to hear it

  • Roy

    Going at it with UIGetScreenImage is probably not the solution for real-time... maybe for post-processing, where you capture a clip, process it and then display it.

    I did have a thought about modulation, where you take a frame, analyze it and display the result - without taking another frame - and only then take another frame. But it seems dumb.

    Another possibility is to capture only part of the screen, and use the other part for displaying results. This way the results area can be disregarded, even though it was captured... (you know exactly where your results display is)

    Anyway, the best thing to do is get the frames right off the camera buffers. I'm still waiting for code that does it in OS 3.X.


  • Dave

    How do you convert your frame to UBYTE8?

  • Jerome

    Hi Roy,

    Very nice post.
    I remember having struggled quite some time ago with a similar issue on Windows Mobile 5.0, reimplementing a class missing in the .NET Compact Framework.

    Can I have your source code?

    By the way: your link 'http://www.google.co.il/url?q=http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/mean_shift/mean_shift.ppt&ei=i20ASsjMIMyxsgabxZCYBg&sa=X&oi=spellmeleon_result&resnum=1&ct=result&usg=AFQjCNG6_g7Cw5-rGAnO13MuuAVBgejtCQ' does not seem to be valid anymore.


  • Susanta

    Hi Roy,

    I am new to iPhone development and doing a POC for video streaming. I am not getting the logic for how to go ahead with streaming the video. I somehow reached the level of getting screenshots by using UIGetScreenImage, but after that I'm not able to proceed at all.

    Pls help. email:xchangingiphonedeveloper@gmail.com


  • John DeWeese

    Hey all, haven't seen any progress on this front for a while, especially after the OS 3 breakage. A friend and I ended up writing a new hacky method, which I've described here:



  • Spring

    Hi Roy
    I am very glad to see your blog.
    I am now developing an iPhone app that does tracking with the camera.

  • Tariq


    You have done a really great job. Sir, can you please mail me running code? I want to learn a lot from your code.

    Thanks & Regards,

  • godfrey


    I'm glad that I read your blog, as I'm doing my Final Year Project in school.
    I would like to make a real-time capture application. There is a part related to your project.
    I wish to read your program. Thank you so much!!!

    Best Regards,

  • Dave

    I would like the source code for this.


  • Hey, stumbled across this and would really like to take a look at the source code to help with an app I am playing with. Is there any way to get the source code from you still?

  • Jeong

    I'm studying object tracking.
    I'm a student.

    I would like the source code for this.


  • Jeong


    I'm studying object tracking.
    I'm a student, and I am now developing an iPhone app that tracks objects with the camera.

    I would like the source code (the online source code) for this.


  • Haven

    hi Roy,

    I would like the source code; could you email it to me?

    Thanks very much.

  • Sunred

    I would like the source code, thx.

  • Harry

    hi Roy,
    I am now developing an iPhone app about object tracking.
    I would like the source code; could you email it to me?
    Thanks very much

  • Roy

    Sorry, I don't have the code anymore (lost over the years).
    There are now much simpler ways to do the same thing; for example, see: https://github.com/aptogo/FaceTracker