Mar 16

Curve tracking with a Heap&Hogg's Particle Filters [w/ code, OpenCV]

Screen Shot 2013-03-15 at 6.18.30 PM I wanna share some code for 2D curve tracking with a particle filter, implementing the body of work of Tony Heap and David Hogg. These guys presented a relatively easy to implement method for tracking deformable curves through space and change in form using a Hierarchical Point Distribution Models (HPDM), which is another elegant way to store shape priors. Granted, it is not perfect, but for a simple 2D shape like a hand it works pretty good, and rather fast as well.
Let's dive in then,

Training the HPDM

An HPDM is basically a Point Distribution Model (PDM) that has multiple levels - a hierarchy. It encodes the deformation a shape can go through, usually by means of dimension reduction (in our case a PCA). In the Hierarchical PDM we generate a shape-deformation space by PCA, then discretize that space using clustering (like k-means, but an Expectation-Maximization operation works just as well) to achieve another level in the hierarchy.
To train the HPDM we need (many) samples of curves. I got the samples automatically using a Kinect, blob detection, curve extraction etc., but that wasn't the end of it. The samples needed a lot of clean-up!
The raw samples contain the whole arm, which is not something I'd like to track and also it throws off the alignment of all the curves because the variance is big.
hand curve from kinect
So I clean it up a bit by looking a bit further down the curve. But since we're doing an ad-hoc curve extraction from the blob, simply "capping" the curves at a certain point along the curve creates curves that are very badly aligned. For an ideal alignment we need them to be the same exact length (in points), and we need the points to be distributed along the shape in the same way, e.g. point #120 lands on the tip of the index finger in all curves.
Screen Shot 2013-03-11 at 2.13.42 PMScreen Shot 2013-03-11 at 2.13.48 PM
Here are 10 curves with the 100th point marked on them, we can see how most of the 100th points align, but some are not even close.
Screen Shot 2013-03-11 at 2.57.03 PM
Since we're building a PDM, we model the variation of a single point in the curve (like the 100th point), so we need the variation to be small. We therefor look for an alignment.
We start by picking an initial "mean shape", which in the first iteration is simply the average curve from all samples. But as we've seen a single point can vary greatly so the initial average shape does not reflect the true mean shape.
We proceed by examining each curve, matching vs. the mean shape, adding points at the start or end of the curve so they would align. Now the "100th point test" yields better results, the 100th points are roughly at the same position.
Screen Shot 2013-03-11 at 3.53.41 PM
We proceed by performing PCA and then k-means. In the papers there is a discussion around choosing the k, however I used k = \frac{E}{d} where E is the number of curves in the DB (I had ~1300) and d = 50 (I also experimented with higher ds but they do not span the space too well). Here are the found centers:
Screen Shot 2013-03-11 at 6.13.48 PM
After k-means-ing, the paper then continues to say that the "subregions" (sometimes called "patches") of the shape deformation space should be local, and thus hopefully linear.
They do not use the samples that were attached to each center (that's how k-means clustering works, attached samples to centers thus clustering the samples in space), but rather they look for nearest neighbors in the shape space, to get an overlapping area between subregions / patches.
Then they perform PCA on each bunch of NNs around each center to get a linear patch.
Here are the patches with slight deformation on the principal axis:
Screen Shot 2013-03-12 at 9.00.08 AM
The core parameters to tweak in the HPDM are:

  • k the number of patches, which should be derived from another two parmeters:
  • d (the divisor), which along with E (the number of samples) determines how many samples you have per patch
  • O (the overlap) which determines how many nearest neighbors we're looking for per each patch (usually this number is bigger than E/d and therefore the patches overlap)
  • the size of the shape space - the dimension of the first PCA, the number of principal components to keep. I had it at 20
  • the nubmer of principal components to keep in the subregions, I had it at 1

Using the HPDM in a Particle Filter

A quick reminder, Particle Filters (PF) are a way to maintain many different hypotheses for the model (e.g. position of an object, shape, signal, etc.) looking for an unknown truth state of the model while receiving noisy measurements.
In our case we're looking at the position, orientation and shape of a 2D hand object, so each particle is going to hold data that will describe such a model.

class Particle {
	//according to H&H section 3.1
	Point2f centroid;
	float scale;
	float orientation;
	Mat_<float> deformation;
	int patch_num;

To employ the PF we need to learn something about the shape - how it changes over time. The PF (a stochastic Markovian process) is dependent of the last iteration to determine the next, we therefore need to extract a shape variation model using the HPDM.
So we go back to the samples, this time looking in chronological order for the relationship between the shape in one frame and the next frame. This can give us an idea of how the shape changes through time, and thus can be used as a prior for the PF.
To that end we construct a transition matrix, one that tells us what is the probability of transforming from shape described by patch i to a shape described by patch j. Easily we just go over the curves sequentially and accumulate the transitions. Most of the frame-to-frame transitions are going to remain in the same patch...
Screen Shot 2013-03-13 at 3.45.16 PM
In the end we get a transition matrix:
Screen Shot 2013-03-13 at 3.47.24 PM
But we also need a cumulative version of it (accumulating per row) that we will use in the particle filter.

Building the particle filter

Particle filters work by assigning a score for each particle on how well it performs in this iteration, so that in later iterations we sample new particles that will partly preserve this strength but also be open for new hypotheses.
To evaluate how good a particle is we examine it's adherence to edges in the image. Since I am working with a binary mask from the Kinect it has very strong and clear edges, but this method works the same way for regular images.
Screen Shot 2013-03-13 at 4.02.44 PM
The red line is the hypothesized shape in that particle, the cyan circles mark if an edge was hit along the normal to the curve.
Scores are given based on how many edges were hit, and if they are pointing in the right direction (i.e. the edge is aligned with the curve).

Normalizing weights, resampling

The standard PF part is normalizing weights of the current particles, and create the cumulative sum which is essential for easily choosing a

//sum all weights
Scalar s = sum(unnormalized_weights);
//calc normalized weights by dividing, and build cumulative distribution function
for (int i=0; i<unnormalized_weights.size(); i++) {
	normalized_weights[i] = unnormalized_weights[i] / s[0];
	if (i==0) {
		cumulative_sum[i] = normalized_weights[i];
	} else {
		cumulative_sum[i] = cumulative_sum[i-1] + normalized_weights[i];

We then resample new particles form the distribution we created with the current bunch of particles and the cumulative sum (this is called Sequential importance resampling):

void resampleParticles() {
	//draw N particles from the new distribution function (the importance density)
	vector<Particle> new_particles(num_particles);
	for (int i=0; i<particles.size(); i++) {
		while (true) {
			float uni_rand = rng_.uniform(0.0f,1.0f); //randomize a number from [0,1]

			//binary search for position in cumulative sum
			vector<float>::iterator pos = lower_bound(cumulative_sum.begin(), cumulative_sum.end(), uni_rand);
			int ipos = distance(cumulative_sum.begin(), pos);
			if(particles[ipos].scale < 0.85) continue; //re-sample if scale is bad
			new_particles[i].scale = particles[ipos].scale;
			new_particles[i].orientation = particles[ipos].orientation;
			new_particles[i].centroid = particles[ipos].centroid;
			new_particles[i].patch_num = particles[ipos].patch_num;
	particles.clear(); particles.insert(particles.begin(),new_particles.begin(),new_particles.end());		

Local optimization of particles

In their paper Heap&Hogg suggest we combing a few local optimization iterations with each step of the PF, and claim (rightfully) that it results in faster convergence and better results. The meaning of "local" means that we run it on the same frame a number of times, inside one iteration of the PF.
To help the particles attract to the edges in the image, I try to find a rigid transformation that will decrease the energy of attraction to edges. This is a risky operation as it can throw the curve completely out of balance if the initial guess is no good.
The rigid transformation is found by Kabsch algorithm.
We can also consider non-rigid transformations by looking at the patch deformation axes. I found this method not to be of great use, plus they add a ton of computation that hinders performance.


To make the PF function we need to "propagate" the error through the particles. Basically meaning we need to put in some randomness in the process or else it quickly converges and gets stuck.
I therefore use the transition matrix to see where each particle-shape likes to go (or, is likely to go). Actually the cumulative transition matrix:
Screen Shot 2013-03-15 at 4.46.47 PM
Each row is the cumulative distribution function that tells what is the probablility of transitioning from prototypical-shape (patch) i to prototypical-shape (patch) j. I use that to randomly pick a target prototypical shape for each particle.

void propagate() {
	//according to section 3.2
	int changed = 0;
	for (int i=0; i<particles.size(); i++) {
		float uni_rand = rng_.uniform(0.0f,1.0f);

		//binary search for new subregion (patch) in Cumulative Transition Matrix
		vector<double> row_a_in_CumulativeMat(Cumulative.cols); 
		int old_patch_num = particles[i].patch_num;
		vector<double>::iterator pos = upper_bound(row_a_in_CumulativeMat.begin(), row_a_in_CumulativeMat.end(), uni_rand);
		int new_patch_num = distance(row_a_in_CumulativeMat.begin(), pos);
//		cout << "particle "<<i<<" ("<<uni_rand<<") ["<<particles[i].patch_num<<"] -> ["<<new_patch_num<<"]\n";
		if (new_patch_num != old_patch_num) {
			//moved to a new patch ("leap through wormhole"): take the mean deformation of that patch
			particles[i].patch_num = new_patch_num;
		//TODO: particle should not go out of bounds or be too small/big or orientate too much
	cout << changed << " particles changed, " << (num_particles-changed) << " stayed\n";

Each particle then is added some random noise so the filter will not converge and actually be able to cope with movement and deformation of the shape.

Getting the best guess of the PF

Simply by looking at all the particles at a given moment, we can take the best ones and average over them.

void calculateMeanShape() {
	Particle p;
        p.scale = p.orientation = p.centroid.x = p.centroid.y = 0;

        //Take the particles that score the highest
        float N_precentile = *(max_element(normalized_weights.begin(),normalized_weights.end())) * 0.95;

	float sum_w = 0.0;
	for (int i=0; i<num_particles; i++) {
		if (normalized_weights[i] < N_precentile) {
		p.orientation += particles[i].orientation * normalized_weights[i];
		p.scale += particles[i].scale * normalized_weights[i];
		p.centroid += particles[i].centroid * normalized_weights[i];
		addWeighted(p.deformation, 1.0, particles[i].deformation, normalized_weights[i], 0.0, p.deformation);
		sum_w += normalized_weights[i];
	p.orientation /= sum_w;
	p.scale /= sum_w;
	p.centroid *= 1.0/sum_w;
	p.deformation /= sum_w;

	mean_shape = getShapeForParticle(p);

Here is the filter running with all particles showing (40 particles). The yellow shape is the highest ranking shape, the pink shape is the average of the highest ranking shapes.

Code and Usage

You can grab the code with a usage example here: http://web.media.mit.edu/~roys/src/HHParticleFilter.zip


  • Istvan

    Hi Rob,

    very nice work. I've been watching your blog for a while and I like what you do. My question is, if this method works better then ASM.

    I am working on an ASM solution for face recognition, but I have some bugs, and I don't really have the time to fine-tune the algorithm.

    Best Regards,

  • http://www.morethantechnical.com Roy

    1st, my name is Roy.
    2nd, thank you.
    3rd, This method is different from ASM although it does have some similarities in modeling shape deformation for tracking. But it is different and may be better or worse for different application.
    If you wish to learn, find the time to really understand the algorithm you're implementing, that will bring you the most benefit.
    If you want a ready-made solution for facial features tracking please see Jason Saragih & Kyle McDonald's library: https://github.com/kylemcdonald/ofxFaceTracker

  • Istvan

    Hi Roy,

    sorry for renaming you. Yesterday I was a bit tired, I've just finished a new game for my company.

    So, I really loved image processing in college, actually my graduation paper was written from this field.


    But after I finished college I couldn't find any jobs in Romania in this field, so I became a game programmer. In my spare time I usually dive in some image processing stuff, trying to figure out how would I integrate it in games, maybe with some AR too.

    I usually read a lot of papers, trying to understand the thought process actually. Why is the author using for ex. PCA here, SVM there and so on.

    But the problem for me is, after 8-9 hours of programming at the company I just can't get the power to do 2-3 hours more.

    Thanks for the advice, and hope to see new stuff from you.

    Best regards,

  • berak

    wow nice, i had to try it! (though most of it is way over my head)
    encountered a little bug:
    element 39 in the HPDMTracker.xml is all zeros, and the tracker can't find a similar region for such things and crashes