Feb 05

Using Hidden Markov Models for staff line removal (in OMR) [w/code]

So lately I'm into Optical Music Recognition (OMR), and a central part of that is doing staff line removal. That is when you get rid of the staff lines that obscure the musical symbols to make recognition much easier. There are a lot of ways to do it, but I'm going to share with you how I did it (fairly easily) with Hidden Markov Models (HMMs), which will also teach us a good lesson on this wonderfully useful approach.

OMR has been around for ages, and if you're interested in learning about it [Fornes 2014] and [Rebelo 2012] are good summary articles.
The matter of Staff Line Removal has occupied dozens of researchers for as long as OMR exists; [Dalitz 2008] give a good overview. Basically the goal is to remove the staff lines that obscure the musical symbols, so they would be easier to recognize.
Screen Shot 2015-01-24 at 10.11.00 PM
But, the staff lines are connected to the symbols, so simply removing them will cut up the symbols and make them hardly recognizable.
So let's see how we could do this with HMMs.

HMMs for staff line removal

First step is to know where the staff lines are, but for the purpose I'll assume we already know that, and anyway that's a topic for another post.
If you know where the staff lines are, you can think about traversing each line from left to right and noticing that its width changes from being narrow where there are no symbols and very wide where there are. This sequential nature of the lines immediately made me think about Hidden Markov Models, which follow this idea exactly. HMMs model the sequential probabilistic behavior of a string of observations (there are definitely more formal ways to put this), so think of each point on the line as a point of observation where the width is the measurement.
HMMs do this modeling by saying there are a number of "hidden states", where each observation belongs to a hidden state. But we don't know which state each observation belongs to - and this is what the HMM will give us, albeit in terms of maximum likelihood.
In our case we can say there are 2 states: {STAFF LINE, NO STAFF LINE} but in reality I used 4 states: {STAFF, NOT_STAFF, NOT_STAFF_THIN, NOTHING}.
Next up is training the HMM. This entails obtaining data for two matrices: State transition matrix, Emission matrix.
The State transition matrix models the probabilities of going from state A to state B:
Screen Shot 2015-02-05 at 3.21.48 PM
See that the states mostly like to stay within themselves (i.e. the diagonal on the affinity matrix is close to 1).

The Emission probability matrix models the probability of emitting symbol S from state A:
Screen Shot 2015-02-05 at 4.06.22 PM
See how for each state there are typical emissions that have high probabilities of occuring (for "NO" that is emission type "1", for example).
These matrices can easily be calculated directly if you have an annotated example. Simply count the number of transitions from one state to the other, and count the number of occurrences of a symbol when a certain state is presented.

Running the HMM

I've used a CPP implementation of an HMM called HMMLib: http://www.cs.au.dk/~asand/?page_id=152, they also have a paper detailing their efforts to optimize the calculation.

The integration was rather simple. We traverse the lines (using OpenCV's very useful LineIterator) and collect the observations, and then run the Viterbi algorithm.

for(int line = 0; line < staffLines.size(); line++) {
     Point a = staffLines[line].first;
     Point b = staffLines[line].second;

     LineIterator it(gray, a, b, 8);
     sequence obs;
     for(int l = 0; l < it.count; l++, ++it) {
          Point onLine = it.pos();
          Point2f onLine2f(onLine.x,onLine.y);

          // Look up 7 pixels and count the black ones
          uchar countUp = countNonZero(gray(Rect(onLine2f - Point2f(0,7),Size(1,7))));
          // Look down 7 pixels and count the black ones
          uchar countDown = countNonZero(gray(Rect(onLine2f,Size(1,7))));
          // Combine both counts to 4 bits (16 states)
          uchar current_emission = (((countUp >> 1) << 2) & 12) | ((countDown >> 1) & 3);


     int n = obs.size();
     sequence hiddenseq(n);

     // Running the Viterbi algorithm will come up with the best sequence of hidden-states 
     // that match the observation
     loglik = m_hmm->viterbi(obs, hiddenseq); 

     // Do something with hiddenseq... (like traversing the line again and removing the pixels
     // that belong to state STAFF (state 0 in this case).

The results were pretty good:

From left to right: Original binarized images, staff lines annotated by HMM,  staff lines removed.

From left to right: Original binarized images, staff lines annotated by HMM, staff lines removed.

OK thats what I have for you today.