Touch up your sound with SoundTouch [w/code]

So I needed to speed up or slow down an audio stream I had (speech generated with Flite TTS), and I naively thought it would be enough to resample it at the right intervals and interpolate.
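That naive approach is plain resampling: walk through the input at a fixed stride and linearly interpolate between neighbouring samples. Here is a minimal C++ sketch for mono float samples. `resampleLinear` and `countRisingZeroCrossings` are illustrative helpers, not part of Flite, Qt or SoundTouch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Naive speed change: resample by linear interpolation.
// speed > 1 plays faster (fewer samples), speed < 1 plays slower.
// Played back at the ORIGINAL sample rate, every frequency in the signal
// is multiplied by `speed`, so the pitch moves together with the tempo.
std::vector<float> resampleLinear(const std::vector<float>& in, double speed) {
    assert(speed > 0.0);
    std::vector<float> out;
    if (in.empty()) return out;
    const double last = static_cast<double>(in.size() - 1);
    for (std::size_t k = 0;; ++k) {
        const double pos = static_cast<double>(k) * speed;  // position in the source
        if (pos > last) break;
        const std::size_t i = static_cast<std::size_t>(pos);
        const double frac = pos - static_cast<double>(i);
        const float a = in[i];
        const float b = (i + 1 < in.size()) ? in[i + 1] : in[i];
        out.push_back(static_cast<float>(a + (b - a) * frac));
    }
    return out;
}

// Crude pitch probe: count negative-to-non-negative sign changes.
std::size_t countRisingZeroCrossings(const std::vector<float>& x) {
    std::size_t n = 0;
    for (std::size_t i = 1; i < x.size(); ++i)
        if (x[i - 1] < 0.0f && x[i] >= 0.0f) ++n;
    return n;
}
```

Run on one second of a 100 Hz tone sampled at 8 kHz with `speed = 2.0`, it returns 4000 samples that still contain the same 100 cycles, so at the original 8 kHz playback rate the tone comes out at 200 Hz.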
I quickly discovered that plain resampling won’t do, because changing the playback rate also changes the pitch proportionally. Then I discovered the world of time scaling in audio, with its many algorithms and approaches for changing the tempo without changing the pitch.
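Time-domain time scaling roughly works by cutting the signal into short overlapping frames, reading them from the input at one hop size and laying them down in the output at another, so each frame keeps its original sample spacing (and therefore its pitch). The bare-bones overlap-add (OLA) sketch below illustrates that idea. It is not SoundTouch’s code: SoundTouch’s stretch stage is WSOLA-like, meaning it also searches for the best-matching overlap position. `olaTimeStretch`, the 1024-sample frame and the Hann window are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Minimal overlap-add (OLA) time stretch for a mono float signal.
// tempo > 1 shortens the sound, tempo < 1 lengthens it. Samples inside each
// frame keep their original spacing, so the pitch stays roughly put.
std::vector<float> olaTimeStretch(const std::vector<float>& in, double tempo,
                                  std::size_t frame = 1024) {
    assert(tempo > 0.0 && frame >= 2 && frame % 2 == 0);
    if (in.size() < frame) return in;  // too short to stretch; return as-is

    const double kPi = 3.14159265358979323846;
    const std::size_t hopOut = frame / 2;                      // 50% overlap in the output
    const double hopIn = static_cast<double>(hopOut) * tempo;  // read hop in the input

    // Periodic Hann window: copies spaced frame/2 apart sum to 1.
    std::vector<float> window(frame);
    for (std::size_t n = 0; n < frame; ++n)
        window[n] = static_cast<float>(0.5 - 0.5 * std::cos(2.0 * kPi * n / frame));

    // Count the frames whose input span fits entirely inside the signal.
    std::size_t numFrames = 0;
    while (static_cast<std::size_t>(std::lround(numFrames * hopIn)) + frame <= in.size())
        ++numFrames;

    std::vector<float> out((numFrames - 1) * hopOut + frame, 0.0f);
    std::vector<float> norm(out.size(), 0.0f);  // accumulated window weight
    for (std::size_t k = 0; k < numFrames; ++k) {
        const std::size_t src = static_cast<std::size_t>(std::lround(k * hopIn));
        const std::size_t dst = k * hopOut;
        for (std::size_t n = 0; n < frame; ++n) {
            out[dst + n] += in[src + n] * window[n];
            norm[dst + n] += window[n];
        }
    }
    // Divide out the window weight so the un-overlapped edges keep their level.
    for (std::size_t i = 0; i < out.size(); ++i)
        if (norm[i] > 1e-6f) out[i] /= norm[i];
    return out;
}
```

Stretching two seconds of a 100 Hz tone at 8 kHz with `tempo = 1.25` yields 12800 samples (80 percent of the input), while the zero-crossing rate stays within about 10 percent of the original, where plain resampling would raise it by 25 percent. The phase jumps at the frame joins are still audible as warble, which is exactly what the smarter overlap search in libraries like SoundTouch tries to reduce.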
To my surprise, there were a number of ready-made free libraries that do this. The first one I tried, RubberBand, did not work out: it had too many dependencies, and I simply couldn’t be bothered compiling it for the Mac. SoundTouch, on the other hand, had a Homebrew formula, so it won by default.
I wrote a simple little wrapper around it that interfaces nicely with Qt.
Let’s see what’s going on there.

SoundTouch is fairly simple. I based my code on their SoundStretch example: http://svn.code.sf.net/p/soundtouch/code/trunk/source/SoundStretch/main.cpp
I ended up with a singleton SoundUtils class that does one very simple job: change the tempo or the sample rate.
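Since the wrapper changes both tempo and sample rate, it helps to know roughly how many frames to expect back. A small sketch, assuming the tempo argument is a percentage in the style of SoundTouch’s `setTempoChange()` (so the `10` in the usage example means 1.10× faster) and that the sample-rate change simply re-expresses the same duration at the new rate. `expectedOutputFrames` is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Rough expected output length (in frames) after changing the tempo by
// `tempoPercent` (SoundTouch-style percentage: +10 means 1.10x faster)
// and converting from `inRate` Hz to `outRate` Hz.
double expectedOutputFrames(std::size_t inFrames, double inRate, double outRate,
                            double tempoPercent) {
    assert(inRate > 0.0 && outRate > 0.0 && tempoPercent > -100.0);
    const double tempoFactor = 1.0 + tempoPercent / 100.0;          // 10 % -> 1.10
    const double seconds = static_cast<double>(inFrames) / inRate;  // input duration
    return seconds / tempoFactor * outRate;  // shorter by tempo, counted at outRate
}
```

For a hypothetical second of 16 kHz input played on a 44.1 kHz device with +10 percent tempo, that is about 40091 frames. SoundTouch buffers samples internally, so the count you actually receive can be off by a little until the pipeline is flushed.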

The wrapper itself is pretty self-explanatory, nothing fancy…
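One detail such a wrapper typically has to handle: the usage example hands Flite’s 16-bit samples to `process()` as `const short*`, while SoundTouch is typically built with float samples (its `SAMPLETYPE` is float unless `SOUNDTOUCH_INTEGER_SAMPLES` is defined). With a float build, samples need converting in both directions. The pair of helpers below is a hypothetical sketch of that conversion, not SoundUtils’ actual code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert interleaved 16-bit PCM (what Flite returns) to floats in [-1, 1).
std::vector<float> pcm16ToFloat(const std::int16_t* samples, std::size_t count) {
    assert(samples != nullptr || count == 0);
    std::vector<float> out(count);
    for (std::size_t i = 0; i < count; ++i)
        out[i] = static_cast<float>(samples[i]) / 32768.0f;
    return out;
}

// Convert floats back to 16-bit PCM, rounding to the nearest value and
// clamping so loud peaks saturate instead of wrapping around.
std::vector<std::int16_t> floatToPcm16(const std::vector<float>& in) {
    std::vector<std::int16_t> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        float v = in[i] * 32768.0f;
        if (v > 32767.0f) v = 32767.0f;
        if (v < -32768.0f) v = -32768.0f;
        out[i] = static_cast<std::int16_t>(std::lround(v));
    }
    return out;
}
```

Clamping matters after time stretching, because overlapping frames can briefly push peaks past full scale.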
So here’s a small usage example using Flite TTS (it won’t compile and run right away; it’s only for illustration or copy/paste):

#include <QApplication>
#include <QAudioDeviceInfo>
#include <QAudioFormat>
#include <QAudioOutput>
#include <QBuffer>
#include <QDebug>
#include "SoundUtils.h"
#include <flite/flite.h>
// Flite's voice registration functions aren't declared in flite.h, so declare the one we link against
extern "C" {
cst_voice *register_cmu_us_rms(const char *voxdir);
void unregister_cmu_us_rms(cst_voice *v);
}
int main(int argc, char *argv[]) {
    // create the application object before touching Qt Multimedia
    QApplication a(argc, argv);
    // generate speech with Flite TTS
    flite_init();
    cst_voice *v = register_cmu_us_rms(NULL);
    cst_wave* wave = flite_text_to_wave("hello sound touch", v);
    unregister_cmu_us_rms(v);//done, we got the samples
    // let's see what this wave looks like
    qDebug() << "wave info \n\tsamples " << cst_wave_num_samples(wave) <<
        "\n\tfreq " << cst_wave_sample_rate(wave) <<
        "\n\tchannels " << cst_wave_num_channels(wave) <<
        "\n\tsize of sample " << sizeof(*(wave->samples)) <<
        "\n\ttype" << wave->type <<
        "\n";
    //some info about the WAV
    int freq = cst_wave_sample_rate(wave);
    int numchannels = cst_wave_num_channels(wave);
    int samplesize_bytes = sizeof(*(wave->samples)); // Flite stores 16-bit shorts
    int samplesize_bits = samplesize_bytes * 8;
    short* buf = (short*)(cst_wave_samples(wave));
    int numsamples = cst_wave_num_samples(wave);
    int bufsize_bytes = numsamples * numchannels * samplesize_bytes;
    //output the available devices
    QList<QAudioDeviceInfo> devices = QAudioDeviceInfo::availableDevices(QAudio::AudioOutput);
    foreach (const QAudioDeviceInfo &deviceInfo, devices)
        qDebug() << deviceInfo.deviceName();
    if (devices.isEmpty())
        return 1; // nothing to play on
    //pick the first device anyway... :)
    QAudioDeviceInfo m_device = devices.first();
    QAudioFormat m_format;
    m_format.setSampleRate(freq);          // setFrequency() is the obsolete Qt 4 name
    m_format.setChannelCount(numchannels); // setChannels() is the obsolete Qt 4 name
    m_format.setSampleSize(samplesize_bits);
    m_format.setCodec("audio/pcm");
    m_format.setByteOrder(QAudioFormat::LittleEndian);
    m_format.setSampleType(QAudioFormat::SignedInt);
    //If the audio format of the wav doesn't play nice with our device - we may need to change the sample rate
    if (!m_device.isFormatSupported(m_format)) {
        qWarning() << "Default format not supported - trying to use nearest";
        m_format = m_device.nearestFormat(m_format);
        qDebug() << "freq " << m_format.sampleRate();
    }
    SoundUtils::Instance()->setup(freq,numchannels,10,m_format.sampleRate()); //10 percent speed increase
    QBuffer outBuf;
    outBuf.open(QIODevice::WriteOnly);
    //process the sound
    int new_numsamples = SoundUtils::Instance()->process(reinterpret_cast<const short*>(buf),numsamples,numchannels,samplesize_bytes, &outBuf);
    QBuffer wavebuf;
    //trim the start and end, which always carry some silence (using Flite TTS)
    SoundUtils::Instance()->trim(&outBuf,new_numsamples,&wavebuf);
    wavebuf.open(QIODevice::ReadOnly);
    QAudioOutput* m_audioOutput = new QAudioOutput(m_device, m_format, &a);
    //play the sound out
    m_audioOutput->start(&wavebuf);
    //block until done
    while(m_audioOutput->state() == QAudio::ActiveState)
        a.processEvents();
    return 0;
}
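The trim step isn’t shown here. One plausible way to do it is an amplitude threshold: drop whole frames at the start and end whose samples all stay within ±threshold. `trimSilence` below is a hypothetical stand-in working on interleaved 16-bit samples, not the wrapper’s actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical silence trimmer for interleaved 16-bit PCM: drops whole
// frames at the start and end whose samples all stay within +/- threshold.
std::vector<std::int16_t> trimSilence(const std::vector<std::int16_t>& in,
                                      std::size_t channels, int threshold) {
    assert(channels > 0 && in.size() % channels == 0);
    const std::size_t frames = in.size() / channels;
    // A frame counts as "loud" if any of its channels exceeds the threshold.
    auto isLoud = [&](std::size_t frame) {
        for (std::size_t c = 0; c < channels; ++c)
            if (std::abs(static_cast<int>(in[frame * channels + c])) > threshold)
                return true;
        return false;
    };
    std::size_t first = 0;
    while (first < frames && !isLoud(first)) ++first;
    if (first == frames) return {};  // nothing but silence
    std::size_t last = frames - 1;
    while (!isLoud(last)) --last;  // stops at `first` at the latest
    return std::vector<std::int16_t>(
        in.begin() + static_cast<std::ptrdiff_t>(first * channels),
        in.begin() + static_cast<std::ptrdiff_t>((last + 1) * channels));
}
```

Working on whole frames keeps the channels of a stereo buffer aligned; the threshold itself needs tuning by ear for the voice in use.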
