Audio samples to musical note detection issue

Time:03-14

I'm trying to set up a pipeline that detects musical notes from audio samples, but the input stage, where I identify the frequency content of the samples, does not land on the expected values. In the example below I:

  • build what I expect to be a 440Hz (A4) sine wave in the FFTW input buffer
  • apply the Hamming window function
  • look up the first half of the output bins to find the top 4 values and their frequencies

void GenerateSinWave(fftw_complex* outputArray, int N, double frequency, double samplingRate)
{
    double sampleDurationSeconds = 1.0 / samplingRate;
    for (int i = 0; i < N; ++i)
    {
        double sampleTime = i * sampleDurationSeconds;
        outputArray[i][0] = sin(M_2_PI * frequency * sampleTime);
    }
}

void HammingWindow(fftw_complex* array, int N)
{
    static const double a0 = 25.0 / 46.0;
    static const double a1 = 1 - a0;
    for (int i = 0; i < N; ++i)
        array[i][0] *= a0 - a1 * cos((M_2_PI * i) / N);
}
int main()
{
    const int N = 4096;
    double samplingRate = 44100;
    double A4Frequency = 440;
    fftw_complex in[N] = { 0 };
    fftw_complex out[N] = { 0 };
    fftw_plan plan = fftw_plan_dft_1d(N, 0, 0, FFTW_FORWARD, FFTW_ESTIMATE);

    GenerateSinWave(in, N, A4Frequency, samplingRate);
    HammingWindow(in, N);
    fftw_execute_dft(plan, in, out);

    // Find the 4 top values
    double binHzRange = samplingRate / N;
    for (int i = 0; i < 4; ++i)
    {
        double maxValue = 0;
        int maxBin = 0;
        for (int bin = 0; bin < (N/2); ++bin)
        {
            if (out[bin][0] > maxValue)
            {
                maxValue = out[bin][0];
                maxBin = bin;
            }
        }
        out[maxBin][0] = 0; // remove value for next pass
        double binMidFreq = (maxBin * binHzRange) + (binHzRange / 2);
        std::cout << (i + 1) << " -> Freq: " << binMidFreq << " Hz - Value: " << maxValue << "\n";
    }
    fftw_destroy_plan(plan);
}

I was expecting something close to 440 Hz, or its lower/higher harmonics; however, the results are far from that:

1 -> Freq: 48.4497Hz - Value: 110.263
2 -> Freq: 59.2163Hz - Value: 19.2777
3 -> Freq: 69.9829Hz - Value: 5.68717
4 -> Freq: 80.7495Hz - Value: 2.97571

This flow is mostly inspired by another SO answer. I suspect my lack of signal processing knowledge is the cause! My sine wave generation and window function seem to be OK, but audio analysis and FFTW are full of mysteries...

Any insight about how to improve my usage of FFTW, approach signal processing or simply write better code is appreciated!

EDIT: fixed an integer division that made the Hamming a0 parameter always 0. The results changed a little, but are still far from the expected 440 Hz.

Answer:

I think you've misunderstood the M_2_PI constant in your GenerateSinWave function. M_2_PI is defined as 2.0 / PI (about 0.6366), not 2 * PI. You should be using 2 * M_PI instead.

This mistake means that your generated signal has a frequency of only about 44.6 Hz (440 / π²), which is close to the output frequencies you are seeing.
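
To see where the roughly 45 Hz figure comes from, here is a small sketch of the arithmetic. The helper names EffectiveFrequency and NearestBin are made up for illustration; they are not part of FFTW or of the question's code.

```cpp
#include <cmath>

// Frequency actually produced when sin(k * f * t) is written in place of
// sin(2*pi * f * t): the phase advances k / (2*pi) times as fast.
// (Hypothetical helper, for illustration only.)
double EffectiveFrequency(double intendedHz, double k)
{
    const double pi = std::acos(-1.0);
    return intendedHz * k / (2.0 * pi);
}

// Index of the DFT bin closest to frequencyHz for an N-point transform.
// (Hypothetical helper, for illustration only.)
int NearestBin(double frequencyHz, double samplingRate, int N)
{
    return static_cast<int>(std::lround(frequencyHz * N / samplingRate));
}
```

With the question's values, EffectiveFrequency(440.0, 2.0 / pi) gives about 44.58 Hz, and NearestBin(44.58, 44100.0, 4096) gives bin 4. Bin 4 starts at 4 × 44100 / 4096 ≈ 43.07 Hz, and adding the half-bin offset used in the question's code yields 48.45 Hz, which matches the first line of the reported output.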

The same constant needs correcting in your HammingWindow function too.
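
A minimal sketch of both functions with the constant corrected, under some assumptions of mine: it works on a plain std::vector<double> instead of fftw_complex so that it compiles without FFTW, and kTwoPi, BinMagnitude and PeakBin are hypothetical names added only to check the result. The check uses a direct (slow) DFT and compares bin magnitudes rather than the real part alone, which differs from the peak search in the question.

```cpp
#include <cmath>
#include <vector>

// 2*pi written out explicitly, so the sketch does not depend on M_PI.
constexpr double kTwoPi = 6.283185307179586;

std::vector<double> GenerateSineWave(int N, double frequency, double samplingRate)
{
    std::vector<double> samples(N);
    for (int i = 0; i < N; ++i)
        samples[i] = std::sin(kTwoPi * frequency * i / samplingRate);
    return samples;
}

void HammingWindow(std::vector<double>& samples)
{
    const double a0 = 25.0 / 46.0;
    const double a1 = 1.0 - a0;
    const int N = static_cast<int>(samples.size());
    for (int i = 0; i < N; ++i)
        samples[i] *= a0 - a1 * std::cos(kTwoPi * i / N);
}

// Magnitude of a single DFT bin, computed directly in O(N).
// Only meant as a check; a real pipeline would use FFTW's output.
double BinMagnitude(const std::vector<double>& samples, int bin)
{
    const int N = static_cast<int>(samples.size());
    double re = 0.0, im = 0.0;
    for (int i = 0; i < N; ++i)
    {
        const double angle = kTwoPi * bin * i / N;
        re += samples[i] * std::cos(angle);
        im -= samples[i] * std::sin(angle);
    }
    return std::hypot(re, im);
}

// Bin with the largest magnitude in the first half of the spectrum.
int PeakBin(const std::vector<double>& samples)
{
    const int N = static_cast<int>(samples.size());
    int maxBin = 0;
    double maxValue = 0.0;
    for (int bin = 0; bin < N / 2; ++bin)
    {
        const double value = BinMagnitude(samples, bin);
        if (value > maxValue)
        {
            maxValue = value;
            maxBin = bin;
        }
    }
    return maxBin;
}
```

For the question's parameters (N = 4096, 44100 Hz sampling rate, 440 Hz tone), calling GenerateSineWave, then HammingWindow, then PeakBin should land on bin 41, which corresponds to 41 × 44100 / 4096 ≈ 441.4 Hz.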
