No Sound When Playing Back PCM-Decoded Audio

I am reading AAC audio frames, decoding them to PCM with Media Foundation, and trying to play the result back through WASAPI. The audio is 48,000 Hz (48 kHz), 2 channels, 16-bit. I am able to decode the frames, write them to a file full.pcm, and then open and play that PCM file successfully in Audacity. However, my code to play back through the device speakers gives me nothing. The device I am trying to play through is the default render endpoint, which is my DAC. I am not getting any failing HRESULTs from any of the WASAPI-related code, so I'm confused. WASAPI is new to me though, so maybe there is something obvious I am missing.

#include "AudioDecoder.h"
#include <vector>
#include <chrono>
#include <iostream>
#include <string>
#include <fstream>
#include <cassert>
#include <filesystem>

#include <mmdeviceapi.h>
#include <endpointvolume.h>
#include <functiondiscoverykeys.h> 
#include <audioclient.h>

int fps_counter = 0;
int frame_index = 0;

IAudioClient* audio_client;
IAudioRenderClient* render_client = nullptr;

int setup_audio_playback()
{
    HRESULT hr = S_OK;

    IMMDeviceEnumerator* pEnumerator = nullptr;
    IMMDevice* pDevice = nullptr;

    ATLENSURE_SUCCEEDED(CoCreateInstance(__uuidof(MMDeviceEnumerator), nullptr, CLSCTX_ALL, __uuidof(IMMDeviceEnumerator), (void**)&pEnumerator));

    ATLENSURE_SUCCEEDED(pEnumerator->GetDefaultAudioEndpoint(eRender, eConsole, &pDevice));

    IPropertyStore* ips;
    ATLENSURE_SUCCEEDED(pDevice->OpenPropertyStore(STGM_READ, &ips));

    PROPVARIANT varName;
    // Initialize container for property value.
    PropVariantInit(&varName);
    ATLENSURE_SUCCEEDED(ips->GetValue(PKEY_Device_FriendlyName, &varName));

    std::wcout << L"Device name: " << varName.pwszVal << std::endl;

    ATLENSURE_SUCCEEDED(pDevice->Activate(__uuidof(IAudioClient), CLSCTX_ALL, nullptr, (void**)&audio_client));

    WAVEFORMATEX* format;
    ATLENSURE_SUCCEEDED(audio_client->GetMixFormat(&format));

    ATLENSURE_SUCCEEDED(audio_client->Initialize(AUDCLNT_SHAREMODE_SHARED, 0, 10000000, 0, format, NULL));

    uint32_t bufferFrameCount;
    ATLENSURE_SUCCEEDED(audio_client->GetBufferSize(&bufferFrameCount));

    ATLENSURE_SUCCEEDED(audio_client->GetService(__uuidof(IAudioRenderClient), (void**)&render_client));

    ATLENSURE_SUCCEEDED(audio_client->Start());

    return hr;
}

int main()
{
    HRESULT hr = S_OK;

    std::ofstream fout_all_frames_pcm;

    std::filesystem::remove(std::filesystem::current_path() / "full.pcm");

    fout_all_frames_pcm.open("full.pcm", std::ios::binary | std::ios::out);

    if (FAILED(hr = CoInitializeEx(nullptr, COINIT_APARTMENTTHREADED)))
        return hr;
    if (FAILED(hr = MFStartup(MF_VERSION)))
        return hr;

    setup_audio_playback();

    AudioDecoder* ad = new AudioDecoder();

    std::vector<uint8_t> data;

    while (true)
    {
        std::chrono::time_point<std::chrono::steady_clock> iteration_time = std::chrono::high_resolution_clock::now();

        // Read frame data
        std::ifstream fin("Encoded Audio Frames\\frame" + std::to_string(frame_index) + ".aac", std::ios::binary | std::ios::in);

        if (fin.fail())
        {
            //throw std::runtime_error("Invalid file path specified");
            break;
        }

        // Get file length
        fin.seekg(0, std::ios::end);
        size_t const length = fin.tellg();
        fin.seekg(0, std::ios::beg);

        if (length > data.size())
        {
            static size_t constexpr const granularity = 64 << 10;
            data.resize((length + (granularity - 1)) & ~(granularity - 1));
            assert(length <= data.size());
        }

        // Copy frame data from file to array;
        fin.read(reinterpret_cast<char*>(data.data()), length);
        fin.close();

        CComPtr<IMFSample> pcm_sample;
        while (!ad->decode_sync(data.data(), length, &pcm_sample))
        {
            if (pcm_sample == nullptr) // This will happen if the color converter isn't able to produce output, so we will continue in that case
                continue;

            CComPtr<IMFMediaBuffer> buffer;
            if (FAILED(hr = pcm_sample->ConvertToContiguousBuffer(&buffer)))
                return hr;

            unsigned char* datas;
            DWORD length;
            if (FAILED(hr = buffer->GetCurrentLength(&length)))
                return hr;

            if (FAILED(hr = buffer->Lock(&datas, nullptr, &length)))
                return hr;

            fout_all_frames_pcm.write((char*)datas, length);

            // Does nothing
            //Sleep(120);

            // Grab all the available space in the shared buffer.
            uint8_t* pData;
            ATLENSURE_SUCCEEDED(render_client->GetBuffer(1, &pData));

            memcpy(pData, datas, length);

            DWORD flags = 0;
            ATLENSURE_SUCCEEDED(render_client->ReleaseBuffer(1, flags));

            pcm_sample.Release();
        }

        frame_index++;
    }

    audio_client->Stop();

    return 0;
}

CodePudding user response:

Doing

render_client->GetBuffer(1, ...

will not give you any stable behavior, because you are submitting data one frame at a time: literally one audio frame (one sample per channel) out of your 48,000 per second. The code is likely broken in more ways than this, too. The decoder hands you far more data than that on each pass, yet you release only a single frame to the device, so most of the audio is simply lost. The memcpy also copies the whole decoded buffer (length bytes) into space that was requested for just one frame.

You would want to check this article, in the part where the code determines how many frames GetBuffer will carry, and then loop, filling those buffers accurately until you have consumed all of your IMFSample data.
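The fill loop described above can be sketched roughly as follows. This is a minimal illustration, not the article's code: `feed_pcm` and `FakeDevice` are hypothetical names introduced here. The helper is a template so that it compiles without the Windows headers; it only assumes the client objects expose `GetCurrentPadding`, `GetBuffer`, and `ReleaseBuffer` in the shapes that `IAudioClient` and `IAudioRenderClient` use. `FakeDevice` is a stand-in that pretends half of the queued audio has played between polls, so the loop can be exercised without audio hardware.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical helper: feeds `bytes` of interleaved PCM to a WASAPI-style
// render client in chunks that fit the free part of the endpoint buffer.
// Returns the number of frames written, or -1 if a call fails (HRESULT < 0).
template <class AudioClient, class RenderClient>
long long feed_pcm(AudioClient& client, RenderClient& render,
                   const uint8_t* data, size_t bytes,
                   uint32_t buffer_frames, uint32_t bytes_per_frame)
{
    size_t frames_left = bytes / bytes_per_frame;
    long long written = 0;
    while (frames_left > 0) {
        uint32_t padding = 0; // frames queued but not yet played
        if (client.GetCurrentPadding(&padding) < 0) return -1;
        uint32_t free_frames = buffer_frames - padding;
        if (free_frames == 0) { // buffer full: wait for the device to drain
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
            continue;
        }
        uint32_t n = (uint32_t)std::min<size_t>(free_frames, frames_left);
        uint8_t* dst = nullptr;
        if (render.GetBuffer(n, &dst) < 0) return -1;
        std::memcpy(dst, data, (size_t)n * bytes_per_frame); // n frames only
        if (render.ReleaseBuffer(n, 0) < 0) return -1;
        data += (size_t)n * bytes_per_frame;
        frames_left -= n;
        written += n;
    }
    return written;
}

// Hypothetical stand-in for the device, for testing the loop only.
struct FakeDevice {
    uint32_t capacity;        // total frames the endpoint buffer holds
    uint32_t bytes_per_frame;
    uint32_t queued = 0;      // frames written but not yet "played"
    std::vector<uint8_t> staging, played;
    FakeDevice(uint32_t cap, uint32_t bpf) : capacity(cap), bytes_per_frame(bpf) {}
    long GetCurrentPadding(uint32_t* p) { queued /= 2; *p = queued; return 0; }
    long GetBuffer(uint32_t n, uint8_t** pp) {
        if (n > capacity - queued) return -1; // asked for more than is free
        staging.resize((size_t)n * bytes_per_frame);
        *pp = staging.data();
        return 0;
    }
    long ReleaseBuffer(uint32_t n, uint32_t /*flags*/) {
        played.insert(played.end(), staging.begin(), staging.end());
        queued += n;
        return 0;
    }
};
```

With the real interfaces, the call would presumably look like `feed_pcm(*audio_client, *render_client, datas, length, bufferFrameCount, format->nBlockAlign)`, using the values from GetBufferSize and the format passed to Initialize. That only produces correct sound if the decoded PCM has the same layout as that format; the shared-mode mix format is often 32-bit float rather than 16-bit integer, so that is worth checking as well.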

How large are the buffers you obtain with GetBuffer? For 10 ms buffers, which are pretty typical, and a 48 kHz sampling rate, you would have 480 frames per buffer. With stereo 16-bit PCM, each frame is four bytes, so you would be delivering around 2 KB (1,920 bytes) on every GetBuffer/ReleaseBuffer iteration.
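The arithmetic above can be written out as a few small functions. The function names are made up for illustration, and the 4,096-byte decoded buffer in the last check is just an example size, not a value taken from the question.

```cpp
#include <cassert>
#include <cstdint>

// Bytes occupied by one frame (one sample for every channel).
constexpr uint32_t bytes_per_frame(uint32_t channels, uint32_t bits_per_sample) {
    return channels * bits_per_sample / 8;
}

// Frames that cover `ms` milliseconds at the given sample rate.
constexpr uint32_t frames_per_period(uint32_t sample_rate, uint32_t ms) {
    return sample_rate * ms / 1000;
}

// Whole frames contained in a buffer of `bytes` bytes.
constexpr uint32_t frames_in_bytes(uint32_t bytes, uint32_t frame_bytes) {
    return bytes / frame_bytes;
}

// Stereo 16-bit PCM: 2 channels * 2 bytes = 4 bytes per frame.
static_assert(bytes_per_frame(2, 16) == 4, "stereo 16-bit frame size");
// 10 ms at 48 kHz is 480 frames, i.e. 480 * 4 = 1,920 bytes.
static_assert(frames_per_period(48000, 10) == 480, "frames in 10 ms");
static_assert(frames_per_period(48000, 10) * bytes_per_frame(2, 16) == 1920,
              "bytes in 10 ms");
// An example 4,096-byte decoded buffer holds 1,024 such frames.
static_assert(frames_in_bytes(4096, bytes_per_frame(2, 16)) == 1024,
              "frames in example buffer");
```

The point of the last check is that the frame count passed to GetBuffer and ReleaseBuffer should come from the byte length divided by the frame size, not a constant such as 1.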