Why is ffmpeg faster than this minimal example?

I want to read the audio out of a video file as fast as possible, using the libav libraries. It all works fine, but it seems like it could be faster.

To get a performance baseline, I ran this ffmpeg command and timed it:

time ffmpeg -threads 1 -i file -map 0:a:0 -f null -

On a test file (a 2.5 GB, 2-hour .MOV with pcm_s16be audio) this comes out to about 1.35 seconds on my M1 MacBook Pro.

On the other hand, this minimal C code (based on FFmpeg's "Demuxing and decoding" example) is consistently around 0.3 seconds slower.

#include <stdlib.h> /* atoi, exit */

#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>

static int decode_packet(AVCodecContext *dec, const AVPacket *pkt, AVFrame *frame)
{
    int ret = 0;

    // submit the packet to the decoder
    ret = avcodec_send_packet(dec, pkt);

    // get all the available frames from the decoder
    while (ret >= 0) {
        ret = avcodec_receive_frame(dec, frame);
        av_frame_unref(frame);
    }

    return 0;
}

int main (int argc, char **argv)
{
    int ret = 0;
    AVFormatContext *fmt_ctx = NULL;
    AVCodecContext  *dec_ctx = NULL;
    AVFrame *frame = NULL;
    AVPacket *pkt = NULL;

    if (argc != 3) {
        exit(1);
    }

    int stream_idx = atoi(argv[2]);

    /* open input file, and allocate format context */
    avformat_open_input(&fmt_ctx, argv[1], NULL, NULL);

    /* get the stream */
    AVStream *st = fmt_ctx->streams[stream_idx];

    /* find a decoder for the stream */
    const AVCodec *dec = avcodec_find_decoder(st->codecpar->codec_id);

    /* allocate a codec context for the decoder */
    dec_ctx = avcodec_alloc_context3(dec);

    /* copy codec parameters from input stream to output codec context */
    avcodec_parameters_to_context(dec_ctx, st->codecpar);

    /* init the decoder */
    avcodec_open2(dec_ctx, dec, NULL);

    /* allocate frame and packet structs */
    frame = av_frame_alloc();
    pkt = av_packet_alloc();

    /* read frames from the specified stream */
    while (av_read_frame(fmt_ctx, pkt) >= 0) {
        if (pkt->stream_index == stream_idx)
            ret = decode_packet(dec_ctx, pkt, frame);

        av_packet_unref(pkt);
        if (ret < 0)
            break;
    }

    /* flush the decoders */
    decode_packet(dec_ctx, NULL, frame);

    return ret < 0;
}

I tried measuring parts of this program to see if it was spending a lot of time in the setup, but it's not – at least 1.5 seconds of the runtime is the loop where it's reading frames.
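
One way to take that kind of per-section measurement is to read a monotonic clock before and after the region of interest. The following is only a sketch, not the code used for the numbers above: now_seconds and elapsed_since are hypothetical helper names (not part of libav), and it assumes a POSIX system where clock_gettime with CLOCK_MONOTONIC is available.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* ask for clock_gettime on strict compilers */
#endif
#include <time.h>

/* Hypothetical helper: current monotonic clock reading, in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical helper: seconds elapsed since an earlier now_seconds() value. */
static double elapsed_since(double start)
{
    return now_seconds() - start;
}

/*
 * Intended use around the read loop of the minimal example:
 *
 *     double t0 = now_seconds();
 *     while (av_read_frame(fmt_ctx, pkt) >= 0) { ... }
 *     fprintf(stderr, "read loop: %.3fs\n", elapsed_since(t0));
 */
```

A monotonic clock is used rather than wall-clock time so that system clock adjustments during the run do not distort the result.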

So I took some flamegraph recordings (using

The interesting thing that stands out to me is how long is spent on read in the minimal example vs. ffmpeg:

The time spent on lseek is also a lot longer in the minimal program – it's plainly visible in that flamegraph, but in the ffmpeg flamegraph, lseek is a single pixel wide.

What's causing this discrepancy? Is ffmpeg actually doing less work than I think it is here? Is the minimal code doing something naive? Are there buffering or other I/O optimizations that ffmpeg has enabled?

How can I shave 0.3 seconds off of the minimal example's runtime?

Answer:

The difference is that ffmpeg, when run with the -map flag, explicitly sets the discard field to AVDISCARD_ALL on the streams that are going to be ignored. The packets for those streams still get read from disk, but with this value set, they are never returned by av_read_frame (with the mov demuxer, at least).

In the example code, by contrast, this while loop receives every packet from every stream, and only drops the packets after they've been (wastefully) passed through av_read_frame.

/* read frames from the specified stream */
while (av_read_frame(fmt_ctx, pkt) >= 0) {
    if (pkt->stream_index == stream_idx)
        ret = decode_packet(dec_ctx, pkt, frame);

    av_packet_unref(pkt);
    if (ret < 0)
        break;
}

I changed the program to set the discard flag on the unused streams:

// ...

/* open input file, and allocate format context */
avformat_open_input(&fmt_ctx, argv[1], NULL, NULL);

/* get the stream */
AVStream *st = fmt_ctx->streams[stream_idx];

/* discard packets from other streams */
for (unsigned int i = 0; i < fmt_ctx->nb_streams; i++) {
  fmt_ctx->streams[i]->discard = AVDISCARD_ALL;
}
st->discard = AVDISCARD_DEFAULT;

// ...

With that change in place, the minimal example runs about 1.8x faster on the same test file (1.593s down to 0.898s), after the cache is warmed up.

Minimal example, without discard   1.593s
ffmpeg with -map 0:a:0             1.404s
Minimal example, with discard      0.898s