When producing H.264 frames and parsing them with PyAV, packets are returned only after the parse method has been
invoked twice.
Consider the following test H.264 input, created using:
ffmpeg -f lavfi -i testsrc=duration=10:size=1280x720:rate=30 -f image2 -vcodec libx264 -bsf h264_mp4toannexb -force_key_frames source -x264-params keyint=1:scenecut=0 "frame-%04d.h264"
Now, using pyAV to parse the first frame:
import av
codec = av.CodecContext.create('h264', 'r')
with open('/path/to/frame-0001.h264', 'rb') as file_handler:
    chunk = file_handler.read()
packets = codec.parse(chunk)  # Returns an empty list; this line must be invoked twice to get packets
The packets list remains empty unless the last line is invoked a second time (packets = codec.parse(chunk)).
Also, for various real-life inputs that I cannot characterize, decoding frames from packets seems to require several decode invocations as well:
packet = packets[0]
frames = codec.decode(packet) # This line needs to be invoked 2-3 times to actually receive frames.
Does anyone know anything about this inconsistent behavior of PyAV?
(Using Python 3.8.12 on macOS Monterey 12.3.1, ffmpeg 4.4.1, pyAV 9.0.2)
CodePudding user response:
This is expected PyAV behavior. More than that, it is the expected behavior of the underlying libav libraries. One packet does not guarantee a frame, and multiple packets may be needed before a frame is produced. This is apparent in FFmpeg's video decoder example:
while (ret >= 0) {
ret = avcodec_receive_frame(dec_ctx, frame);
if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF)
return;
If the decoder needs more packets to form a frame, avcodec_receive_frame returns AVERROR(EAGAIN).
[edit]
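Actually, the above is not a good example, as it just exits on EAGAIN. EAGAIN is not a failure: it means the decoder needs more input. To retrieve a frame, decoding should rather continue on EAGAIN by sending the next packet and then calling avcodec_receive_frame again:
while (ret >= 0) {
    ret = avcodec_receive_frame(dec_ctx, frame);
    if (ret == AVERROR(EAGAIN))
        continue;  /* no frame yet; ret < 0 ends this loop so the caller can send the next packet */
    if (ret == AVERROR_EOF)
        return;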
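The send/receive pattern behind these C loops can be modelled in Python. What follows is only a toy sketch, not libav or PyAV code: ToyDecoder, NeedMoreInput and run_decoder are hypothetical names, and the fixed delay of two packets is an assumption chosen to mimic a decoder that buffers input before emitting its first frame. NeedMoreInput stands in for AVERROR(EAGAIN) and EOFError for AVERROR_EOF.

```python
from collections import deque


class NeedMoreInput(Exception):
    """Toy stand-in for AVERROR(EAGAIN): no frame is available yet."""


class ToyDecoder:
    """Hypothetical decoder that holds back `delay` packets before emitting."""

    def __init__(self, delay=2):
        self._delay = delay
        self._queue = deque()
        self._draining = False

    def send(self, packet):
        # Sending None signals end of stream (the flush request).
        if packet is None:
            self._draining = True
        else:
            self._queue.append(packet)

    def receive(self):
        if self._draining:
            if self._queue:
                return f"frame({self._queue.popleft()})"
            raise EOFError  # toy stand-in for AVERROR_EOF
        if len(self._queue) > self._delay:
            return f"frame({self._queue.popleft()})"
        raise NeedMoreInput


def run_decoder(decoder, packets):
    """Send every packet, then None to flush, collecting all frames."""
    frames = []
    for packet in list(packets) + [None]:
        decoder.send(packet)
        while True:
            try:
                frames.append(decoder.receive())
            except NeedMoreInput:
                break  # not an error: go send the next packet
            except EOFError:
                return frames
    return frames
```

In this model the first two packets yield no frames at all, which is the same symptom as codec.decode() returning an empty list for the first few calls. The buffered frames only come out once more packets are sent or the decoder is flushed.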
[edit]
pyav's codec.parse()
That decoding sometimes needs additional calls is fairly well known, but the parser needing a flush is less widely known. Here is the difference between PyAV and FFmpeg:
PyAV parses the input data with av_parser_parse2() like this [ref]:
while True:
    with nogil:
        consumed = lib.av_parser_parse2(
            self.parser,
            self.ptr,
            &out_data, &out_size,
            in_data, in_size,
            lib.AV_NOPTS_VALUE, lib.AV_NOPTS_VALUE,
            0
        )
    err_check(consumed)
    # ...snip...
    if not in_size:
        # This was a flush. Only one packet should ever be returned.
        break
    in_data += consumed
    in_size -= consumed
    if not in_size:
        # Aaaand now we're done.
        break
So it loops until the input data is 100% consumed. Note that it does not call av_parser_parse2 again once the end of the buffer is reached, which makes sense because the input data may be only a part of the stream data.
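To see why a parser that stops as soon as its input is consumed can hold back the last packet, here is a toy model in Python. It is only a sketch under simplifying assumptions, not PyAV's or libav's implementation: ToyParser is a hypothetical class, and it treats every 4-byte Annex B start code as a packet boundary, whereas the real H.264 parser also handles 3-byte start codes and looks for access unit boundaries.

```python
START_CODE = b"\x00\x00\x00\x01"  # 4-byte Annex B start code


class ToyParser:
    """Hypothetical parser: a unit is complete only once the next start code is seen."""

    def __init__(self):
        self._pending = b""

    def parse(self, data=None):
        if not data:
            # Flush: no more input will follow, so the buffered unit is complete.
            out = [self._pending] if self._pending else []
            self._pending = b""
            return out
        self._pending += data
        packets = []
        while True:
            # Look for the NEXT start code, skipping the one that opens this unit.
            nxt = self._pending.find(START_CODE, len(START_CODE))
            if nxt == -1:
                break  # end of the current unit is not known yet: keep buffering
            packets.append(self._pending[:nxt])
            self._pending = self._pending[nxt:]
        return packets


frame = START_CODE + b"frame-1"
parser = ToyParser()
first = parser.parse(frame)    # nothing yet: the end of the unit is unknown
second = parser.parse(frame)   # the second start code closes the first unit
flushed = parser.parse()       # flushing releases the buffered remainder
```

Feeding the same chunk twice makes a packet appear on the second call, which matches the behavior described in the question. The second copy stays buffered until the parser is flushed.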
In contrast, FFmpeg does not call av_parser_parse2 directly but goes through parse_packet, and you can see how it handles the similar situation:
while (size > 0 || (flush && got_output)) {
    int64_t next_pts = pkt->pts;
    int64_t next_dts = pkt->dts;
    int len;
    len = av_parser_parse2(sti->parser, sti->avctx,
                           &out_pkt->data, &out_pkt->size, data, size,
                           pkt->pts, pkt->dts, pkt->pos);
It also calls av_parser_parse2 to flush the parser after the input data is exhausted. So, you need to do the same in PyAV: after all your frames are read, call codec.parse() one last time, with no argument, to flush the last packet.
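A minimal sketch of that pattern follows. The helper names parse_all and decode_all are hypothetical. The code assumes only that the codec object behaves like PyAV's CodecContext, where parse() with no argument flushes the parser and decode(None) flushes the decoder.

```python
def parse_all(codec, chunks):
    """Feed each chunk to codec.parse(), then flush the parser once."""
    packets = []
    for chunk in chunks:
        packets.extend(codec.parse(chunk))
    # No argument: flush whatever the parser is still buffering.
    packets.extend(codec.parse())
    return packets


def decode_all(codec, packets):
    """Decode each packet, then flush the decoder once."""
    frames = []
    for packet in packets:
        frames.extend(codec.decode(packet))
    # None: drain the frames the decoder is still holding.
    frames.extend(codec.decode(None))
    return frames
```

With the setup from the question, this would be used as packets = parse_all(codec, [chunk]) followed by frames = decode_all(codec, packets), where codec = av.CodecContext.create('h264', 'r'). A context that has been flushed this way should be treated as finished, so create a new one for the next stream.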