Opened 6 years ago

Closed 6 years ago

#6775 closed defect (fixed)

gst-libav can no longer decode video (3.4 regression)

Reported by: Jan Alexander Steffens Owned by:
Priority: critical Component: avcodec
Version: git-master Keywords: regression
Cc: jcowgill+ffmpeg@jcowgill.uk, dominik@greysector.net, negativo17@gmail.com, Michael Niedermayer Blocked By:
Blocking: Reproduced by developer: no
Analyzed by developer: no

Description

Summary of the bug:

gst-libav attempts to drain the decoder (by feeding avcodec_decode_video2 empty packets) before every discontinuity in the stream. The compat implementation used since 3.4 doesn't seem to handle this case correctly. It does not consume any data from the following (real) packets and further calls echo a message: "Got unexpected packet size after a partial decode"

Since the start of the stream is also considered a discontinuity, the decode can't even start. Avoiding draining at the start of the stream doesn't solve the problem for any further discontinuity.

Downstream bug: https://bugzilla.gnome.org/show_bug.cgi?id=789193

How to reproduce:

gst-launch-1.0 videotestsrc ! x264enc ! rtph264pay ! identity drop-probability=0.1 ! rtph264depay ! h264parse ! avdec_h264 ! videoconvert ! autovideosink

This creates a h264 stream which is missing 10% of frames.

(1) With FFmpeg 3.3, it manages a best-effort (glitchy) decode.

(2) With FFmpeg 3.4, it does not start decoding at all.

(3) Reverting gst-libav commit 67e55e47 (or inserting a call to avcodec_flush_buffers after draining) allows decoding to start, but causes the decoder state to be cleared after each missing frame, creating much worse output than (1).

Change History (17)

comment:1 by Jan Alexander Steffens, 6 years ago

Seems to be related to #6446.

comment:2 by James Cowgill, 6 years ago

Cc: jcowgill+ffmpeg@jcowgill.uk added

comment:3 by Carl Eugen Hoyos, 6 years ago

Please do a bisect to find the FFmpeg change that introduced the issue (and if possible please verify if the same change broke Kodi).

comment:4 by Carl Eugen Hoyos, 6 years ago

And please confirm that the issue is reproducible with current FFmpeg git head.

comment:5 by Carl Eugen Hoyos, 6 years ago

Priority: normalcritical

comment:6 by Jan Alexander Steffens, 6 years ago

The first bad commit is bddb2343b6e594e312dadb5d21b408702929ae04 (which merges 061a0c14bb "decode: restructure the core decoding code").

Couldn't test master as gst-libav does not build against it (yet? Problem seems to be renamed defines; would attempt a port if time allows).

comment:7 by Dominik Mierzejewski, 6 years ago

Cc: dominik@greysector.net added

comment:8 by Hendrik, 6 years ago

Requiring avcodec_flush_buffers after draining is an expected requirement in the decode API. There is no way to otherwise signal that the draining is done.

comment:9 by James Cowgill, 6 years ago

I think Kodi 17 is also affected by this (for people who don't use Kodi's bundled ffmpeg).
See: https://bugs.gentoo.org/634802

I have this hack which forces the buffers to be flushed after a drain has completed. It allows gst-libav and kodi to work with ffmpeg 3.4, but I expect it has some side effect I haven't thought of :) It should have the same effect as doing (3) above.

--- a/libavcodec/decode.c
+++ b/libavcodec/decode.c
@@ -780,6 +780,10 @@ static int compat_decode(AVCodecContext *avctx, AVFrame *frame,
     while (ret >= 0) {
         ret = avcodec_receive_frame(avctx, frame);
         if (ret < 0) {
+            if (ret == AVERROR_EOF) {
+                av_assert0(avci->draining);
+                avcodec_flush_buffers(avctx);
+            }
             if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF)
                 ret = 0;
             goto finish;

I think there is a misunderstanding here about what a NULL packet sent via avcodec_video_decode2 actually does, probably because the documentation is a bit lacking in this area. The documentation says you need a NULL packet at the end of a stream, but says nothing about NULL packets sent at any other time. There are at least a few comments I found in the ffmpeg source which imply that sending a NULL packet means EOF (as opposed to draining, and allowing extra data later). Clearly the developer who wrote the compat code in 3.4 thought this as well.

Maybe I'm missing something obvious, but why does gst-libav need to drain the codec when a packet is dropped?

comment:10 by Carl Eugen Hoyos, 6 years ago

Version: git-master

comment:11 by Carl Eugen Hoyos, 6 years ago

Please send your patch to the development mailing list for discussion.

comment:12 by Simone Caronni, 6 years ago

Cc: negativo17@gmail.com added

in reply to:  9 ; comment:13 by Carl Eugen Hoyos, 6 years ago

Replying to jcowgill:

I have this hack which forces the buffers to be flushed after a drain has completed. It allows gst-libav and kodi to work with ffmpeg 3.4, but I expect it has some side effect I haven't thought of :) It should have the same effect as doing (3) above.

Does the patch completely break error concealment?

in reply to:  9 comment:14 by Hendrik, 6 years ago

Replying to jcowgill:

I think there is a misunderstanding here about what a NULL packet sent via avcodec_video_decode2 actually does, probably because the documentation is a bit lacking in this area. The documentation says you need a NULL packet at the end of a stream, but says nothing about NULL packets sent at any other time. There are at least a few comments I found in the ffmpeg source which imply that sending a NULL packet means EOF (as opposed to draining, and allowing extra data later). Clearly the developer who wrote the compat code in 3.4 thought this as well.

The key there is that nowhere in the source or the docs it does say you can send a NULL packet at any other time then EOF. All mentions of NULL drain packets include a wording similar to "at the end" (of the stream) etc.

So while it may have worked somewhat in the past (there are reports of this usage actually crashing with some decoders, though), the documentation never actually told you that this was a valid call, or suggested to do so in any way.

"Fixing" this by flushing after every drain will likely cause further issues, since a flush also discards any reference frames and the likes.

Did we ever get an answer *why* draining all the time is being performed? I don't see the need for that in any circumstances.

Last edited 6 years ago by Hendrik (previous) (diff)

in reply to:  13 comment:15 by James Cowgill, 6 years ago

Replying to cehoyos:

Replying to jcowgill:

I have this hack which forces the buffers to be flushed after a drain has completed. It allows gst-libav and kodi to work with ffmpeg 3.4, but I expect it has some side effect I haven't thought of :) It should have the same effect as doing (3) above.

Does the patch completely break error concealment?

Yes in the case of gst-libav. It is however an improvement over the current situation where nothing works at all. I'm not intending for this to be a long term solution - I am hoping gst-libav will fix their code so workarounds don't have to be used.

comment:16 by Michael Niedermayer, 6 years ago

Cc: Michael Niedermayer added

comment:17 by Carl Eugen Hoyos, 6 years ago

Resolution: fixed
Status: newclosed

The work-around by James Cowgill was applied as 02ba4b91b5616ecbebee5c9565e1be7af2a6b980 and backported for 3.4.1.

I would like to add that I find it disturbing that - assuming this is a very shallow bug - both GStreamer and Kodi did not test FFmpeg for at least six months.

Note: See TracTickets for help on using tickets.