Opened 8 years ago

Last modified 10 months ago

#5732 open defect

Display corruption on very high-bitrate H.264 files

Reported by: Sesse
Owned by:
Priority: minor
Component: avcodec
Version: git-master
Keywords: h264
Cc:
Blocked By:
Blocking:
Reproduced by developer: yes
Analyzed by developer: no

Description

Hi,

I've discovered what I believe is a bug in the H.264 decoder of libavcodec. It concerns the following file, where the video is encoded using Intel Quick Sync Video (on a Haswell, via VA-API) at constant quantizer:

http://storage.sesse.net/through-the-cracks.mp4

Unfortunately, the file is very big (~5.1GB), and attempts to cut it using ffmpeg(1) resulted in something VLC wouldn't play, so I've left it alone save for remuxing (it was originally in NUT) and audio reencoding.

The corruption happens around 13:50, on the right-hand side of the picture. You can see it by decoding using ffmpeg(1):

ffmpeg -ss 13:50 -i /srv/storage.sesse.net/through-the-cracks.mp4 -vframes 50 out-%03d.png

and then looking at out-*.png. The errors persist from out-001.png to out-019.png; they disappear at out-020.png (perhaps new keyframe?) and come back at out-045.png. It looks like some kind of overflow to me, probably due to the extreme bitrate chosen (around 170 Mbit/sec; this content is super-hard to encode!).

The file plays perfectly in VLC if and only if I enable VA-API hardware acceleration, so that it's decoded in hardware instead of by libavcodec's H.264 decoder.

Change History (23)

comment:1 by jkqxz, 8 years ago

I haven't downloaded all of your 8GB file to test this, but it's probably an overflow of the fixed-size per-frame output buffer. Can you provide the command used for encoding?

In any case, hopefully this is fixed in <https://git.libav.org/?p=libav.git;a=commit;h=8a62d2c28fbacd1ae20c35887a1eecba2be14371> (merge pending).

comment:2 by Sesse, 8 years ago

Your patch is about ffmpeg VA-API encode, which wasn't used in the encoding of this file. The video isn't encoded with libavcodec at all; it was encoded by Nageru (https://nageru.sesse.net/) from a live input.

comment:3 by jkqxz, 8 years ago

Oh, sorry. (Perhaps edit your description not to say "the video is encoded using Intel Quick Sync Video (on a Haswell, via VA-API)".)

comment:4 by Sesse, 8 years ago

Why shouldn't it say so? It's the truth. :-) (I wasn't even aware libavcodec could use QSV via VA-API.)

comment:5 by jkqxz, 8 years ago

Heh, ok. (It confused me, at least.)

I think you need to cut the file down somehow to something more sensible. Does the cut file still get the same results in ffmpeg, even if it isn't playable in VLC? Also, you can use the VAAPI decode in ffmpeg itself (with -hwaccel vaapi) - does that give the expected output as with VLC?

comment:6 by Sesse, 8 years ago

I made another cut without the audio; it seems to fare better.

http://storage.sesse.net/through-the-cracks.mp4

This yields corrupted display:

ffmpeg -i through-the-cracks-cut.mp4 -vframes 50 out-%03d.png

This yields correct display:

ffmpeg -hwaccel vaapi -i through-the-cracks-cut.mp4 -vframes 50 out-%03d.png
Last edited 8 years ago by Sesse

comment:7 by Sesse, 8 years ago

For reference; the correct decode is

http://home.samfundet.no/~sesse/out-012.png

and the broken one is

http://home.samfundet.no/~sesse/out-012-broken.png

comment:8 by jkqxz, 8 years ago

How about

ffmpeg -ss 13:50 -i input.mp4 -an -c:v copy -bsf:v h264_mp4toannexb -frames 1000 output.h264

to make a smaller file?

comment:9 by Sesse, 8 years ago

Sorry, I gave you the wrong URL for the cut file; it should be http://storage.sesse.net/through-the-cracks-cut.mp4 . It is 156 MB (your command line produces a 456 MB file).

comment:10 by jkqxz, 8 years ago

Ok, with that stream I have exactly the same behaviour.

The reference decoder says:

----------------------------- JM 19.0 (FRExt) -----------------------------
 Input reference file                   : test_rec.yuv does not exist 
                                          SNR values are not available
Warning: zero_byte shall exist
POC must = frame# or field# for SNRs to be correct
--------------------------------------------------------------------------
  Frame          POC  Pic#   QP    SnrY     SnrU     SnrV   Y:U:V Time(ms)
--------------------------------------------------------------------------
00000(IDR)        0     0    15                             4:2:0    1232
00001( P )        3     1    15                             4:2:0    1232
00000( b )        1     2    15                             4:2:0    1204
00001( b )        2     2    15                             4:2:0    1209
00003( P )        6     2    15                             4:2:0    1228
00002( b )        4     3    15                             4:2:0    1215
00002( b )        5     3    15                             4:2:0    1256
WARNING! Vertical motion vector 2076 is out of allowed range {-2048, 2047} in picture 0, macroblock 53
WARNING! Vertical motion vector 2111 is out of allowed range {-2048, 2047} in picture 0, macroblock 53
WARNING! Vertical motion vector 2060 is out of allowed range {-2048, 2047} in picture 0, macroblock 53
WARNING! Vertical motion vector 2130 is out of allowed range {-2048, 2047} in picture 0, macroblock 53
WARNING! Vertical motion vector 6167 is out of allowed range {-2048, 2047} in picture 0, macroblock 53
WARNING! Vertical motion vector 2049 is out of allowed range {-2048, 2047} in picture 0, macroblock 54
WARNING! Vertical motion vector 2053 is out of allowed range {-2048, 2047} in picture 0, macroblock 54
WARNING! Vertical motion vector 6203 is out of allowed range {-2048, 2047} in picture 0, macroblock 54
WARNING! Vertical motion vector 6184 is out of allowed range {-2048, 2047} in picture 0, macroblock 54
WARNING! Vertical motion vector 6176 is out of allowed range {-2048, 2047} in picture 0, macroblock 54
WARNING! Vertical motion vector 10235 is out of allowed range {-2048, 2047} in picture 0, macroblock 54
WARNING! Vertical motion vector 10316 is out of allowed range {-2048, 2047} in picture 0, macroblock 54
WARNING! Vertical motion vector 10300 is out of allowed range {-2048, 2047} in picture 0, macroblock 54
WARNING! Vertical motion vector 10379 is out of allowed range {-2048, 2047} in picture 0, macroblock 55
WARNING! Vertical motion vector 10291 is out of allowed range {-2048, 2047} in picture 0, macroblock 55
WARNING! Vertical motion vector 14398 is out of allowed range {-2048, 2047} in picture 0, macroblock 55
WARNING! Vertical motion vector 14469 is out of allowed range {-2048, 2047} in picture 0, macroblock 55

(... and more)

Looking at the trace output for that macroblock and the following one, we see that it really is getting huge vectors there:

*********** POC: 9 (I/P) MB: 53 Slice: 0 Type 0 **********
@6330520 mb_skip_flag                                                    (  1)
@6330521 mb_type                                                         (  4)
@6330522 sub_mb_type                                                     (  1)
@6330523 sub_mb_type                                                     (  2)
@6330524 sub_mb_type                                                     (  1)
@6330525 sub_mb_type                                                     (  1)
@6330526 mvd_l0                                                          (-52)
@6330527 mvd_l0                                                          ( 22)
@6330528 mvd_l0                                                          (-14)
@6330529 mvd_l0                                                          ( 74)
@6330530 mvd_l0                                                          (  4)
@6330531 mvd_l0                                                          (-10)
@6330532 mvd_l0                                                          (-12)
@6330533 mvd_l0                                                          ( 29)
@6330534 mvd_l0                                                          ( -6)
@6330535 mvd_l0                                                          ( 97)
@6330536 mvd_l0                                                          ( 81)
@6330537 mvd_l0                                                          ( 60)
@6330538 mvd_l0                                                          ( 69)
@6330539 mvd_l0                                                          ( 54)
@6330540 mvd_l0                                                          (-48)
@6330541 mvd_l0                                                          (4056)
@6330542 coded_block_pattern                                             ( 31)
@6330543 mb_qp_delta                                                     (  0)
*********** POC: 9 (I/P) MB: 54 Slice: 0 Type 0 **********
@6330801 mb_skip_flag                                                    (  1)
@6330802 mb_type                                                         (  4)
@6330803 sub_mb_type                                                     (  1)
@6330804 sub_mb_type                                                     (  1)
@6330805 sub_mb_type                                                     (  2)
@6330806 sub_mb_type                                                     (  1)
@6330807 mvd_l0                                                          (  0)
@6330808 mvd_l0                                                          (  6)
@6330809 mvd_l0                                                          (-44)
@6330810 mvd_l0                                                          ( 10)
@6330811 mvd_l0                                                          (  0)
@6330812 mvd_l0                                                          (4154)
@6330813 mvd_l0                                                          (-52)
@6330814 mvd_l0                                                          (4131)
@6330815 mvd_l0                                                          (-15)
@6330816 mvd_l0                                                          (4123)
@6330817 mvd_l0                                                          (111)
@6330818 mvd_l0                                                          (4059)
@6330819 mvd_l0                                                          (-17)
@6330820 mvd_l0                                                          (4132)
@6330821 mvd_l0                                                          (-47)
@6330822 mvd_l0                                                          ( 65)
@6330823 coded_block_pattern                                             ( 31)
@6330824 mb_qp_delta                                                     (  0)

The [-2048, 2047] range is in quarter-pels (qpels), so the values around 2000 are just about plausible as ~500 pixels in the 1280x720 stream (but still outside the level limits). However, the following vectors over 6000 qpels (more than 1500 pixels) are just wrong (larger than the frame), suggesting an encoder bug. For the VAAPI decode to work, presumably the Intel hardware encoder and decoder use the same logic to interpret them and therefore get the same answer.
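
The arithmetic in the paragraph above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the values are the vertical vectors copied from the JM warnings, they are taken to be quarter-pel units, the frame height is 720 luma lines, and the helper names (`qpel_to_pixels`, `classify`) are made up for this sketch.

```python
# Vertical motion vectors (quarter-pel units) copied from the JM warnings above.
WARNED_MVS = [2076, 2111, 2060, 2130, 6167,                       # macroblock 53
              2049, 2053, 6203, 6184, 6176, 10235, 10316, 10300,  # macroblock 54
              10379, 10291, 14398, 14469]                         # macroblock 55

LEVEL_RANGE_QPEL = (-2048, 2047)  # allowed range quoted by JM
FRAME_HEIGHT = 720                # luma lines in the 1280x720 stream (assumption)


def qpel_to_pixels(mv_qpel):
    """Convert a quarter-pel vector component to luma pixels."""
    return mv_qpel / 4


def classify(mv_qpel):
    """Label a vertical vector following the reasoning in the comment above."""
    low, high = LEVEL_RANGE_QPEL
    if low <= mv_qpel <= high:
        return "within level limit"
    if abs(qpel_to_pixels(mv_qpel)) <= FRAME_HEIGHT:
        return "outside level limit, but shorter than the frame"
    return "larger than the frame"


for mv in WARNED_MVS:
    print(f"{mv:6d} qpel = {qpel_to_pixels(mv):8.2f} px: {classify(mv)}")
```

One further observation from the same data, offered without claiming that this is how the decoder derived the predictor: 2111 + 4056 = 6167, so the first oversized vector in macroblock 53 equals an earlier warned vector plus the large mvd_l0 value shown in the trace.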

Not sure what to do with that result. Can you try a more recent version of the i965 VA driver on the encode in case this is a bug there that has been fixed?

comment:11 by Carl Eugen Hoyos, 8 years ago

Keywords: h264 added
Priority: normal → minor
Reproduced by developer: set
Status: new → open

I have uploaded a smaller sample to http://samples.ffmpeg.org/ffmpeg-bugs/trac/ticket5732/
Please decode with the Intel decoder and provide the output, for example with:

$ ffmpeg -hwaccel vaapi -vsync 0 -i through-the-cracks-cut.h264 -vcodec ffv1 out.nut

in reply to:  description comment:12 by Carl Eugen Hoyos, 8 years ago

Replying to Sesse:

this content is super-hard to encode!

Not related to this ticket:
Did you try to encode with x264? I am very curious...

comment:13 by Sesse, 8 years ago

Let's see. I can't do any more tests easily, unfortunately; this was part of a rig for an event, and I don't have access to comparable hardware to produce the input signal anymore. It might very well be that the QSV-generated stream is somehow out-of-spec here; as a layperson, it's impossible for me to say. As for i965-va-driver versions, we stopped upgrading stuff a few months before the event; just locked the versions of everything and tested it intensely to avoid nasty surprises from new versions. :-) I see from a backup that we used version 1.7.0; looks like 1.7.1 is the latest, but at least it's not stone age.

I ran your decode command-line. I couldn't really play out.nut in anything, it looks more like a still image to me. But I've uploaded it to http://storage.sesse.net/out.nut in case you can get something usable out of it.

As for x264's performance on this clip, Nageru makes one in parallel to the QSV stream. (The QSV stream is meant for archival, the x264 stream is meant for direct consumption by clients.) At 5 Mbit/sec on a 4x3.6 GHz Haswell (we run with speedcontrol, which on this machine largely means hovering around somewhere between the equivalent of “slow” and “slower” presets), it was much better than the YouTube version (https://www.youtube.com/watch?v=-N5CLcSkkWs), but far from perfect. I would say x264 did a respectable job, but not a fantastic one.

in reply to:  13 comment:14 by Carl Eugen Hoyos, 8 years ago

Replying to Sesse:

I ran your decode command-line. I couldn't really play out.nut in anything, it looks more like a still image to me. But I've uploaded it to http://storage.sesse.net/out.nut in case you can get something usable out of it.

Works fine here with MPlayer and FFmpeg, thank you!

As for x264's performance on this clip, Nageru makes one in parallel to the QSV stream. (The QSV stream is meant for archival, the x264 stream is meant for direct consumption by clients.) At 5 Mbit/sec on a 4x3.6 GHz Haswell (we run with speedcontrol, which on this machine largely means hovering around somewhere between the equivalent of “slow” and “slower” presets), it was much better than the YouTube version (https://www.youtube.com/watch?v=-N5CLcSkkWs), but far from perfect. I would say x264 did a respectable job, but not a fantastic one.

Am I correct that the h264 stream you uploaded is 25Mbit/sec?

Just for reference: the input stream is decoded correctly by current FFmpeg, as confirmed by the reference decoder; a specific workaround for this encoder's bug would make sense.

comment:15 by Sesse, 8 years ago

Well, yes and no. The stream is encoded at constant quantizer 15, which for the content I'm interested in is as good as visually lossless. Normally, for me, this does indeed yield streams around 25 Mbit/sec, but this specific demo is an outlier (I like to call these “streamkillers”; I guess you can understand why), and I've seen it spike to 170 Mbit/sec or so in parts.

I understand that the error is probably on the encoding side, which is a bit of a bummer. I wonder if I can somehow reproduce it without recreating the lossless input (perhaps just take the input and reencode at similar quantizer?). Of course, this sounds more like a hardware bug than anything else, then, and those are by nature hard to fix :-) It would be interesting to see if e.g. Skylake has the same issue, though; AIUI, they keep improving the QSV encoder.

comment:16 by Sesse, 8 years ago

Yes indeed, this reproduces it on my laptop (also Haswell):

ffmpeg -hwaccel vaapi -ss 12:00 -vaapi_device :0 -i through-the-cracks.mp4 -vf 'format=nv12,hwupload' -c:v h264_vaapi test.mp4

It creates a stream (default quantizer 20, I believe) that works perfectly if I view it in VLC with vaapi on, but if I set LIBVA_DRIVER_NAME=none (effectively sabotaging vaapi, as it looks for a “none” driver), it gets completely broken.

This is with i965-va-driver 1.7.1.

in reply to:  15 comment:17 by Carl Eugen Hoyos, 8 years ago

Replying to Sesse:

Well, yes and no. The stream is encoded at constant quantizer 15, which for the content I'm interested in is as good as visually lossless. Normally, for me, this does indeed yield streams around 25 Mbit/sec, but this specific demo is an outlier (I like to call these “streamkillers”; I guess you can understand why), and I've seen it spike to 170 Mbit/sec or so in parts.

The - imo - interesting question is now: What does x264 produce with a constant quantiser of 15 (or lower)?

I cannot really comment on your other questions: On this bug tracker (and the mailing lists), users were often warned (by different people) that hardware encoders (once they were supported) would disappoint everybody in every regard. I never had a strong opinion on this, as I don't encode much.
But preferring a very new encoder "for archiving" over one that was most likely tested more than anything similar surprised me...

comment:18 by Sesse, 8 years ago

It's all about where you want to allocate your CPU power. On one of the machines I run this on, the mixing process takes up literally all my CPU, so there's simply none left for x264 (maybe 10% of one 800 MHz core); it has to go to Quick Sync or else I would simply get no stream. :-) In the case of the 4x3.6 GHz Haswell, any CPU I don't use will go into making the 5 Mbit stream better; I could of course try to spend a core or so on running with low CRF on ultrafast, but that means one core less for the “real” stream that users will see.

Of course, x264 is much better quality-wise (bit-for-bit) than any hardware encoder I've ever seen. I'm not sure if I understand your question, though; do you ask if it produces a smaller bitstream at e.g. medium preset (I would be surprised if it didn't), or just if it produces a valid one at all?

in reply to:  18 comment:19 by Carl Eugen Hoyos, 8 years ago

Replying to Sesse:

Of course, x264 is much better quality-wise (bit-for-bit) than any hardware encoder I've ever seen. I'm not sure if I understand your question, though; do you ask if it produces a smaller bitstream at e.g. medium preset (I would be surprised if it didn't), or just if it produces a valid one at all?

I would be surprised if it doesn't produce a valid output. I wonder if the output file isn't (an order of magnitude?) smaller for any preset and the same output quality.

comment:20 by Sesse, 8 years ago

I compiled x264 from git as of today, and compiled ffmpeg against it. I then ran

ffmpeg -hwaccel vaapi -ss 12:00 -vaapi_device :0 -i through-the-cracks.mp4 -c:v libx264 -qp 20 -preset medium test2.mp4

which hopefully should be comparable in quality.

The file produced was actually _larger_ than the QSV one (1892 MB for QSV, 2105 MB for x264). And x264 is of course much, much slower; on my dual-core Haswell i7 laptop, eating all the CPU, overall speed was 0.201x realtime (and there are some easy sections in the beginning). For comparison, QSV on the same machine was 2.16x realtime.

in reply to:  20 comment:21 by Carl Eugen Hoyos, 8 years ago

Replying to Sesse:

which hopefully should be comparable in quality.

I don't think this is generally true but it doesn't really matter: I was just curious if you tried other things.

comment:22 by Sesse, 8 years ago

I've filed https://bugs.freedesktop.org/show_bug.cgi?id=97070 on the encoder side. I guess a workaround on the decoder might still be interesting.

comment:23 by Balling, 23 months ago

You can play ffv1 sample out.nut using -an -noframedrop in ffplay. Sigh.

The vertical motion vector component range for level 4.1 does not exceed [-512, +511.75] in units of luma frame samples, which is equal to [-2048, +2047] in quarter-sample units. So that bitstream is effing invalid. In fact, the only other option is [-8192, +8191.75], which is [-32768, +32767] in quarter-sample units. And the stream does contain values as high as 30796... So apparently the correct level is 6.0.
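
The unit conversion in this comment can be checked with a small Python sketch. The two ranges are taken as stated above rather than re-derived from the specification, and the names `RANGES_PIXELS`, `to_qpel` and `fits` are invented for illustration.

```python
# Vertical motion vector ranges in luma frame samples, as quoted in the comment above.
RANGES_PIXELS = {
    "smaller range (quoted for level 4.1)": (-512.0, 511.75),
    "larger range": (-8192.0, 8191.75),
}

LARGEST_SEEN = 30796  # largest vertical vector mentioned above, in quarter samples


def to_qpel(pixels):
    """Convert luma samples to quarter-sample units."""
    return int(pixels * 4)


def fits(mv_qpel, range_pixels):
    """Return True if a quarter-sample vector lies inside the given pixel range."""
    low, high = (to_qpel(p) for p in range_pixels)
    return low <= mv_qpel <= high


for name, rng in RANGES_PIXELS.items():
    bounds = [to_qpel(p) for p in rng]
    print(f"{name}: {bounds} quarter samples, fits {LARGEST_SEEN}: {fits(LARGEST_SEEN, rng)}")
```

Under these assumptions, 30796 quarter samples (about 7699 luma samples) falls outside the smaller range and inside the larger one.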

Last edited 10 months ago by Balling