Opened 2 years ago

Last modified 12 months ago

#10668 open defect

cuvid regression creates jerky output

Reported by: Jason Dove
Owned by:
Priority: important
Component: avcodec
Version: git-master
Keywords: cuvid
Cc:
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

Summary of the bug:

Using the h264_cuvid decoder with certain content will cause the output to be jerky.

How to reproduce:

% ffmpeg -c:v h264_cuvid -i input.mkv -c:v libx264 -y output.mkv
ffmpeg version N-112777-g08e97dae20 Copyright (c) 2000-2023 the FFmpeg developers
  built with gcc 13.2.1 (GCC) 20230801
  configuration: --prefix=/usr --extra-cflags=-I/opt/cuda/include --extra-ldflags=-L/opt/cuda/lib64 --enable-lto --disable-rpath --enable-gpl --enable-version3 --enable-nonfree --enable-shared --disable-static --disable-stripping --disable-htmlpages --enable-gray --enable-alsa --enable-avisynth --enable-bzlib --enable-chromaprint --enable-frei0r --enable-gcrypt --enable-gmp --enable-gnutls --enable-iconv --enable-ladspa --enable-lcms2 --enable-libaom --enable-libaribb24 --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcelt --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libdavs2 --enable-libdc1394 --enable-libfdk-aac --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libharfbuzz --enable-libglslang --enable-libgme --enable-libgsm --enable-libiec61883 --enable-libilbc --enable-libjack --enable-libjxl --enable-libklvanc --enable-libkvazaar --enable-liblensfun --enable-libmodplug --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --disable-libopencv --enable-libopenh264 --enable-libopenjpeg --enable-libopenmpt --enable-libopenvino --enable-libopus --enable-libplacebo --enable-libpulse --enable-librabbitmq --enable-librav1e --enable-librist --enable-librsvg --enable-librubberband --enable-librtmp --enable-libshine --enable-libsmbclient --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libsvthevc --enable-libsvtvp9 --disable-libtensorflow --enable-libtesseract --enable-libtheora --disable-libtls --enable-libtwolame --enable-libuavs3d --enable-libv4l2 --enable-libvidstab --enable-libvmaf --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs --enable-libxavs2 --enable-libxcb --enable-libxcb-shm --enable-libxcb-xfixes --enable-libxcb-shape --enable-libxvid --enable-libxml2 --enable-libzimg --enable-libzmq --enable-libzvbi 
--enable-lv2 --enable-lzma --enable-decklink --disable-mbedtls --enable-libmysofa --enable-openal --enable-opencl --enable-opengl --disable-openssl --disable-pocketsphinx --enable-sndio --enable-sdl2 --enable-vapoursynth --enable-vulkan --enable-xlib --enable-zlib --enable-amf --enable-cuda-nvcc --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-libdrm --enable-libvpl --enable-libnpp --enable-nvdec --enable-nvenc --enable-omx --enable-rkmpp --enable-v4l2-m2m --enable-vaapi --enable-vdpau
  libavutil      58. 32.100 / 58. 32.100
  libavcodec     60. 33.100 / 60. 33.100
  libavformat    60. 17.100 / 60. 17.100
  libavdevice    60.  4.100 / 60.  4.100
  libavfilter     9. 13.100 /  9. 13.100
  libswscale      7.  6.100 /  7.  6.100
  libswresample   4. 13.100 /  4. 13.100
  libpostproc    57.  4.100 / 57.  4.100
Input #0, matroska,webm, from 'input.mkv':
  Metadata:
    ENCODER         : Lavf60.17.100
  Duration: 00:00:20.25, start: 0.000000, bitrate: 5560 kb/s
  Stream #0:0: Video: h264 (Main), yuv420p(tv, bt709, progressive), 1918x814 [SAR 1:1 DAR 959:407], 23.98 fps, 23.98 tbr, 1k tbn (default)
    Metadata:
      DURATION        : 00:00:20.250000000
  Stream #0:1: Audio: aac (LC), 48000 Hz, stereo, fltp (default)
    Metadata:
      DURATION        : 00:00:20.031000000
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (h264_cuvid) -> h264 (libx264))
  Stream #0:1 -> #0:1 (aac (native) -> vorbis (libvorbis))
Press [q] to stop, [?] for help
[libx264 @ 0x55640793d640] using SAR=1/1
[libx264 @ 0x55640793d640] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
[libx264 @ 0x55640793d640] profile High, level 4.0, 4:2:0, 8-bit
[libx264 @ 0x55640793d640] 264 - core 164 r3108 31e19f9 - H.264/MPEG-4 AVC codec - Copyleft 2003-2023 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=18 lookahead_threads=3 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=23 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, matroska, to 'output.mkv':
  Metadata:
    encoder         : Lavf60.17.100
  Stream #0:0: Video: h264 (H264 / 0x34363248), nv12(tv, bt709, progressive), 1918x814 [SAR 1:1 DAR 959:407], q=2-31, 23.98 fps, 1k tbn (default)
    Metadata:
      DURATION        : 00:00:20.250000000
      encoder         : Lavc60.33.100 libx264
    Side data:
      cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A
  Stream #0:1: Audio: vorbis (oV[0][0] / 0x566F), 48000 Hz, stereo, fltp (default)
    Metadata:
      DURATION        : 00:00:20.031000000
      encoder         : Lavc60.33.100 libvorbis
[out#0/matroska @ 0x556407921100] video:5317kB audio:191kB subtitle:0kB other streams:0kB global headers:4kB muxing overhead: 0.264545%
frame=  482 fps= 75 q=-1.0 Lsize=    5523kB time=00:00:20.02 bitrate=2259.5kbits/s speed=3.11x
[libx264 @ 0x55640793d640] frame I:15    Avg QP:18.37  size: 29181
[libx264 @ 0x55640793d640] frame P:307   Avg QP:20.27  size: 14622
[libx264 @ 0x55640793d640] frame B:160   Avg QP:17.28  size:  3236
[libx264 @ 0x55640793d640] consecutive B-frames: 53.5%  5.4%  3.7% 37.3%
[libx264 @ 0x55640793d640] mb I  I16..4: 43.4% 53.1%  3.5%
[libx264 @ 0x55640793d640] mb P  I16..4: 10.8% 20.1%  0.2%  P16..4: 39.2%  3.2%  3.1%  0.0%  0.0%    skip:23.3%
[libx264 @ 0x55640793d640] mb B  I16..4:  1.8%  2.0%  0.0%  B16..8: 13.9%  1.0%  0.1%  direct: 3.2%  skip:78.0%  L0:58.0% L1:41.1% BI: 0.9%
[libx264 @ 0x55640793d640] 8x8 transform intra:62.3% inter:93.8%
[libx264 @ 0x55640793d640] coded y,uvDC,uvAC intra: 27.7% 53.5% 3.0% inter: 10.0% 27.8% 0.0%
[libx264 @ 0x55640793d640] i16 v,h,dc,p: 29% 33% 17% 21%
[libx264 @ 0x55640793d640] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 22% 19% 46%  2%  2%  2%  3%  2%  2%
[libx264 @ 0x55640793d640] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 26% 27% 21%  3%  6%  5%  7%  3%  2%
[libx264 @ 0x55640793d640] i8c dc,h,v,p: 52% 25% 21%  2%
[libx264 @ 0x55640793d640] Weighted P-Frames: Y:9.1% UV:4.9%
[libx264 @ 0x55640793d640] ref P L0: 41.1%  4.1% 26.2% 26.1%  2.4%
[libx264 @ 0x55640793d640] ref B L0: 55.2% 26.6% 18.2%
[libx264 @ 0x55640793d640] ref B L1: 61.4% 38.6%
[libx264 @ 0x55640793d640] kb/s:2139.92
ffmpeg -c:v h264_cuvid -i input.mkv -c:v libx264 -y output.mkv  59.59s user 0.61s system 888% cpu 6.777 total

Attachments (1)

ffmpeg-report.zip (105.6 KB ) - added by Jason Dove 2 years ago.
report output


Change History (17)

by Jason Dove, 2 years ago

Attachment: ffmpeg-report.zip added

report output

comment:1 by Jason Dove, 2 years ago

Sample was uploaded with name cuvid-decoder-regression-sample.mkv

comment:2 by Jason Dove, 2 years ago

Testing some builds from https://github.com/BtbN/FFmpeg-Builds/releases to try to find when the regression was introduced:

2023-05-31 N-110946-g859c34706d behaves correctly (output is smooth)
2023-06-30 N-111313-ge4d4d616ba does not behave correctly (output is jerky)

comment:4 by Balling, 2 years ago

Keywords: decoder nvidia removed
Status: new → open

Yep, very simple to reproduce with ffplay.exe -vcodec h264_cuvid C:\Users\ZAQU\Downloads\cuvid-decoder-regression-sample.mkv

Wrong reordering. Indeed, a regression

comment:5 by Jason Dove, 2 years ago

I did a git bisect and the first bad commit is 402d98c9d467dff6931d906ebb732b9a00334e0b.

I also confirmed that master with libavcodec/cuviddec.c at dc7bd7c5a5ad5ea800dfb63cc5dd15670d065527 works properly, so I at least have a workaround for now.

comment:6 by Balling, 2 years ago

It is funny. That commit was derived from a fix for another bug, #8948, but it did not fix that bug, not to mention that cuvid is not affected by it. So no wonder it broke other stuff.

comment:7 by Balling, 2 years ago

Duplicate of #10409; the workaround is -surfaces 10

comment:8 by Roman Arzumanyan, 2 years ago

Hello,

402d98c9d467dff6931d906ebb732b9a00334e0b merely changes the default value of the nb_surfaces variable and allows the user to set it via extra_hw_frames (to avoid using the deprecated option and to unify cuvid behaviour with nvdec in this respect):

fifo_size_inc = ctx->nb_surfaces;
ctx->nb_surfaces = FFMAX(ctx->nb_surfaces, format->min_num_decode_surfaces + 3);

if (avctx->extra_hw_frames > 0)
    ctx->nb_surfaces += avctx->extra_hw_frames;

fifo_size_inc = ctx->nb_surfaces - fifo_size_inc;
if (fifo_size_inc > 0 && av_fifo_grow2(ctx->frame_queue, fifo_size_inc) < 0) {
    av_log(avctx, AV_LOG_ERROR, "Failed to grow frame queue on video sequence callback\n");
    ctx->internal_error = AVERROR(ENOMEM);
    return 0;
}

So it can be easily fixed by reverting the default nb_surfaces value to 25:

{ "surfaces", "Maximum surfaces to be used for decoding", OFFSET(nb_surfaces), AV_OPT_TYPE_INT, { .i64 = 25 }, -1, INT_MAX, VD | AV_OPT_FLAG_DEPRECATED }

But there are 2 caveats:
1) It looks like a bug in the Video Codec SDK, which returns an insufficient min_num_decode_surfaces value.
2) A huge increase in vRAM consumption. Many video sequences require just 6-7 surfaces in the nvdec pool instead of 25.

Unfortunately, given pt. 1, it looks like there is so far no reliable way to determine the actual minimum number of surfaces required for decoding.


comment:9 by Balling, 12 months ago

I love how it prints [h264_cuvid @ 000002a7104f6200] The "surfaces" option is deprecated: Maximum surfaces to be used for decoding

but it is a lie and the picture is fixed only with

ffplay.exe -surfaces 10 -vcodec h264_cuvid cuvid-decoder-regression-sample.mkv

Anyway, yes, -extra_hw_frames 2 should be used now.


comment:10 by Timo R., 12 months ago

I still see nothing to be fixed.
What issue is there?

I found an unrelated issue about dropping frames on EOF, but I see nothing going wrong otherwise.
No buffer overruns or anything.

comment:11 by Balling, 12 months ago

> I still see nothing to be fixed.

-extra_hw_frames 2 fixes it. Play the video with ffplay: the frames are reordered incorrectly.

comment:12 by Timo R., 12 months ago

Fixes _what_? I see zero issues in the deinterlaced output.
I added extra logging now to catch cases where it'd overrun its own buffer, which got silently dropped before.
But after the EOF fix, they never get triggered.

comment:13 by Balling, 12 months ago

What happens with ffplay.exe -vcodec h264_cuvid cuvid-decoder-regression-sample.mkv?

In my case the video plays completely broken. Unless your patches just now fixed that?

comment:14 by Timo R., 12 months ago

No, the patches just fixed an odd quirk about frames getting lost during EOF handling.

With that sample it's also broken for me, but ffmpeg indicates that it's not even interlaced.
Using one of my actually interlaced files, I get good results.

It's odd, though, that adding more surfaces somehow fixes this. As far as cuvid is concerned, one surface contains both outputs when deinterlacing, so there shouldn't be any need to increase the number of surfaces when deinterlacing.

And indeed, that file already decodes broken without any deinterlacing involved, so its issue is unrelated to deinterlacing.

The only logical explanation I have for that file's issue is that Nvidia's format parser somehow misparses it, so format->min_num_decode_surfaces is too small.
In which case it'd be an issue for Nvidia to fix.

comment:15 by Timo R., 12 months ago

It seems I somehow got this mixed up with #10409, which is about deinterlacing.
Though I still think the above idea is correct. FFmpeg even allocates 3 extra surfaces for extra performance compared to what cuvid calls for as a "minimum to successfully decode the content".

So the issue here must lie in Nvidia's H.264 parser, which returns too low a number there.

comment:16 by Balling, 12 months ago

> So the issue here must lie in nvidias h264 parser that returns too low of a number there.

So this is basically caveat 1 from comment:8:

> 1) It looks like a bug in Video Codec SDK which returns insufficient min_num_decode_surfaces value

Report it to Nvidia, I imagine... ;)
