Opened 2 years ago
Last modified 12 months ago
#10668 open defect
cuvid regression creates jerky output
| Reported by: | Jason Dove | Owned by: | |
|---|---|---|---|
| Priority: | important | Component: | avcodec |
| Version: | git-master | Keywords: | cuvid |
| Cc: | | Blocked By: | |
| Blocking: | | Reproduced by developer: | no |
| Analyzed by developer: | no | | |
Description
Summary of the bug:
Using the h264_cuvid decoder with certain content will cause the output to be jerky.
How to reproduce:
% ffmpeg -c:v h264_cuvid -i input.mkv -c:v libx264 -y output.mkv
ffmpeg version N-112777-g08e97dae20 Copyright (c) 2000-2023 the FFmpeg developers
built with gcc 13.2.1 (GCC) 20230801
configuration: --prefix=/usr --extra-cflags=-I/opt/cuda/include --extra-ldflags=-L/opt/cuda/lib64 --enable-lto --disable-rpath --enable-gpl --enable-version3 --enable-nonfree --enable-shared --disable-static --disable-stripping --disable-htmlpages --enable-gray --enable-alsa --enable-avisynth --enable-bzlib --enable-chromaprint --enable-frei0r --enable-gcrypt --enable-gmp --enable-gnutls --enable-iconv --enable-ladspa --enable-lcms2 --enable-libaom --enable-libaribb24 --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcelt --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libdavs2 --enable-libdc1394 --enable-libfdk-aac --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libharfbuzz --enable-libglslang --enable-libgme --enable-libgsm --enable-libiec61883 --enable-libilbc --enable-libjack --enable-libjxl --enable-libklvanc --enable-libkvazaar --enable-liblensfun --enable-libmodplug --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --disable-libopencv --enable-libopenh264 --enable-libopenjpeg --enable-libopenmpt --enable-libopenvino --enable-libopus --enable-libplacebo --enable-libpulse --enable-librabbitmq --enable-librav1e --enable-librist --enable-librsvg --enable-librubberband --enable-librtmp --enable-libshine --enable-libsmbclient --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libsvthevc --enable-libsvtvp9 --disable-libtensorflow --enable-libtesseract --enable-libtheora --disable-libtls --enable-libtwolame --enable-libuavs3d --enable-libv4l2 --enable-libvidstab --enable-libvmaf --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs --enable-libxavs2 --enable-libxcb --enable-libxcb-shm --enable-libxcb-xfixes --enable-libxcb-shape --enable-libxvid --enable-libxml2 --enable-libzimg --enable-libzmq --enable-libzvbi 
--enable-lv2 --enable-lzma --enable-decklink --disable-mbedtls --enable-libmysofa --enable-openal --enable-opencl --enable-opengl --disable-openssl --disable-pocketsphinx --enable-sndio --enable-sdl2 --enable-vapoursynth --enable-vulkan --enable-xlib --enable-zlib --enable-amf --enable-cuda-nvcc --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-libdrm --enable-libvpl --enable-libnpp --enable-nvdec --enable-nvenc --enable-omx --enable-rkmpp --enable-v4l2-m2m --enable-vaapi --enable-vdpau
libavutil 58. 32.100 / 58. 32.100
libavcodec 60. 33.100 / 60. 33.100
libavformat 60. 17.100 / 60. 17.100
libavdevice 60. 4.100 / 60. 4.100
libavfilter 9. 13.100 / 9. 13.100
libswscale 7. 6.100 / 7. 6.100
libswresample 4. 13.100 / 4. 13.100
libpostproc 57. 4.100 / 57. 4.100
Input #0, matroska,webm, from 'input.mkv':
Metadata:
ENCODER : Lavf60.17.100
Duration: 00:00:20.25, start: 0.000000, bitrate: 5560 kb/s
Stream #0:0: Video: h264 (Main), yuv420p(tv, bt709, progressive), 1918x814 [SAR 1:1 DAR 959:407], 23.98 fps, 23.98 tbr, 1k tbn (default)
Metadata:
DURATION : 00:00:20.250000000
Stream #0:1: Audio: aac (LC), 48000 Hz, stereo, fltp (default)
Metadata:
DURATION : 00:00:20.031000000
Stream mapping:
Stream #0:0 -> #0:0 (h264 (h264_cuvid) -> h264 (libx264))
Stream #0:1 -> #0:1 (aac (native) -> vorbis (libvorbis))
Press [q] to stop, [?] for help
[libx264 @ 0x55640793d640] using SAR=1/1
[libx264 @ 0x55640793d640] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
[libx264 @ 0x55640793d640] profile High, level 4.0, 4:2:0, 8-bit
[libx264 @ 0x55640793d640] 264 - core 164 r3108 31e19f9 - H.264/MPEG-4 AVC codec - Copyleft 2003-2023 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=18 lookahead_threads=3 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=23 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, matroska, to 'output.mkv':
Metadata:
encoder : Lavf60.17.100
Stream #0:0: Video: h264 (H264 / 0x34363248), nv12(tv, bt709, progressive), 1918x814 [SAR 1:1 DAR 959:407], q=2-31, 23.98 fps, 1k tbn (default)
Metadata:
DURATION : 00:00:20.250000000
encoder : Lavc60.33.100 libx264
Side data:
cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A
Stream #0:1: Audio: vorbis (oV[0][0] / 0x566F), 48000 Hz, stereo, fltp (default)
Metadata:
DURATION : 00:00:20.031000000
encoder : Lavc60.33.100 libvorbis
[out#0/matroska @ 0x556407921100] video:5317kB audio:191kB subtitle:0kB other streams:0kB global headers:4kB muxing overhead: 0.264545%
frame= 482 fps= 75 q=-1.0 Lsize= 5523kB time=00:00:20.02 bitrate=2259.5kbits/s speed=3.11x
[libx264 @ 0x55640793d640] frame I:15 Avg QP:18.37 size: 29181
[libx264 @ 0x55640793d640] frame P:307 Avg QP:20.27 size: 14622
[libx264 @ 0x55640793d640] frame B:160 Avg QP:17.28 size: 3236
[libx264 @ 0x55640793d640] consecutive B-frames: 53.5% 5.4% 3.7% 37.3%
[libx264 @ 0x55640793d640] mb I I16..4: 43.4% 53.1% 3.5%
[libx264 @ 0x55640793d640] mb P I16..4: 10.8% 20.1% 0.2% P16..4: 39.2% 3.2% 3.1% 0.0% 0.0% skip:23.3%
[libx264 @ 0x55640793d640] mb B I16..4: 1.8% 2.0% 0.0% B16..8: 13.9% 1.0% 0.1% direct: 3.2% skip:78.0% L0:58.0% L1:41.1% BI: 0.9%
[libx264 @ 0x55640793d640] 8x8 transform intra:62.3% inter:93.8%
[libx264 @ 0x55640793d640] coded y,uvDC,uvAC intra: 27.7% 53.5% 3.0% inter: 10.0% 27.8% 0.0%
[libx264 @ 0x55640793d640] i16 v,h,dc,p: 29% 33% 17% 21%
[libx264 @ 0x55640793d640] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 22% 19% 46% 2% 2% 2% 3% 2% 2%
[libx264 @ 0x55640793d640] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 26% 27% 21% 3% 6% 5% 7% 3% 2%
[libx264 @ 0x55640793d640] i8c dc,h,v,p: 52% 25% 21% 2%
[libx264 @ 0x55640793d640] Weighted P-Frames: Y:9.1% UV:4.9%
[libx264 @ 0x55640793d640] ref P L0: 41.1% 4.1% 26.2% 26.1% 2.4%
[libx264 @ 0x55640793d640] ref B L0: 55.2% 26.6% 18.2%
[libx264 @ 0x55640793d640] ref B L1: 61.4% 38.6%
[libx264 @ 0x55640793d640] kb/s:2139.92
ffmpeg -c:v h264_cuvid -i input.mkv -c:v libx264 -y output.mkv 59.59s user 0.61s system 888% cpu 6.777 total
Attachments (1)
Change History (17)
by , 2 years ago
| Attachment: | ffmpeg-report.zip added |
|---|
comment:2 by , 2 years ago
Testing some builds from https://github.com/BtbN/FFmpeg-Builds/releases to try to find when the regression was introduced
2023-05-31 N-110946-g859c34706d behaves correctly (output is smooth)
2023-06-30 N-111313-ge4d4d616ba does not behave correctly (output is jerky)
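To get a feel for how wide that window is, here is a minimal C sketch that parses the two version strings above and estimates the bisect effort. It assumes the strings follow the N-<count>-g<hash> layout, where <count> is the number of commits since the N tag; the struct and function names are made up for illustration and are not part of FFmpeg.

```c
#include <stdio.h>

/* Parsed form of a version string such as "N-110946-g859c34706d".
 * Assumption: the layout is N-<commit count>-g<abbreviated hash>. */
struct sketch_version {
    int  count;     /* commits since the "N" tag */
    char hash[41];  /* abbreviated commit hash, NUL-terminated */
};

/* Returns 0 on success, -1 if the string does not match the assumed layout. */
static int sketch_parse_version(const char *s, struct sketch_version *out)
{
    if (sscanf(s, "N-%d-g%40[0-9a-f]", &out->count, out->hash) != 2)
        return -1;
    return 0;
}

/* Rough upper bound on bisect steps: halve the candidate range until
 * a single commit is left. */
static int sketch_bisect_steps(int commits)
{
    int steps = 0;
    while (commits > 1) {
        commits = (commits + 1) / 2;
        steps++;
    }
    return steps;
}
```

With the two builds above, the counts differ by 111313 - 110946 = 367 commits, so a bisect between them should need on the order of nine build-and-test rounds.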
comment:4 by , 2 years ago
| Keywords: | decoder nvidia removed |
|---|---|
| Status: | new → open |
Yep, very simple to reproduce with ffplay.exe -vcodec h264_cuvid C:\Users\ZAQU\Downloads\cuvid-decoder-regression-sample.mkv
The frames are reordered incorrectly. This is indeed a regression.
comment:5 by , 2 years ago
I did a git bisect and the first bad commit is 402d98c9d467dff6931d906ebb732b9a00334e0b.
I also confirmed that master with libavcodec/cuviddec.c at dc7bd7c5a5ad5ea800dfb63cc5dd15670d065527 works properly, so I at least have a workaround for now.
comment:6 by , 2 years ago
It is funny. That commit was derived from a fix for another bug, #8948, but it did not fix that bug, not to mention that cuvid is not affected by it. So no wonder it broke other stuff.
comment:8 by , 2 years ago
Hello,
402d98c9d467dff6931d906ebb732b9a00334e0b merely changes the default value of the nb_surfaces variable and allows the user to adjust it via extra_hw_frames (to avoid using the deprecated option and to unify cuvid's behaviour with nvdec in this respect):
    fifo_size_inc = ctx->nb_surfaces;
    ctx->nb_surfaces = FFMAX(ctx->nb_surfaces, format->min_num_decode_surfaces + 3);
    if (avctx->extra_hw_frames > 0)
        ctx->nb_surfaces += avctx->extra_hw_frames;
    fifo_size_inc = ctx->nb_surfaces - fifo_size_inc;
    if (fifo_size_inc > 0 && av_fifo_grow2(ctx->frame_queue, fifo_size_inc) < 0) {
        av_log(avctx, AV_LOG_ERROR, "Failed to grow frame queue on video sequence callback\n");
        ctx->internal_error = AVERROR(ENOMEM);
        return 0;
    }
So it can be easily fixed by reverting the default nb_surfaces value:
{ "surfaces", "Maximum surfaces to be used for decoding", OFFSET(nb_surfaces), AV_OPT_TYPE_INT, { .i64 = -1 }, 25, INT_MAX, VD | AV_OPT_FLAG_DEPRECATED }
But there are 2 caveats:
1) It looks like a bug in Video Codec SDK which returns insufficient min_num_decode_surfaces value.
2) Huge vRAM consumption increase: many video sequences require just 6-7 surfaces in the nvdec pool instead of 25.
Unfortunately, given point 1, there appears to be no reliable way so far to determine the actual minimum number of surfaces required for decoding.
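To make the arithmetic concrete, here is a small standalone C model of the surface-count calculation from the excerpt above. It is only a sketch: the function and macro names are invented for illustration, -1 is assumed to mean "option not set", and the value 5 used for min_num_decode_surfaces in the note below is hypothetical, not something measured on the sample file.

```c
#include <stdio.h>

/* Stand-in for FFmpeg's FFMAX macro so the sketch compiles on its own. */
#define SKETCH_MAX(a, b) ((a) > (b) ? (a) : (b))

/*
 * Models only the pool-size arithmetic quoted in the comment above.
 *   requested       : value of the "surfaces" option (assumed -1 when unset)
 *   min_reported    : format->min_num_decode_surfaces as reported by the SDK
 *   extra_hw_frames : avctx->extra_hw_frames; values <= 0 are ignored
 */
static int sketch_nb_surfaces(int requested, int min_reported, int extra_hw_frames)
{
    int nb = SKETCH_MAX(requested, min_reported + 3);
    if (extra_hw_frames > 0)
        nb += extra_hw_frames;
    return nb;
}

/* Prints the resulting pool size for one hypothetical configuration. */
static void sketch_print(const char *label, int requested, int min_reported, int extra)
{
    printf("%-24s -> %d surfaces\n", label,
           sketch_nb_surfaces(requested, min_reported, extra));
}
```

If the SDK hypothetically reported 5, sketch_nb_surfaces(-1, 5, 0) gives 8, while sketch_nb_surfaces(25, 5, 0) gives 25. A pool of 25 hides an under-reported minimum at the cost of vRAM, whereas sketch_nb_surfaces(-1, 5, 2) gives 10, a pool only slightly above the reported minimum.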
comment:9 by , 12 months ago
I love how it prints [h264_cuvid @ 000002a7104f6200] The "surfaces" option is deprecated: Maximum surfaces to be used for decoding
but it is a lie: the picture is only fixed with
ffplay.exe -surfaces 10 -vcodec h264_cuvid cuvid-decoder-regression-sample.mkv
Anyway, yes, extra_hw_frames 2 should be used now.
comment:10 by , 12 months ago
I still see nothing to be fixed.
What issue is there?
I found an unrelated issue about dropping frames on EOF, but I see nothing going wrong otherwise.
No buffer overruns or anything.
comment:11 by , 12 months ago
I still see nothing to be fixed.
extra_hw_frames 2 fixes it. Play the video with ffplay: the frames are reordered incorrectly.
comment:12 by , 12 months ago
Fixes _what_? I see zero issues in the deinterlaced output.
I have now added extra logging to catch cases where it would overrun its own buffer, which were silently dropped before.
But after the EOF fix, they never get triggered.
comment:13 by , 12 months ago
What happens when you run ffplay.exe -vcodec h264_cuvid cuvid-decoder-regression-sample.mkv?
In my case the video plays completely broken. Unless your patches just now fixed that?
comment:14 by , 12 months ago
No, the patches just fixed an odd quirk about frames getting lost during EOF handling.
With that sample it's also broken, but ffmpeg indicates to me that it's not even interlaced?
Using one of my actually interlaced files, I get good results.
It's odd, though, that adding more surfaces somehow fixes this. As far as cuvid is concerned, one surface contains both outputs when deinterlacing, so there shouldn't be any need to increase the number of surfaces for deinterlacing.
And indeed, that file already decodes incorrectly without any deinterlacing involved, so its issue is unrelated to deinterlacing.
The only logical explanation I have for that file's issue is that Nvidia's format parser somehow misparses it, so format->min_num_decode_surfaces is too small.
In which case it'd be an issue for Nvidia to fix.
comment:15 by , 12 months ago
It seems I somehow got this mixed up with #10409, which is about deinterlacing.
Though I still think the above idea is correct. FFmpeg even allocates 3 extra surfaces for extra performance compared to what cuvid calls for as a "minimum to successfully decode the content".
So the issue here must lie in Nvidia's h264 parser, which returns too low a number there.
comment:16 by , 12 months ago
So the issue here must lie in Nvidia's h264 parser, which returns too low a number there.
So basically this:
1) It looks like a bug in Video Codec SDK which returns insufficient min_num_decode_surfaces value
Report to Nvidia, I imagine... ;)



report output