Opened 8 months ago

Last modified 9 days ago

#8225 new defect

NVIDIA Driver Bug Affecting Full HW Based Transcoding

Reported by: smallishzulu
Owned by:
Priority: normal
Component: undetermined
Version: unspecified
Keywords: nvenc, cuviddec
Cc:
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

The NVIDIA driver series 430.x and 435.x have an issue that affects fully hardware-based transcoding.
The problem surfaces when you hardware-decode an input stream, then scale and transcode it to multiple profiles.
(Other scenarios may be affected as well. This is the one I noticed.)

Example command line to test:

GPULOWER=0 && /opt/ffmpeg/bin/ffmpeg -hide_banner -ignore_unknown -loglevel verbose -async 1 -thread_queue_size 2048 -fflags +nobuffer+discardcorrupt -re -hwaccel_device $GPULOWER -hwaccel cuvid -c:v h264_cuvid -i 'udp://239.1.5.2:5000?fifo_size=3355440&buffer_size=3355440&overrun_nonfatal=1' -filter_complex '[0:p:16401:0]scale_npp=1920:1080[v0];[0:p:16401:0]scale_npp=1280:720[v1];[0:p:16401:0]scale_npp=960:540[v2];[0:p:16401:0]scale_npp=704:396[v3];[0:p:16401:0]scale_npp=480:270[v4];[0:p:16401:0]scale_npp=416:234[v5];[0:p:16401:0]scale_npp=416:234[v6]' -g 50 -r 25 -map [v0] -c:v:0 h264_nvenc -qmin 19 -preset hq -rc:v vbr_hq -profile:v main -level 4.1 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu $GPULOWER -b:v:0 6000k -maxrate:v:0 6000k -map [v1] -c:v:1 h264_nvenc -qmin 19 -preset hq -rc:v vbr_hq -profile:v main -level 4.1 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu $GPULOWER -b:v:1 3150k -maxrate:v:1 3150k -map [v2] -c:v:2 h264_nvenc -qmin 19 -preset hq -rc:v vbr_hq -profile:v main -level 4.1 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu $GPULOWER -b:v:2 2000k -maxrate:v:2 2000k -map [v3] -c:v:3 h264_nvenc -qmin 19 -preset hq -rc:v vbr_hq -profile:v main -level 4.1 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu $GPULOWER -b:v:3 1400k -maxrate:v:3 1400k -map [v4] -c:v:4 h264_nvenc -qmin 19 -preset hq -rc:v vbr_hq -profile:v main -level 4.1 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu $GPULOWER -b:v:4 800k -maxrate:v:4 800k -map [v5] -c:v:5 h264_nvenc -qmin 19 -preset hq -rc:v vbr_hq -profile:v main -level 4.1 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu $GPULOWER -b:v:5 400k -maxrate:v:5 400k -map [v6] -c:v:6 h264_nvenc -qmin 19 -preset hq -rc:v vbr_hq -profile:v main -level 4.1 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu $GPULOWER -b:v:6 200k -maxrate:v:6 200k -map 0:p:16401:1 -c:a:0 aac -b:a:0 96k -metadata:s:a:0 language=eng 
-max_muxing_queue_size 1000 -f matroska -y /dev/null
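
The filter graph in the test command repeats one scale_npp branch per output profile. As a minimal sketch (not part of the original report), a POSIX shell helper can generate that string from a list of resolutions. The function name build_scale_graph and the WxH argument format are assumptions made for illustration only.

```shell
# Hypothetical helper: build the -filter_complex string used in the test command.
# $1 is the input stream specifier; every following argument is one WxH profile.
# The body runs in a subshell so its variables do not leak into the caller.
build_scale_graph() (
    src="$1"; shift
    graph=""
    i=0
    for res in "$@"; do
        w="${res%x*}"   # part before the "x"
        h="${res#*x}"   # part after the "x"
        graph="${graph}[${src}]scale_npp=${w}:${h}[v${i}];"
        i=$((i + 1))
    done
    # Drop the trailing semicolon before printing
    printf '%s\n' "${graph%;}"
)

# Example: build_scale_graph '0:p:16401:0' 1920x1080 1280x720 960x540
```

The output can be passed to -filter_complex in place of the hand-written graph, which makes it easier to vary the number of scale/encode branches while narrowing down when the problem appears.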

Running a single instance of the above command works without a problem.
If you run the same command in a second shell, one of two things happens:

1) The encode in the first shell halts. If you wait long enough, circular buffer overrun messages appear for the decode.
OR
2) The second shell throws the error below (ffmpeg does not exit; if you wait long enough, circular buffer overrun messages appear for the decode):
[h264_nvenc @ 0x8e3d00] Failed locking bitstream buffer: invalid param (8)
Video encoding failed
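
Since ffmpeg does not exit in either case, the exit status cannot be used to detect the failure. A possible sketch, assuming each instance's stderr is redirected to its own log file, is to search the logs for the messages quoted in this report. The function name log_shows_failure and the log file names are hypothetical.

```shell
# Hypothetical helper: succeed (exit status 0) if the given ffmpeg log contains
# one of the failure messages quoted in this report.
log_shows_failure() {
    grep -E -q 'Failed locking bitstream buffer|Circular buffer overrun|Video encoding failed' "$1"
}

# Example, with the stderr of each instance captured separately:
#   ffmpeg ... 2> instance1.log &
#   ffmpeg ... 2> instance2.log &
#   log_shows_failure instance2.log && echo "instance 2 hit the problem"
```

This only matches the exact strings seen here; a stalled encode that has not yet logged an overrun would not be detected.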

I inspected the ffmpeg process with gdb:

Program received signal SIGINT, Interrupt.
sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
85 ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S: No such file or directory.
(gdb) bt
#0 sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
#1 0x00007fffc1904257 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007fffc17ff2a3 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007fffc19c1f10 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007fffc18a821c in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#5 0x00007fffc17df22c in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#6 0x00007fffc19485f0 in cuMemFree_v2 () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#7 0x00007ffff4fc8113 in ?? () from /opt/ffmpeg/lib/libavutil.so.56
#8 0x00007ffff4fb7bd7 in av_buffer_pool_uninit () from /opt/ffmpeg/lib/libavutil.so.56
#9 0x00007ffff4fc5bb0 in ?? () from /opt/ffmpeg/lib/libavutil.so.56
#10 0x00007ffff4fb781f in av_buffer_unref () from /opt/ffmpeg/lib/libavutil.so.56
#11 0x00007ffff78360c7 in ?? () from /opt/ffmpeg/lib/libavfilter.so.7
#12 0x00007ffff77395f3 in avfilter_free () from /opt/ffmpeg/lib/libavfilter.so.7
#13 0x00007ffff773b35c in avfilter_graph_free () from /opt/ffmpeg/lib/libavfilter.so.7
#14 0x0000000000424efb in ?? ()
#15 0x0000000000417751 in ?? ()
#16 0x00000000004240ff in ?? ()

With driver 418.88, multiple instances of the same command run without any problem.
libcuda.so.1, where the backtrace is blocked, is shipped with the driver.

FFmpeg version:
ffmpeg version N-94767-ge9cc873 Copyright (c) 2000-2019 the FFmpeg developers

built with gcc 6.5.0 (Ubuntu 6.5.0-2ubuntu1~14.04.1) 20181026
configuration: --prefix=/opt/ffmpeg --enable-nonfree --enable-gpl --extra-cflags='-I/opt/ffmpeg/include -I/usr/local/include -I/usr/local/cuda/include -I/opt/lib/NDISDK/ndi/include -I/opt/lib/subsys/include' --extra-ldflags='-L/opt/ffmpeg/lib -L/usr/local/cuda/lib64 -L/opt/lib/NDISDK/ndi/lib/x86_64-linux-gnu -L/opt/lib/subsys/lib' --bindir=/opt/ffmpeg/bin --extra-libs=-ldl --enable-libx264 --enable-libx265 --enable-nonfree --enable-gpl --enable-nvenc --enable-libzvbi --enable-libfdk-aac --enable-libzimg --enable-libzmq --enable-libfreetype --enable-static --enable-shared --enable-vdpau --enable-cuda --enable-cuvid --enable-libmp3lame --enable-vaapi --enable-openssl --enable-ffnvcodec --enable-libfontconfig --enable-libfribidi --enable-cuda-nvcc --enable-libnpp --disable-debug --nvcc=/usr/local/cuda/bin/nvcc --enable-opencl --enable-libmfx --enable-libndi_newtek --enable-libtesseract --enable-libass --enable-opencl
libavutil 56. 34.100 / 56. 34.100
libavcodec 58. 56.114 / 58. 56.114
libavformat 58. 32.104 / 58. 32.104
libavdevice 58. 9.100 / 58. 9.100
libavfilter 7. 58.102 / 7. 58.102
libswscale 5. 6.100 / 5. 6.100
libswresample 3. 6.100 / 3. 6.100
libpostproc 55. 6.100 / 55. 6.100

Hyper fast Audio and Video encoder
usage: ffmpeg [options] [[infile options] -i infile]... {[outfile options] outfile}...

Use -h to get full help or, even better, run 'man ffmpeg'

Changing the FFmpeg version does not change the behaviour.
I thought this might be related to ticket https://trac.ffmpeg.org/ticket/7582.
I applied the changes from the diff patch mentioned there, but the behaviour is the same.

I think this needs to be reported to NVIDIA.

Change History (12)

comment:1 Changed 8 months ago by smallishzulu

Additional info:

If hardware decoding (-hwaccel_device $GPULOWER -hwaccel cuvid -c:v h264_cuvid) and hardware-based scaling (scale_npp) are removed, you can run multiple sessions of the above command with the 430.x and 435.x drivers.

comment:2 Changed 2 months ago by thunder.m

I have the same issue with drivers 430.09 through 440.64; 418.xx is not affected. However, it is only reproducible on GTX and older cards, and the RTX generation is not affected. Can you confirm this?

Also, with kernel 4.9 ffmpeg runs longer (>30 minutes) than with kernel 4.19 (<10 minutes). It looks like a bug in the NVIDIA drivers, but it could also be related to some context locking in ffmpeg. I don't think it is related to #7582, though, because I can also reproduce it on the old ffmpeg 3.2.4, which didn't have async calls yet.
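
To compare machines quickly, the driver series can be checked against what has been reported in this ticket. The sketch below is only a summary of those reports (418 fine; 430, 435 and 440 affected on pre-RTX cards), not an official compatibility list, and the function name classify_driver is hypothetical.

```shell
# Hypothetical helper: classify a driver version string using only the
# reports collected in this ticket (pre-RTX GPUs).
classify_driver() {
    major="${1%%.*}"   # "440.82" -> "440"
    case "$major" in
        418)         echo "reported-ok" ;;
        430|435|440) echo "reported-affected" ;;
        *)           echo "unknown" ;;
    esac
}

# Example on a machine with the NVIDIA driver installed:
#   classify_driver "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
```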

comment:3 Changed 2 months ago by thunder.m

I also reproduced this on the latest 4.2.2 version.

comment:4 Changed 2 months ago by Balling

Please test git master. It is the only version supported...

comment:5 in reply to: ↑ 4 Changed 2 months ago by thunder.m

Replying to Balling:

Please test git master. It is the only version supported...

I can also reproduce this on current master (2020-04-03). It is also reproducible on Quadro Pascal cards (not Quadro RTX), so we are staying on the 418 branch until it is fixed by NVIDIA or in ffmpeg. There are no issues with the RTX generation.

comment:6 Changed 2 months ago by smallishzulu

How were you able to test the latest git with the 418.x driver, given that the latest headers require a newer driver version to run?

BTW, I do not think this is an FFmpeg problem. It started happening after the 418 driver series. Of course, there may be a fix that uses different API calls.

Maybe Timo, or another developer focused on NVIDIA video codecs who is close to NVIDIA, can get a faster reply on this case.

comment:7 Changed 5 weeks ago by smallishzulu

Example command line to reproduce the problem:

/opt/ffmpeg/bin/ffmpeg -hide_banner -ignore_unknown -loglevel debug -async 1 -threads 2 -filter_complex_threads 2 -thread_queue_size 2048 -fflags +discardcorrupt -drop_second_field 0 -hwaccel_device 0 -hwaccel cuvid -c:v h264_cuvid -i 'udp://233.33.33.1:5000' -aspect 16:9 -filter_complex '[0:p:1:0]yadif_cuda=mode=1,fps=fps=60,scale_npp=1280:720:interp_algo=super[v0];[0:p:1:0]yadif_cuda=mode=1,fps=fps=60,scale_npp=720:576:interp_algo=super[v1];[0:p:1:0]yadif_cuda=mode=1,fps=fps=50,scale_npp=1920:1080:interp_algo=super[v2];[0:p:1:0]yadif_cuda=mode=1,fps=fps=25,scale_npp=640:360:interp_algo=super[v3]' -g 50 -map [v0] -c:v:0 h264_nvenc -preset hq -rc:v vbr_hq -vbr 1 -2pass 0 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu 0 -b:v:0 2000k -maxrate:v:0 2000k -map [v1] -c:v:1 h264_nvenc -preset hq -rc:v vbr_hq -vbr 1 -2pass 0 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu 0 -b:v:1 1200k -maxrate:v:1 1200k -map [v2] -c:v:2 h264_nvenc -qmin 19 -preset hq -rc:v vbr_hq -vbr 1 -2pass 0 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu 0 -b:v:2 4000k -maxrate:v:2 4000k -map [v3] -c:v:3 h264_nvenc -qmin 19 -preset hq -rc:v vbr_hq -vbr 1 -2pass 0 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu 0 -b:v:3 500k -maxrate:v:3 500k -map 0:p:1:1 -c:a:0 aac -b:a:0 128k -ac 2 -metadata:s:a:0 language=tur -sn -max_muxing_queue_size 1000 -var_stream_map 'v:0,agroup:group0 v:1,agroup:group0 v:2,agroup:group0 v:3,agroup:group0 a:0,agroup:group0,default:yes,language:tur ' -hls_list_size 3 -hls_time 6 -threads 2 -reconnect_at_eof 1 -reconnect_streamed 1 -reconnect_delay_max 6 -timeout 0.1 -multiple_requests 1 -http_persistent 0 -method PUT -master_pl_name index.m3u8 -flags +global_header -f fifo -fifo_format hls -attempt_recovery 1 -recover_any_error 1 -drop_pkts_on_overflow 1 -recovery_wait_time 1 -queue_size 1000 -format_opts "hls_time=6:hls_list_size=3:timeout=0.1:http_persistent=0:master_pl_name=index.m3u8:hls_segment_filename="/media/origin/chanell1//$d-%v-%d.ts":method=PUT:hls_flags=+round_durations+delete_segments:var_stream_map='v\\:0,agroup\\:group0 v\\:1,agroup\\:group0 v\\:2,agroup\\:group0 v\\:3,agroup\\:group0 a\\:0,agroup\\:group0,default\\:yes,language\\:tur '" -hls_flags +round_durations+delete_segments -hls_segment_filename "/media/origin/chanell1$d-%v-%d.ts" -flags +global_header -f hls "/media/origin/chanell10%v.m3u8"

Error:
[udp @ 0x1883b80] Circular buffer overrun. To avoid, increase fifo_size URL option. To survive in such case, use overrun_nonfatal option
[h264_nvenc @ 0x196c140] dl_fn->cuda_dl->cuCtxPopCurrent(&dummy) failed
Video encoding failed
[Parsed_yadif_cuda_0 @ 0x18df380] cu->cuCtxPushCurrent(s->cu_ctx) failed

Explanation: the UDP input buffer overflows because the output has stopped. The actual failure is:
[h264_nvenc @ 0x196c140] dl_fn->cuda_dl->cuCtxPopCurrent(&dummy) failed
Video encoding failed

Tested with the 440.82 driver; the issue still exists.

Last edited 5 weeks ago by smallishzulu

comment:8 Changed 5 weeks ago by smallishzulu

Again, CPU decode with GPU encode works without any issue:

/opt/ffmpeg/bin/ffmpeg -hide_banner -ignore_unknown -loglevel verbose -async 1 -threads 2 -filter_complex_threads 2 -thread_queue_size 2048 -fflags +discardcorrupt -i 'udp://233.33.33.1:5000' -aspect 16:9 -filter_complex '[0:p:1:0]fps=fps=60,scale=1280:720[v0];[0:p:1:0]fps=fps=60,scale=720:576[v1];[0:p:1:0]fps=fps=50,scale=1920:1080[v2];[0:p:1:0]fps=fps=25,scale=640:360[v3]' -g 50 -map [v0] -c:v:0 h264_nvenc -preset hq -rc:v vbr_hq -vbr 1 -2pass 0 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu 0 -b:v:0 2000k -maxrate:v:0 2000k -map [v1] -c:v:1 h264_nvenc -preset hq -rc:v vbr_hq -vbr 1 -2pass 0 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu 0 -b:v:1 1200k -maxrate:v:1 1200k -map [v2] -c:v:2 h264_nvenc -qmin 19 -preset hq -rc:v vbr_hq -vbr 1 -2pass 0 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu 0 -b:v:2 4000k -maxrate:v:2 4000k -map [v3] -c:v:3 h264_nvenc -qmin 19 -preset hq -rc:v vbr_hq -vbr 1 -2pass 0 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu 0 -b:v:3 500k -maxrate:v:3 500k -map 0:p:1:1 -c:a:0 aac -b:a:0 128k -ac 2 -metadata:s:a:0 language=tur -sn -max_muxing_queue_size 1000 -var_stream_map 'v:0,agroup:group0 v:1,agroup:group0 v:2,agroup:group0 v:3,agroup:group0 a:0,agroup:group0,default:yes,language:tur ' -hls_list_size 3 -hls_time 6 -threads 2 -reconnect_at_eof 1 -reconnect_streamed 1 -reconnect_delay_max 6 -timeout 0.1 -multiple_requests 1 -http_persistent 0 -method PUT -master_pl_name index.m3u8 -flags +global_header -f fifo -fifo_format hls -attempt_recovery 1 -recover_any_error 1 -drop_pkts_on_overflow 1 -recovery_wait_time 1 -queue_size 1000 -format_opts "hls_time=6:hls_list_size=3:timeout=0.1:http_persistent=0:master_pl_name=index.m3u8:hls_segment_filename="/media/origin/chanell1//$d-%v-%d.ts":method=PUT:hls_flags=+round_durations+delete_segments:var_stream_map='v\\:0,agroup\\:group0 v\\:1,agroup\\:group0 v\\:2,agroup\\:group0 v\\:3,agroup\\:group0 a\\:0,agroup\\:group0,default\\:yes,language\\:tur '" -hls_flags +round_durations+delete_segments -hls_segment_filename "/media/origin/chanell1$d-%v-%d.ts" -flags +global_header -f hls "/media/origin/chanell10%v.m3u8"

The problem is somewhere in the decode step, with either the new drivers or an API call.

comment:9 Changed 5 weeks ago by smallishzulu

Interestingly, nvdec decoding works:
(Probably because the path is GPU decode -> system RAM -> GPU encode, i.e. not a full on-GPU transcode.)

/opt/ffmpeg/bin/ffmpeg -hide_banner -ignore_unknown -loglevel verbose -async 1 -threads 2 -filter_complex_threads 2 -thread_queue_size 2048 -fflags +discardcorrupt -drop_second_field 0 -hwaccel_device 0 -hwaccel nvdec -c:v h264_cuvid -i 'udp://233.33.33.1:5000' -aspect 16:9 -filter_complex '[0:p:1:0]hwupload_cuda,yadif_cuda=mode=1,fps=fps=60,scale_npp=1280:720:interp_algo=super[v0];[0:p:1:0]hwupload_cuda,yadif_cuda=mode=1,fps=fps=60,scale_npp=720:576:interp_algo=super[v1];[0:p:1:0]hwupload_cuda,yadif_cuda=mode=1,fps=fps=50,scale_npp=1920:1080:interp_algo=super[v2];[0:p:1:0]hwupload_cuda,yadif_cuda=mode=1,fps=fps=25,scale_npp=640:360:interp_algo=super[v3]' -g 50 -map [v0] -c:v:0 h264_nvenc -preset hq -rc:v vbr_hq -vbr 1 -2pass 0 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu 0 -b:v:0 2000k -maxrate:v:0 2000k -map [v1] -c:v:1 h264_nvenc -preset hq -rc:v vbr_hq -vbr 1 -2pass 0 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu 0 -b:v:1 1200k -maxrate:v:1 1200k -map [v2] -c:v:2 h264_nvenc -qmin 19 -preset hq -rc:v vbr_hq -vbr 1 -2pass 0 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu 0 -b:v:2 4000k -maxrate:v:2 4000k -map [v3] -c:v:3 h264_nvenc -qmin 19 -preset hq -rc:v vbr_hq -vbr 1 -2pass 0 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu 0 -b:v:3 500k -maxrate:v:3 500k -map 0:p:1:1 -c:a:0 aac -b:a:0 128k -ac 2 -metadata:s:a:0 language=tur -sn -max_muxing_queue_size 1000 -var_stream_map 'v:0,agroup:group0 v:1,agroup:group0 v:2,agroup:group0 v:3,agroup:group0 a:0,agroup:group0,default:yes,language:tur ' -hls_list_size 3 -hls_time 6 -threads 2 -reconnect_at_eof 1 -reconnect_streamed 1 -reconnect_delay_max 6 -timeout 0.1 -multiple_requests 1 -http_persistent 0 -method PUT -master_pl_name index.m3u8 -flags +global_header -f fifo -fifo_format hls -attempt_recovery 1 -recover_any_error 1 -drop_pkts_on_overflow 1 -recovery_wait_time 1 -queue_size 1000 -format_opts "hls_time=6:hls_list_size=3:timeout=0.1:http_persistent=0:master_pl_name=index.m3u8:hls_segment_filename="/media/origin/chanell1//$d-%v-%d.ts":method=PUT:hls_flags=+round_durations+delete_segments:var_stream_map='v\\:0,agroup\\:group0 v\\:1,agroup\\:group0 v\\:2,agroup\\:group0 v\\:3,agroup\\:group0 a\\:0,agroup\\:group0,default\\:yes,language\\:tur '" -hls_flags +round_durations+delete_segments -hls_segment_filename "/media/origin/chanell1$d-%v-%d.ts" -flags +global_header -f hls "/media/origin/chanell10%v.m3u8"

comment:10 Changed 9 days ago by misko

Downgrading to FFmpeg 4.2.1 with driver version 418.113 and CUDA 10.1 definitely fixed this issue.

comment:11 Changed 9 days ago by smallishzulu

It is not a fix, friend. It is only a workaround to keep the gears turning. It would be nice if someone who is close to NVIDIA reported this. It is a driver-related problem at the GPU decoding stage. Try FFmpeg 4.2.1 with the 430.x or 440.x series drivers and CUDA 10.1, and the problem occurs.

There may have been a change in the decode API calls, or NVIDIA broke something in the driver.
Unlike Pascal GPUs, new-generation cards (RTX) work with the 430.x and 440.x series drivers.

Yet there has been no official announcement that the 430.x or 440.x series drivers drop Pascal GPUs for dense transcoding. (More than 2 parallel sessions trigger the problem. By the way, this is not related at all to GPU limitations.)

comment:12 in reply to: ↑ 11 Changed 9 days ago by misko

Replying to smallishzulu:

It is not a fix, friend. It is only a workaround to keep the gears turning.

Correct. It is a workaround that fixes the issue for us until a proper fix is released...
