#8225 closed defect (invalid)
NVIDIA Driver Bug Affecting Full HW Based Transcoding
Reported by: | smallishzulu | Owned by: | |
---|---|---|---|
Priority: | normal | Component: | undetermined |
Version: | unspecified | Keywords: | nvenc, cuviddec |
Cc: | Blocked By: | ||
Blocking: | Reproduced by developer: | no | |
Analyzed by developer: | no |
Description
NVIDIA driver series 430.x and 435.x have an issue that affects fully hardware-based transcoding.
The problem surfaces when you hardware-decode an input stream and then scale and transcode it to multiple profiles.
(Other scenarios may be affected as well. This is the one I noticed.)
Example command line to test:
GPULOWER=0 && /opt/ffmpeg/bin/ffmpeg -hide_banner -ignore_unknown -loglevel verbose -async 1 -thread_queue_size 2048 -fflags +nobuffer+discardcorrupt -re -hwaccel_device $GPULOWER -hwaccel cuvid -c:v h264_cuvid -i 'udp://239.1.5.2:5000?fifo_size=3355440&buffer_size=3355440&overrun_nonfatal=1' -filter_complex '[0:p:16401:0]scale_npp=1920:1080[v0];[0:p:16401:0]scale_npp=1280:720[v1];[0:p:16401:0]scale_npp=960:540[v2];[0:p:16401:0]scale_npp=704:396[v3];[0:p:16401:0]scale_npp=480:270[v4];[0:p:16401:0]scale_npp=416:234[v5];[0:p:16401:0]scale_npp=416:234[v6]' -g 50 -r 25 -map [v0] -c:v:0 h264_nvenc -qmin 19 -preset hq -rc:v vbr_hq -profile:v main -level 4.1 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu $GPULOWER -b:v:0 6000k -maxrate:v:0 6000k -map [v1] -c:v:1 h264_nvenc -qmin 19 -preset hq -rc:v vbr_hq -profile:v main -level 4.1 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu $GPULOWER -b:v:1 3150k -maxrate:v:1 3150k -map [v2] -c:v:2 h264_nvenc -qmin 19 -preset hq -rc:v vbr_hq -profile:v main -level 4.1 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu $GPULOWER -b:v:2 2000k -maxrate:v:2 2000k -map [v3] -c:v:3 h264_nvenc -qmin 19 -preset hq -rc:v vbr_hq -profile:v main -level 4.1 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu $GPULOWER -b:v:3 1400k -maxrate:v:3 1400k -map [v4] -c:v:4 h264_nvenc -qmin 19 -preset hq -rc:v vbr_hq -profile:v main -level 4.1 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu $GPULOWER -b:v:4 800k -maxrate:v:4 800k -map [v5] -c:v:5 h264_nvenc -qmin 19 -preset hq -rc:v vbr_hq -profile:v main -level 4.1 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu $GPULOWER -b:v:5 400k -maxrate:v:5 400k -map [v6] -c:v:6 h264_nvenc -qmin 19 -preset hq -rc:v vbr_hq -profile:v main -level 4.1 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu $GPULOWER -b:v:6 200k -maxrate:v:6 200k -map 0:p:16401:1 -c:a:0 aac -b:a:0 96k -metadata:s:a:0 language=eng 
-max_muxing_queue_size 1000 -f matroska -y /dev/null
Running a single instance of the above command works without a problem.
If you run the same command in a second shell, one of two things happens:
1) The encode in the first shell halts. If you wait long enough, circular buffer overrun messages appear for the decode.
OR
2) The second shell throws the error below (ffmpeg does not exit; if you wait long enough, circular buffer overrun messages appear for the decode):
[h264_nvenc @ 0x8e3d00] Failed locking bitstream buffer: invalid param (8)
Video encoding failed
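The two-shell reproduction above can also be scripted. The following is only a sketch: `run_parallel` is a hypothetical helper (not part of ffmpeg), and the log directory is whatever you pass in. It starts N copies of one command at the same time, logs each to its own file, and lists the logs that contain the NVENC error string quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper for this ticket: run N copies of the same command
# concurrently and report which instances logged the NVENC locking error.
# Usage: run_parallel <count> <logdir> <command> [args...]
run_parallel() {
    local n="$1" logdir="$2"
    shift 2
    local i pid pids=()
    mkdir -p "$logdir"
    for ((i = 1; i <= n; i++)); do
        # Each instance gets its own log so the failing one can be identified.
        "$@" >"$logdir/instance-$i.log" 2>&1 &
        pids+=("$!")
    done
    # Wait for all instances. A hung instance blocks here forever, so wrap
    # the command in timeout(1) when reproducing the hang itself.
    for pid in "${pids[@]}"; do
        wait "$pid" || true
    done
    # Print the names of the logs that contain the error from this report.
    grep -l 'Failed locking bitstream buffer' "$logdir"/instance-*.log || true
}
```

For example, `run_parallel 2 /tmp/nvenc-repro timeout 120 /opt/ffmpeg/bin/ffmpeg ...` with the full option list from the command above. An instance that halts without printing the error will not be listed, so the logs still need to be checked for circular buffer overrun messages.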
I inspected the ffmpeg process with gdb:
Program received signal SIGINT, Interrupt.
sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
85 ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S: No such file or directory.
(gdb) bt
#0 sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
#1 0x00007fffc1904257 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007fffc17ff2a3 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007fffc19c1f10 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007fffc18a821c in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#5 0x00007fffc17df22c in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#6 0x00007fffc19485f0 in cuMemFree_v2 () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#7 0x00007ffff4fc8113 in ?? () from /opt/ffmpeg/lib/libavutil.so.56
#8 0x00007ffff4fb7bd7 in av_buffer_pool_uninit () from /opt/ffmpeg/lib/libavutil.so.56
#9 0x00007ffff4fc5bb0 in ?? () from /opt/ffmpeg/lib/libavutil.so.56
#10 0x00007ffff4fb781f in av_buffer_unref () from /opt/ffmpeg/lib/libavutil.so.56
#11 0x00007ffff78360c7 in ?? () from /opt/ffmpeg/lib/libavfilter.so.7
#12 0x00007ffff77395f3 in avfilter_free () from /opt/ffmpeg/lib/libavfilter.so.7
#13 0x00007ffff773b35c in avfilter_graph_free () from /opt/ffmpeg/lib/libavfilter.so.7
#14 0x0000000000424efb in ?? ()
#15 0x0000000000417751 in ?? ()
#16 0x00000000004240ff in ?? ()
With driver 418.88, multiple instances of the same command run without any problem.
libcuda.so.1 is part of the NVIDIA driver.
FFmpeg version:
ffmpeg version N-94767-ge9cc873 Copyright (c) 2000-2019 the FFmpeg developers
built with gcc 6.5.0 (Ubuntu 6.5.0-2ubuntu1~14.04.1) 20181026
configuration: --prefix=/opt/ffmpeg --enable-nonfree --enable-gpl --extra-cflags='-I/opt/ffmpeg/include -I/usr/local/include -I/usr/local/cuda/include -I/opt/lib/NDISDK/ndi/include -I/opt/lib/subsys/include' --extra-ldflags='-L/opt/ffmpeg/lib -L/usr/local/cuda/lib64 -L/opt/lib/NDISDK/ndi/lib/x86_64-linux-gnu -L/opt/lib/subsys/lib' --bindir=/opt/ffmpeg/bin --extra-libs=-ldl --enable-libx264 --enable-libx265 --enable-nonfree --enable-gpl --enable-nvenc --enable-libzvbi --enable-libfdk-aac --enable-libzimg --enable-libzmq --enable-libfreetype --enable-static --enable-shared --enable-vdpau --enable-cuda --enable-cuvid --enable-libmp3lame --enable-vaapi --enable-openssl --enable-ffnvcodec --enable-libfontconfig --enable-libfribidi --enable-cuda-nvcc --enable-libnpp --disable-debug --nvcc=/usr/local/cuda/bin/nvcc --enable-opencl --enable-libmfx --enable-libndi_newtek --enable-libtesseract --enable-libass --enable-opencl
libavutil 56. 34.100 / 56. 34.100
libavcodec 58. 56.114 / 58. 56.114
libavformat 58. 32.104 / 58. 32.104
libavdevice 58. 9.100 / 58. 9.100
libavfilter 7. 58.102 / 7. 58.102
libswscale 5. 6.100 / 5. 6.100
libswresample 3. 6.100 / 3. 6.100
libpostproc 55. 6.100 / 55. 6.100
Hyper fast Audio and Video encoder
usage: ffmpeg [options] [[infile options] -i infile]... {[outfile options] outfile}...
Use -h to get full help or, even better, run 'man ffmpeg'
Changing the FFmpeg version does not change the behaviour.
I thought this might be related to ticket https://trac.ffmpeg.org/ticket/7582.
I applied the changes from the patch in that ticket, but the behaviour is the same.
I think this needs to be reported to NVIDIA.
Change History (21)
comment:1 by , 5 years ago
comment:2 by , 5 years ago
I have the same issue with drivers 430.09 through 440.64; 418.xx is not affected. However, it is only reproducible on GTX and older cards; the RTX generation is not affected. Can you confirm this?
Also, with kernel 4.9 ffmpeg runs longer (>30 minutes) than with kernel 4.19 (<10 minutes). It looks like a bug in the NVIDIA drivers, but it could also be related to some content locking in ffmpeg. I don't think it is related to #7582, though, because I can also reproduce it with the old ffmpeg 3.2.4, which did not have async calls yet.
comment:5 by , 5 years ago
Replying to Balling:
Please test git master. It is the only version supported...
I can also reproduce this on current master (2020-04-03). It is also reproducible on Quadro Pascal cards (not Quadro RTX), so we are staying on the 418 branch until this is fixed by NVIDIA or in ffmpeg. There are no issues with the RTX generation.
comment:6 by , 5 years ago
How were you able to test the latest git with a 418.x driver, given that the latest headers require a newer driver version to run?
BTW, I do not think this is an FFmpeg problem. It started to happen after the 418 driver series. Of course, there may be a fix that uses different API calls.
Maybe Timo, or another NVIDIA-focused video codec developer who is close to NVIDIA, can get information or a reply on this case faster.
comment:7 by , 5 years ago
Example command line to reproduce the problem:
/opt/ffmpeg/bin/ffmpeg -hide_banner -ignore_unknown -loglevel debug -async 1 -threads 2 -filter_complex_threads 2 -thread_queue_size 2048 -fflags +discardcorrupt -drop_second_field 0 -hwaccel_device 0 -hwaccel cuvid -c:v h264_cuvid -i 'udp://233.33.33.1:5000' -aspect 16:9 -filter_complex '[0:p:1:0]yadif_cuda=mode=1,fps=fps=60,scale_npp=1280:720:interp_algo=super[v0];[0:p:1:0]yadif_cuda=mode=1,fps=fps=60,scale_npp=720:576:interp_algo=super[v1];[0:p:1:0]yadif_cuda=mode=1,fps=fps=50,scale_npp=1920:1080:interp_algo=super[v2];[0:p:1:0]yadif_cuda=mode=1,fps=fps=25,scale_npp=640:360:interp_algo=super[v3]' -g 50 -map [v0] -c:v:0 h264_nvenc -preset hq -rc:v vbr_hq -vbr 1 -2pass 0 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu 0 -b:v:0 2000k -maxrate:v:0 2000k -map [v1] -c:v:1 h264_nvenc -preset hq -rc:v vbr_hq -vbr 1 -2pass 0 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu 0 -b:v:1 1200k -maxrate:v:1 1200k -map [v2] -c:v:2 h264_nvenc -qmin 19 -preset hq -rc:v vbr_hq -vbr 1 -2pass 0 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu 0 -b:v:2 4000k -maxrate:v:2 4000k -map [v3] -c:v:3 h264_nvenc -qmin 19 -preset hq -rc:v vbr_hq -vbr 1 -2pass 0 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu 0 -b:v:3 500k -maxrate:v:3 500k -map 0:p:1:1 -c:a:0 aac -b:a:0 128k -ac 2 -metadata:s:a:0 language=tur -sn -max_muxing_queue_size 1000 -var_stream_map 'v:0,agroup:group0 v:1,agroup:group0 v:2,agroup:group0 v:3,agroup:group0 a:0,agroup:group0,default:yes,language:tur ' -hls_list_size 3 -hls_time 6 -threads 2 -reconnect_at_eof 1 -reconnect_streamed 1 -reconnect_delay_max 6 -timeout 0.1 -multiple_requests 1 -http_persistent 0 -method PUT -master_pl_name index.m3u8 -flags +global_header -f fifo -fifo_format hls -attempt_recovery 1 -recover_any_error 1 -drop_pkts_on_overflow 1 -recovery_wait_time 1 -queue_size 1000 -format_opts "hls_time=6:hls_list_size=3:timeout=0.1:http_persistent=0:master_pl_name=index.m3u8:hls_segment_filename="/media/origin/chanell1//$d-%v-%d.ts":method=PUT:hls_flags=+round_durations+delete_segments:var_stream_map='v\\:0,agroup\\:group0 v\\:1,agroup\\:group0 v\\:2,agroup\\:group0 v\\:3,agroup\\:group0 a\\:0,agroup\\:group0,default\\:yes,language\\:tur '" -hls_flags +round_durations+delete_segments -hls_segment_filename "/media/origin/chanell1$d-%v-%d.ts" -flags +global_header -f hls "/media/origin/chanell10%v.m3u8"
Error:
[udp @ 0x1883b80] Circular buffer overrun. To avoid, increase fifo_size URL option. To survive in such case, use overrun_nonfatal option
[h264_nvenc @ 0x196c140] dl_fn->cuda_dl->cuCtxPopCurrent(&dummy) failed
Video encoding failed
[Parsed_yadif_cuda_0 @ 0x18df380] cu->cuCtxPushCurrent(s->cu_ctx) failed
Explanation: the UDP input overflows because the output stops.
[h264_nvenc @ 0x196c140] dl_fn->cuda_dl->cuCtxPopCurrent(&dummy) failed
Video encoding failed
Tested with the 440.82 driver; the issue still exists.
comment:8 by , 5 years ago
Again: CPU decode with GPU encode works without any issue:
/opt/ffmpeg/bin/ffmpeg -hide_banner -ignore_unknown -loglevel verbose -async 1 -threads 2 -filter_complex_threads 2 -thread_queue_size 2048 -fflags +discardcorrupt -i 'udp://233.33.33.1:5000' -aspect 16:9 -filter_complex '[0:p:1:0]fps=fps=60,scale=1280:720[v0];[0:p:1:0]fps=fps=60,scale=720:576[v1];[0:p:1:0]fps=fps=50,scale=1920:1080[v2];[0:p:1:0]fps=fps=25,scale=640:360[v3]' -g 50 -map [v0] -c:v:0 h264_nvenc -preset hq -rc:v vbr_hq -vbr 1 -2pass 0 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu 0 -b:v:0 2000k -maxrate:v:0 2000k -map [v1] -c:v:1 h264_nvenc -preset hq -rc:v vbr_hq -vbr 1 -2pass 0 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu 0 -b:v:1 1200k -maxrate:v:1 1200k -map [v2] -c:v:2 h264_nvenc -qmin 19 -preset hq -rc:v vbr_hq -vbr 1 -2pass 0 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu 0 -b:v:2 4000k -maxrate:v:2 4000k -map [v3] -c:v:3 h264_nvenc -qmin 19 -preset hq -rc:v vbr_hq -vbr 1 -2pass 0 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu 0 -b:v:3 500k -maxrate:v:3 500k -map 0:p:1:1 -c:a:0 aac -b:a:0 128k -ac 2 -metadata:s:a:0 language=tur -sn -max_muxing_queue_size 1000 -var_stream_map 'v:0,agroup:group0 v:1,agroup:group0 v:2,agroup:group0 v:3,agroup:group0 a:0,agroup:group0,default:yes,language:tur ' -hls_list_size 3 -hls_time 6 -threads 2 -reconnect_at_eof 1 -reconnect_streamed 1 -reconnect_delay_max 6 -timeout 0.1 -multiple_requests 1 -http_persistent 0 -method PUT -master_pl_name index.m3u8 -flags +global_header -f fifo -fifo_format hls -attempt_recovery 1 -recover_any_error 1 -drop_pkts_on_overflow 1 -recovery_wait_time 1 -queue_size 1000 -format_opts "hls_time=6:hls_list_size=3:timeout=0.1:http_persistent=0:master_pl_name=index.m3u8:hls_segment_filename="/media/origin/chanell1//$d-%v-%d.ts":method=PUT:hls_flags=+round_durations+delete_segments:var_stream_map='v\\:0,agroup\\:group0 v\\:1,agroup\\:group0 v\\:2,agroup\\:group0 v\\:3,agroup\\:group0 a\\:0,agroup\\:group0,default\\:yes,language\\:tur '" -hls_flags +round_durations+delete_segments -hls_segment_filename "/media/origin/chanell1$d-%v-%d.ts" -flags +global_header -f hls "/media/origin/chanell10%v.m3u8"
The problem is somewhere in the decode step, with either the new drivers or an API call.
comment:9 by , 5 years ago
Interestingly, nvdec decoding works:
(Probably because the path is GPU decode -> system RAM -> GPU encode, i.e. not a full on-GPU transcode.)
/opt/ffmpeg/bin/ffmpeg -hide_banner -ignore_unknown -loglevel verbose -async 1 -threads 2 -filter_complex_threads 2 -thread_queue_size 2048 -fflags +discardcorrupt -drop_second_field 0 -hwaccel_device 0 -hwaccel nvdec -c:v h264_cuvid -i 'udp://233.33.33.1:5000' -aspect 16:9 -filter_complex '[0:p:1:0]hwupload_cuda,yadif_cuda=mode=1,fps=fps=60,scale_npp=1280:720:interp_algo=super[v0];[0:p:1:0]hwupload_cuda,yadif_cuda=mode=1,fps=fps=60,scale_npp=720:576:interp_algo=super[v1];[0:p:1:0]hwupload_cuda,yadif_cuda=mode=1,fps=fps=50,scale_npp=1920:1080:interp_algo=super[v2];[0:p:1:0]hwupload_cuda,yadif_cuda=mode=1,fps=fps=25,scale_npp=640:360:interp_algo=super[v3]' -g 50 -map [v0] -c:v:0 h264_nvenc -preset hq -rc:v vbr_hq -vbr 1 -2pass 0 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu 0 -b:v:0 2000k -maxrate:v:0 2000k -map [v1] -c:v:1 h264_nvenc -preset hq -rc:v vbr_hq -vbr 1 -2pass 0 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu 0 -b:v:1 1200k -maxrate:v:1 1200k -map [v2] -c:v:2 h264_nvenc -qmin 19 -preset hq -rc:v vbr_hq -vbr 1 -2pass 0 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu 0 -b:v:2 4000k -maxrate:v:2 4000k -map [v3] -c:v:3 h264_nvenc -qmin 19 -preset hq -rc:v vbr_hq -vbr 1 -2pass 0 -strict_gop 1 -rc-lookahead 32 -no-scenecut 1 -forced-idr 1 -gpu 0 -b:v:3 500k -maxrate:v:3 500k -map 0:p:1:1 -c:a:0 aac -b:a:0 128k -ac 2 -metadata:s:a:0 language=tur -sn -max_muxing_queue_size 1000 -var_stream_map 'v:0,agroup:group0 v:1,agroup:group0 v:2,agroup:group0 v:3,agroup:group0 a:0,agroup:group0,default:yes,language:tur ' -hls_list_size 3 -hls_time 6 -threads 2 -reconnect_at_eof 1 -reconnect_streamed 1 -reconnect_delay_max 6 -timeout 0.1 -multiple_requests 1 -http_persistent 0 -method PUT -master_pl_name index.m3u8 -flags +global_header -f fifo -fifo_format hls -attempt_recovery 1 -recover_any_error 1 -drop_pkts_on_overflow 1 -recovery_wait_time 1 -queue_size 1000 -format_opts "hls_time=6:hls_list_size=3:timeout=0.1:http_persistent=0:master_pl_name=index.m3u8:hls_segment_filename="/media/origin/chanell1//$d-%v-%d.ts":method=PUT:hls_flags=+round_durations+delete_segments:var_stream_map='v\\:0,agroup\\:group0 v\\:1,agroup\\:group0 v\\:2,agroup\\:group0 v\\:3,agroup\\:group0 a\\:0,agroup\\:group0,default\\:yes,language\\:tur '" -hls_flags +round_durations+delete_segments -hls_segment_filename "/media/origin/chanell1$d-%v-%d.ts" -flags +global_header -f hls "/media/origin/chanell10%v.m3u8"
comment:10 by , 5 years ago
Downgrading to FFmpeg 4.2.1 with driver version 418.113 and CUDA 10.1 definitely fixed this issue.
follow-up: 12 comment:11 by , 5 years ago
It is not a fix, friend. It is only a workaround to keep the gears turning. It would be nice if someone who is close to NVIDIA reported this. It is a driver-related problem at the GPU decoding stage. Try FFmpeg 4.2.1 with the 430.x or 440.x series drivers and CUDA 10.1, and the problem occurs.
Either there was a change in the decode API calls, or NVIDIA broke something in the driver.
Unlike Pascal GPUs, new-generation cards (RTX) work with the 430.x and 440.x series drivers.
Yet there is no official announcement that the 430.x or 440.x series drivers drop Pascal GPUs for dense transcoding. (More than 2 parallel sessions trigger the problem. By the way, this is not related at all to GPU limitations.)
comment:12 by , 5 years ago
Replying to smallishzulu:
It is not a fix, friend. It is only a workaround to keep the gears turning.
Correct. It is a workaround that fixes the issue for us until a proper fix is released...
comment:13 by , 4 years ago
Hi all,
We're facing exactly the same issue and came to exactly the same workaround: sticking with the old drivers. The good news, though, is that I have a decent contact at NVIDIA, and their technicians are currently looking into this.
follow-up: 15 comment:14 by , 4 years ago
Hi all,
I contacted NVIDIA and they have reproduced the problem on their side.
They also told me that, as of July 31, 2020, the issue is in the DEV queue.
BR,
comment:15 by , 4 years ago
Replying to smallishzulu:
I contacted NVIDIA and they have reproduced the problem on their side.
They also told me that, as of July 31, 2020, the issue is in the DEV queue.
Hi,
any news on this?
-m
comment:16 by , 4 years ago
Latest update from NVIDIA, from a week ago:
"Issue was fixed in future driver branch and complete testing cycle is going on. Stay tuned for further update on test driver release."
I am checking the driver release notes. According to my tests and reading, the fix is not yet included in the October driver releases.
comment:19 by , 4 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
It seems the 460.32.03 driver, dated 7 January 2021, fixes the issue.
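The driver versions reported in this ticket can be turned into a quick check. This is only a sketch: `version_ge` and `classify_driver` are hypothetical helper names, and the boundaries (418.x unaffected, 430.09 through 440.82 affected, 460.32.03 and later fixed) are assumptions taken from the comments here, not an official NVIDIA support matrix. Versions between 440.82 and 460.32.03 were not reported on, so they are classified as not covered. It relies on GNU `sort -V` for version ordering.

```shell
#!/usr/bin/env bash
# True if version $1 is greater than or equal to version $2 (GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Classify a driver version string using only the reports in this ticket.
# The ranges are assumptions drawn from those reports.
classify_driver() {
    local v="$1"
    if version_ge "$v" 460.32.03; then
        echo "reported fixed"
    elif version_ge "$v" 430.09 && version_ge 440.82 "$v"; then
        echo "reported affected"
    elif [ "${v%%.*}" = 418 ]; then
        echo "reported unaffected"
    else
        echo "not covered by this ticket"
    fi
}
```

On a machine with the driver installed, the version can be fed in with `classify_driver "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)"`. The GPU generation matters as well: according to the comments above, RTX cards were not affected even on the 430.x and 440.x drivers, and this check does not take that into account.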
comment:21 by , 4 years ago
Resolution: | fixed → invalid |
---|
Additional info:
If hardware decoding (-hwaccel_device $GPULOWER -hwaccel cuvid -c:v h264_cuvid) and hardware-based scaling (scale_npp) are removed, you can run multiple sessions of the above command with the 430.x and 435.x drivers.