Opened 2 years ago

Closed 2 years ago

#6058 closed defect (invalid)

cuvid: Transcoding hangs after several files processed

Reported by: slomkowski Owned by:
Priority: normal Component: undetermined
Version: git-master Keywords: nvenc
Cc: Blocked By:
Blocking: Reproduced by developer: no
Analyzed by developer: no

Description

I'm using ffmpeg to transcode a lot of MPEG-2 videos grabbed from DVB-S to H.264. I'm using ffmpeg with NVidia CUVID, videos are processed in batch mode, with two instances of ffmpeg running. However, after several recordings processed successfully, both ffmpeg instances hang. It appears to be working with frame rate steadily going to 0 fps; output H.264 file grows several times the size of original file. This continues until ffmpeg is eventually killed by batch managing process.

After that, ps aux shows, that ffmpeg instances have become defunct and graphics card is no longer responding. nvidia-smi dmon shows 100% load on decoder and 0% on encoder despite no process running. Trying to reset the card using nvidia-smi -i 0 -r doesn't bring any good, the card becomes completely unresponsive and nvidia-smi is no longer able to communicate with it.

I ran the transcoding with and without -hwaccel option, each one eventually failed.

The problem is not reproducible in easy way, when you run ffmpeg on the very same files which caused the problem before, everything goes well. Just after processing several other files the processing hangs.

My configuration:

  • Debian 8, kernel 4.7.0.
  • graphics card: GeForce? GTX 950.
  • NVidia drivers tested: 375.26, 367.57.
  • ffmpeg versions tested: 3.2, 3.2.2, N-82987-gd800d48.

I include logs from processing which hang.

Attachments (4)

cuvid-hang-ffmpeg-log-1.log (41.8 KB) - added by slomkowski 2 years ago.
cuvid-hang-ffmeg-log-2.log (29.4 KB) - added by slomkowski 2 years ago.
ffmpeg-RV_N24____2017-01-05_09.14__vsn0-20170105-093537.log (14.5 KB) - added by slomkowski 2 years ago.
6058-hangs-ffmpeg-dec100-1.log (39.1 KB) - added by slomkowski 2 years ago.
log of processing 8MB file, which hangs at cuvidDestroyDecoder()

Download all attachments as: .zip

Change History (23)

Changed 2 years ago by slomkowski

Changed 2 years ago by slomkowski

comment:1 Changed 2 years ago by slomkowski

I confirm that the problem happens also when one ffmpeg instance is running.

nvidia-smi output:

Thu Jan  5 12:03:39 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 950     Off  | 0000:01:00.0     Off |                  N/A |
| 22%   61C    P0    38W /  99W |     63MiB /  1996MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     29889    C   /home/rtv/soft/ffmpeg/ffmpeg/ffmpeg             61MiB |
+-----------------------------------------------------------------------------+

nvidia-smi dmon output:

# gpu   pwr  temp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     %     %     %     %   MHz   MHz
    0    38    60     0     0     0   100  3304  1404
    0    38    60     0     0     0   100  3304  1404
    0    38    60     0     0     0   100  3304  1404

nvidia-smi pmon output (notice dec is 0 here):

# gpu     pid  type    sm   mem   enc   dec   command
# Idx       #   C/G     %     %     %     %   name
    0   29889     C     0     0     0     0   ffmpeg         
    0   29889     C     0     0     0     0   ffmpeg         
    0   29889     C     0     0     0     0   ffmpeg

comment:2 follow-up: Changed 2 years ago by cehoyos

  • Component changed from ffmpeg to undetermined
  • Keywords cuvid h264 mpeg2 ts nvidia removed

Does ffmpeg crash or hang?

Please provide a backtrace.

comment:3 Changed 2 years ago by slomkowski

It hangs. I see ffmpeg hang just now. It's running under my own program. I include the log (using FFREPORT env).

I just ran the same command in console (the already hanged instance is still in the background):

/home/rtv/soft/ffmpeg/ffmpeg/ffmpeg -y -loglevel warning -hwaccel_device 0 -hwaccel cuvid -c:v mpeg2_cuvid -i /mnt/rtvprocessing/artifacts/2017-01-05/RV_N24____2017-01-05_09.14__vsn0/raw-ts-file.ts -c:v h264_nvenc -rc vbr -movflags +faststart video-file.mp4

and it hangs my SSH connection (Ctrl+C, Ctrl+Z doesn't work).

comment:4 Changed 2 years ago by cehoyos

  • Keywords crash removed

The logfile you attached does not indicate a hang.
In any case please provide a backtrace.

comment:5 follow-up: Changed 2 years ago by slomkowski

Since ffmpeg doesn't crash, just hangs or waits for something, I cannot provide backtrace. I just run ffmpeg once again with debug level, and this my console output (note that other ffmpeg instance is still hanged in background).

ffmpeg version N-82987-gd800d48 Copyright (c) 2000-2017 the FFmpeg developers
  built with gcc 4.9.2 (Debian 4.9.2-10)
  configuration: --prefix=../ffmpeg_build --enable-gpl --disable-debug --enable-libass --enable-libfreetype --enable-libmp3lame --enable-libopus --enable-libtheora --enable-libvorbis --enable-libx264 --enable-nonfree --enable-nvenc --enable-libnpp --enable-vdpau --enable-cuda --enable-cuvid
  libavutil      55. 43.100 / 55. 43.100
  libavcodec     57. 70.101 / 57. 70.101
  libavformat    57. 61.100 / 57. 61.100
  libavdevice    57.  2.100 / 57.  2.100
  libavfilter     6. 68.100 /  6. 68.100
  libswscale      4.  3.101 /  4.  3.101
  libswresample   2.  4.100 /  2.  4.100
  libpostproc    54.  2.100 / 54.  2.100
Splitting the commandline.
Reading option '-y' ... matched as option 'y' (overwrite output files) with argument '1'.
Reading option '-loglevel' ... matched as option 'loglevel' (set logging level) with argument 'debug'.
Reading option '-hwaccel_device' ... matched as option 'hwaccel_device' (select a device for HW acceleration) with argument '0'.
Reading option '-hwaccel' ... matched as option 'hwaccel' (use HW accelerated decoding) with argument 'cuvid'.
Reading option '-c:v' ... matched as option 'c' (codec name) with argument 'mpeg2_cuvid'.
Reading option '-i' ... matched as input url with argument '/mnt/rtvprocessing/artifacts/2017-01-05/RV_N24____2017-01-05_09.14__vsn0/raw-ts-file.ts'.
Reading option '-c:v' ... matched as option 'c' (codec name) with argument 'h264_nvenc'.
Reading option '-rc' ... matched as AVOption 'rc' with argument 'vbr'.
Reading option '-movflags' ... matched as AVOption 'movflags' with argument '+faststart'.
Reading option 'video-file.mp4' ... matched as output url.
Finished splitting the commandline.
Parsing a group of options: global .
Applying option y (overwrite output files) with argument 1.
Applying option loglevel (set logging level) with argument debug.
Successfully parsed a group of options.
Parsing a group of options: input url /mnt/rtvprocessing/artifacts/2017-01-05/RV_N24____2017-01-05_09.14__vsn0/raw-ts-file.ts.
Applying option hwaccel_device (select a device for HW acceleration) with argument 0.
Applying option hwaccel (use HW accelerated decoding) with argument cuvid.
Applying option c:v (codec name) with argument mpeg2_cuvid.
Successfully parsed a group of options.
Opening an input file: /mnt/rtvprocessing/artifacts/2017-01-05/RV_N24____2017-01-05_09.14__vsn0/raw-ts-file.ts.
[file @ 0x21d5060] Setting default whitelist 'file,crypto'
[mpegts @ 0x21d4760] Format mpegts probed with size=2048 and score=50
[mpegts @ 0x21d4760] stream=0 stream_type=2 pid=3ff prog_reg_desc=
[mpegts @ 0x21d4760] stream=1 stream_type=3 pid=400 prog_reg_desc=
[mpegts @ 0x21d4760] stream=2 stream_type=6 pid=23 prog_reg_desc=
[mpegts @ 0x21d4760] Before avformat_find_stream_info() pos: 0 bytes read:32768 seeks:0 nb_streams:3
[mpegts @ 0x21d4760] parser not found for codec dvb_teletext, packets or times may be invalid.
    Last message repeated 1 times
[mpeg2video @ 0x21d9220] Invalid frame dimensions 0x0.
    Last message repeated 10 times
[mpegts @ 0x21d4760] max_analyze_duration 5000000 reached at 5000000 microseconds st:2
[mpegts @ 0x21d4760] PES packet size mismatch
[mpegts @ 0x21d4760] After avformat_find_stream_info() pos: 0 bytes read:3330192 seeks:2 frames:449
Input #0, mpegts, from '/mnt/rtvprocessing/artifacts/2017-01-05/RV_N24____2017-01-05_09.14__vsn0/raw-ts-file.ts':
  Duration: 00:18:00.88, start: 54479.140411, bitrate: 3785 kb/s
  Program 17503 
    Stream #0:0[0x3ff], 118, 1/90000: Video: mpeg2video (Main), 1 reference frame ([2][0][0][0] / 0x0002), yuv420p(tv, top first, left), 720x576 [SAR 64:45 DAR 16:9], 0/1, 25 fps, 25 tbr, 90k tbn, 50 tbc
    Stream #0:1[0x400](deu), 204, 1/90000: Audio: mp2 ([3][0][0][0] / 0x0003), 48000 Hz, stereo, s16p, 192 kb/s (clean effects)
    Stream #0:2[0x23](deu), 127, 1/90000: Subtitle: dvb_teletext ([6][0][0][0] / 0x0006)
Successfully opened the file.
Parsing a group of options: output url video-file.mp4.
Applying option c:v (codec name) with argument h264_nvenc.
Successfully parsed a group of options.
Opening an output file: video-file.mp4.
[file @ 0x22049a0] Setting default whitelist 'file,crypto'
Successfully opened the file.
Setting up CUVID transcoding

Just waits on Setting up CUVID transcoding. Also all ffmpeg instances consume 100% CPU.

comment:6 in reply to: ↑ 5 Changed 2 years ago by cehoyos

Replying to slomkowski:

Since ffmpeg doesn't crash, just hangs or waits for something, I cannot provide backtrace.

Why not?

Just waits on Setting up CUVID transcoding. Also all ffmpeg instances consume 100% CPU.

Please provide a backtrace when this happens.

comment:7 Changed 2 years ago by slomkowski

I tried to attach to the processes using gdb:

(gdb) attach 32749
Attaching to process 32749

and gdb just hangs here (you cannot interrupt it using Ctrl+C). Same for other ffmpeg instance.

comment:8 Changed 2 years ago by cehoyos

Then add breakpoints (or printfs) to find out where it hangs.

comment:9 Changed 2 years ago by slomkowski

I used printf and determined ffmpeg (the new instance) hangs on function:

err = cu->cuCtxCreate(&hwctx->cuda_ctx, CU_CTX_SCHED_BLOCKING_SYNC, cu_device);

in file libavutil/hwcontext_cuda.c, line 352. I can't tell anything about original hang.

comment:10 Changed 2 years ago by cehoyos

What happens if you replace CU_CTX_SCHED_BLOCKING_SYNC with 0?

comment:11 Changed 2 years ago by slomkowski

err = cu->cuCtxCreate(&hwctx->cuda_ctx, 0, cu_device);

doesn't change anything.

comment:12 in reply to: ↑ 2 ; follow-up: Changed 2 years ago by slomkowski

I just managed to create TS file sample, which hangs ffmpeg on first call. I include the file and the log (note: lines with single numbers in log are from my debug printfs).

nvidia-smi dmon log:

# gpu   pwr  temp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     %     %     %     %   MHz   MHz
    0    23    38     0     0     0     0  3304  1139
    0    25    39     0     0     0     0  3304  1139
    0    25    39     0     0     0     0  3304  1139
    0    25    39     0     0     0     0  3304  1139
    0    27    39     3     4     4     8  3304  1139
    0    27    39     0     0    10    22  3304  1139
    0    35    40     0     0     7    46  3304  1404
    0    35    40     0     0     5    62  3304  1404
    0    35    41     0     0     3    74  3304  1404
    0    35    41     0     0     2    82  3304  1404
    0    35    41     0     0     2    87  3304  1404
    0    35    41     0     0     1    91  3304  1404
    0    35    41     0     0     1    94  3304  1404
    0    35    41     0     0     0    96  3304  1404
    0    35    41     0     0     0    97  3304  1404
    0    35    42     0     0     0    98  3304  1404
    0    35    42     0     0     0    99  3304  1404
    0    35    42     0     0     0    99  3304  1404
    0    35    42     0     0     0    99  3304  1404
    0    35    42     0     0     0    99  3304  1404
    0    35    42     0     0     0   100  3304  1404
    0    35    42     0     0     0   100  3304  1404
    0    35    42     0     0     0   100  3304  1404
    0    35    42     0     0     0   100  3304  1404

nvidia-smi pmon log:

# gpu     pid  type    sm   mem   enc   dec   command
# Idx       #   C/G     %     %     %     %   name
    0    1781     C     0     0     0    99   ffmpeg         
    0    1781     C     0     0     0    99   ffmpeg

Replying to cehoyos:

Please provide a backtrace.

This time, I managed to attach ffmpeg to gdb. This is backtrace:

(gdb) attach 1731
(gdb) bt
#0  0x00007f18fd106aed in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f18f3bf7367 in ?? () from /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1
#2  0x00007f18f3bf9ae9 in ?? () from /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1
#3  0x00007f18f3be631e in ?? () from /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1
#4  0x00007f18f3beb5a7 in ?? () from /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1
#5  0x00007f18f3be97d4 in cuvidDestroyDecoder () from /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1
#6  0x000000000042496f in cuvid_decode_end ()
#7  0x000000000044b5dc in avcodec_close ()
#8  0x00000000004a2e56 in transcode ()
#9  0x0000000000483125 in main ()

Changed 2 years ago by slomkowski

log of processing 8MB file, which hangs at cuvidDestroyDecoder()

comment:13 Changed 2 years ago by slomkowski

Since upload.ffmpeg.org seems to be down, I uploaded the TS sample here:
https://services.breitenbachmedia.com/ffmpeg/6058-hangs-ffmpeg-dec100-1.ts

comment:14 in reply to: ↑ 12 ; follow-up: Changed 2 years ago by cehoyos

Replying to slomkowski:

#5  0x00007f18f3be97d4 in cuvidDestroyDecoder () from /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1

Do you agree that libnvcuvid.so.1 must never hang in cuvidDestroyDecoder(), no matter what we pass as arguments?

comment:15 in reply to: ↑ 14 Changed 2 years ago by slomkowski

Replying to cehoyos:

Do you agree that libnvcuvid.so.1 must never hang in cuvidDestroyDecoder(), no matter what we pass as arguments?

I don't understand.

comment:16 Changed 2 years ago by thunder.m

Thank YOU for finding this bug (epsecialy for testing data!), we had alotof crashes on our systems while decoding with cuvid and MPEG2!

I can reproduce this behavior on Nvidia GTX 960 (maxwell second gen GM206), but it works without any problems on GTX 1060, 1070 and 1080 (we have alotof nvidia cards).

What is very strange is that i can't reproduce this on Quadro M4000, which is also Maxwell but GM204. So my first opinion is that affected are only GM206 cards.

Could anyone test this sample on GTX970/980 with GM204?

Solution could be for now to switch to GTX 1050 which has also 2x faster encoder :)

By the way i think it could be in CUDA or in NVDEC engine so driver fix from NVIDIA is needed. It is not bug in ffmpeg itself.

comment:17 Changed 2 years ago by cehoyos

Did you already report this issue to Nvidia?

comment:18 Changed 2 years ago by slomkowski

comment:19 Changed 2 years ago by cehoyos

  • Resolution set to invalid
  • Status changed from new to closed

Closing this issue since it is either a driver or a hardware issue. Please feel free to add information about possible fixes here.

Note: See TracTickets for help on using tickets.