#7674 closed defect (fixed)
ffmpeg with cuvid transcoding after version 3.4.1 work unstable on heavy load CUDA card
Reported by: | Maxim | Owned by: | |
---|---|---|---|
Priority: | normal | Component: | avcodec |
Version: | git-master | Keywords: | nvenc regression |
Cc: | Blocked By: | ||
Blocking: | Reproduced by developer: | no | |
Analyzed by developer: | no |
Description
Hi!
To transcode media streams I use Nvidia Quadro P5000 video cards and ffmpeg version 3.4.1 (OS Linux Ubuntu 16.04.5, kernel 4.15).
With version 3.4.1 everything works fine; on this video card it was possible to generate 79 H.264 streams:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79 Driver Version: 410.79 CUDA Version: 10.0 |
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro P5000 Off | 00000000:01:00.0 Off | Off |
| 49% 76C P0 79W / 180W | 12792MiB / 16278MiB | 45% Default |
+-------------------------------+----------------------+----------------------+
# nvidia-smi -i 0 | grep ffmpeg | wc -l
79
---
Utilization
Gpu : 42 %
Memory : 16 %
Encoder : 42 %
Decoder : 92 %
When I upgrade ffmpeg to version 3.4.2 or higher (4.x), ffmpeg becomes unstable after 40 streams.
ffmpeg command line:
ffmpeg -hwaccel cuvid -c:v mpeg2_cuvid -deint 2 -drop_second_field 1 -i udp://232.10.10.1:1234?fifo_size=300000 -b:v 2800k -b:a 192k -c:v h264_nvenc -profile:v high -preset hp -c:a aac -f flv rtmp://127.0.0.1:1935/live/001
What could the issue be?
Change History (25)
comment:1 by , 6 years ago
Component: | ffmpeg → undetermined |
---|
comment:2 by , 6 years ago
If you believe there is a regression, run git bisect to find the change introducing the issue.
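For reference, such a bisect could be sketched roughly as below. This is only an illustration under stated assumptions: the helper name `bisect_regression` and the idea of driving the search with a test command are not from this ticket. The test command has to exit 0 on a good revision and with a code between 1 and 124 on a bad one (125 means "cannot test, skip").

```shell
#!/bin/bash
# Hypothetical helper (not part of FFmpeg): print the first bad commit
# between a known-good and a known-bad revision.
#   $1   = good revision, $2 = bad revision
#   $3.. = test command; exit 0 = good, 1-124 = bad, 125 = skip this commit
bisect_regression() {
    local good="$1" bad="$2" first_bad
    shift 2
    git bisect start "$bad" "$good" >/dev/null || return 1
    # git checks out candidate commits and runs the test command on each one.
    if ! git bisect run "$@" >/dev/null; then
        git bisect reset >/dev/null 2>&1
        return 1
    fi
    # While the bisect session is open, refs/bisect/bad names the first bad commit.
    first_bad=$(git rev-parse refs/bisect/bad)
    git bisect reset >/dev/null 2>&1
    echo "$first_bad"
}
```

In an FFmpeg checkout this might be invoked as `bisect_regression n3.4.1 n3.4.2 ./check.sh`, where `n3.4.1` and `n3.4.2` are the release tags for the versions named in the description and `check.sh` is a hypothetical script that rebuilds ffmpeg and exits non-zero when the problem shows up. A load-dependent problem like this one may be hard to automate, in which case each step can be judged by hand with `git bisect good` and `git bisect bad`.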
follow-up: 5 comment:3 by , 6 years ago
Also, please be _a lot_ more specific than "ffmpeg was work unstable".
comment:4 by , 6 years ago
Summary: | ffmpeg with cuvid transcoding afte version 3.4.1 work with crash on heavy load CUDA card → ffmpeg with cuvid transcoding after version 3.4.1 work with crash on heavy load CUDA card |
---|
Patches to the encoder in version 3.4.2:
https://git.ffmpeg.org/gitweb/ffmpeg.git/commitdiff/a7c60c5b7bc51289773a7e64ebeeeccb53943bdb
https://git.ffmpeg.org/gitweb/ffmpeg.git/commitdiff/d36714f727026ffcdf84c21c5498ceaef862ee75
https://git.ffmpeg.org/gitweb/ffmpeg.git/commitdiff/93c8720b914e7027d0e6401e6f64a9a4ce531d0c
https://git.ffmpeg.org/gitweb/ffmpeg.git/commitdiff/fbb27e2911839aaac7b460112eddfafe55b36d75
comment:5 by , 6 years ago
Replying to oromit:
Also, please be _a lot_ more specific than "ffmpeg was work unstable".
The instability looks like this: with up to 40 ffmpeg processes running at the same time, everything is OK. If I run, for example, 55 streams, the nvidia-smi utility becomes slow to show statistics, and the ffmpeg processes randomly exit after some time.
comment:6 by , 6 years ago
Resolution: | → needs_more_info |
---|---|
Status: | new → closed |
Please reopen this ticket if you can confirm the issue is reproducible with current FFmpeg git head, if you can point us to the commit introducing the regression and if you can provide simplified command line including the complete, uncut console output.
comment:7 by , 6 years ago
Hi!
I checked the git-master version; the issue is present there too.
Then I reverted some patches on version 4.1, and now my issue is resolved :)
I think these patches need to be analyzed for the case where cuvid transcoding is used:
932037c6bb6b41a24e75b031426844a2e6472a74
48e52e4edd12adbc36eee0eebe1b97ffe0255be3
32bc4e77f61a5483c83a360b9ccbfc2840daba1e
bbe1b21022e4872bc64066d46a4567dc1b655f7a
comment:8 by , 6 years ago
Resolution: | needs_more_info |
---|---|
Status: | closed → reopened |
comment:9 by , 6 years ago
To make this a valid bug report please provide the command line you tested together with the complete, uncut console output and point us to the change (it is one commit, not four) that introduced the regression.
comment:10 by , 6 years ago
My step-by-step:
git clone -b release/4.1 https://git.ffmpeg.org/ffmpeg.git ffmpeg41
Dependent patches, which must be reverted before "32bc4e77f61a5483c83a360b9ccbfc2840daba1e" can be reverted:
git revert 932037c6bb6b41a24e75b031426844a2e6472a74
git revert 48e52e4edd12adbc36eee0eebe1b97ffe0255be3
regression patch:
git revert 32bc4e77f61a5483c83a360b9ccbfc2840daba1e
test configuration
ffmpeg -hwaccel cuvid -c:v mpeg2_cuvid -deint 2 -drop_second_field 1 -i udp://232.10.10.1:1234?fifo_size=300000 -b:v 2800k -b:a 192k -c:v h264_nvenc -profile:v high -preset hp -c:a aac -f flv rtmp://127.0.0.1:1935/live/001
.
.
.
ffmpeg -hwaccel cuvid -c:v mpeg2_cuvid -deint 2 -drop_second_field 1 -i udp://232.10.10.60:1234?fifo_size=300000 -b:v 2800k -b:a 192k -c:v h264_nvenc -profile:v high -preset hp -c:a aac -f flv rtmp://127.0.0.1:1935/live/060
With the problematic patch applied and 59 streams running, starting the next stream takes over 20 seconds and the video freezes on the other streams.
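The commands elided in the listing above appear to differ only in the multicast address and the RTMP stream name. A minimal sketch for generating them, assuming stream N reads from 232.10.10.N and publishes to /live/NNN (a pattern inferred from the first and last command, not confirmed by the reporter); `print_stream_cmd` and `print_all_stream_cmds` are hypothetical helper names:

```shell
#!/bin/bash
# Hypothetical helper: print the ffmpeg command line for stream number $1.
# Assumption: stream N uses input 232.10.10.N and an output name zero-padded to 3 digits.
print_stream_cmd() {
    local n="$1" name
    name=$(printf '%03d' "$n")
    # The input URL is single-quoted in the output so that "?" is not treated as a glob.
    printf '%s\n' "ffmpeg -hwaccel cuvid -c:v mpeg2_cuvid -deint 2 -drop_second_field 1 -i 'udp://232.10.10.${n}:1234?fifo_size=300000' -b:v 2800k -b:a 192k -c:v h264_nvenc -profile:v high -preset hp -c:a aac -f flv rtmp://127.0.0.1:1935/live/${name}"
}

# Print the command lines for streams 1..60, one per line.
print_all_stream_cmds() {
    local n=1
    while [ "$n" -le 60 ]; do
        print_stream_cmd "$n"
        n=$((n + 1))
    done
}
```

The functions only print the command lines; starting the streams (for example each one in the background) is left to the caller.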
comment:11 by , 6 years ago
Resolution: | → needs_more_info |
---|---|
Status: | reopened → closed |
comment:12 by , 6 years ago
Summary: | ffmpeg with cuvid transcoding after version 3.4.1 work with crash on heavy load CUDA card → ffmpeg with cuvid transcoding after version 3.4.1 work unstable on heavy load CUDA card |
---|
follow-up: 23 comment:13 by , 6 years ago
The patch you claim introduced the regression fixes a leak and potential crash. It seems very unlikely to me that it would introduce performance issues.
comment:14 by , 6 years ago
Maybe the reason is that with cuvid transcoding the video frames are processed inside the GPU; it is possible that this case was not considered when the patch was developed.
comment:15 by , 6 years ago
That patch is specifically only for the case of pure on-GPU transcoding. It sits in the path that registers a CUDA frame to nvenc.
comment:16 by , 6 years ago
That's right, but once these two lines were added:
+ p_nvenc->nvEncUnregisterResource(ctx->nvencoder, ctx->registered_frames[tmpoutsurf->reg_idx].regptr);
+ ctx->registered_frames[tmpoutsurf->reg_idx].regptr = NULL;
the GPU begins to work slowly when the encoder load reaches 80-90% :(
comment:17 by , 6 years ago
Resolution: | needs_more_info |
---|---|
Status: | closed → reopened |
I confirm the issue reported in this ticket with current git. Running many transcoding instances (above 50) leads to a big performance degradation.
With the following patch on current git
--- ffmpeg/libavcodec/nvenc.c 2019-04-08 20:53:19.745925070 +0300
+++ ffmpeg/libavcodec/nvenc.c 2019-04-08 20:55:51.619074973 +0300
@@ -1846,13 +1846,6 @@
                 res = nvenc_print_error(avctx, nv_status, "Failed unmapping input resource");
                 goto error;
             }
-            nv_status = p_nvenc->nvEncUnregisterResource(ctx->nvencoder, ctx->registered_frames[tmpoutsurf->reg_idx].regptr);
-            if (nv_status != NV_ENC_SUCCESS) {
-                res = nvenc_print_error(avctx, nv_status, "Failed unregistering input resource");
-                goto error;
-            }
-            ctx->registered_frames[tmpoutsurf->reg_idx].ptr = NULL;
-            ctx->registered_frames[tmpoutsurf->reg_idx].regptr = NULL;
         } else if (ctx->registered_frames[tmpoutsurf->reg_idx].mapped < 0) {
             res = AVERROR_BUG;
             goto error;
the issue is resolved. I am not in a position to understand why this patch fixes the issue, but it does. And I see no other problem (mem leak or something else) when applying above patch.
The issue affects nvenc in general, it is not related to mpeg-2 input that original author reported.
comment:18 by , 6 years ago
Please either:
Send your patch - made with git format-patch - to the FFmpeg development mailing list, patches are ignored on this bug tracker.
Or provide the command line you tested together with the complete, uncut console output to make this a valid ticket.
comment:19 by , 6 years ago
I am sorry but the output is irrelevant. To test the issue you need to have at least two Quadro P5000 or two Quadro RTX 5000 cards in the same computer, in order to be able to run that many instances.
Sample bash script to test the issue:
#!/bin/bash
for j in `seq 0 $1` ; do
    for i in `seq 1 $2` ; do
        ffmpeg-git -nostdin -loglevel error -stats \
            -hwaccel cuvid -hwaccel_device $j -c:v h264_cuvid -surfaces 12 \
            -i input_1080i.ts \
            -vf yadif_cuda=1:-1:1,scale_npp=w=1280:h=720 \
            -c:v h264_nvenc \
            -preset fast \
            -acodec copy -f mpegts -y /dev/null &
    done
done
wait
echo done
Sample input file can be downloaded from http://207.154.237.57/files/input_1080i.ts
It is a 1080i input and we deinterlace it. That way we can push many more frames to nvenc, because if we used, for example, a 50fps or 60fps input, nvdec would limit us first.
You run it as: ./testbench.sh 1 30 where 1 is the number of GPUs-1 (if you have 3, you put 2 etc) and 30 is the number of concurrent sessions per GPU.
Increasing the sessions above 25-30 per GPU will show the issue immediately. Applying above patch resolves the issue.
comment:20 by , 6 years ago
Another test case has been reported at https://devtalk.nvidia.com/default/topic/1050306/video-codec-and-optical-flow-sdk/video-transcoding-using-multiple-gpus-32-live-streaming-jobs-/ with the same behaviour. Applying above patch also fixes that test case. It has been confirmed now from at least 3 different test cases, I don't understand why you don't revert the problematic code change.
comment:21 by , 6 years ago
I have put some debug info to check what is going on when this code is executed. For every processed frame, code calls nvEncUnregisterResource. For my test example of 2984 frames, code is executed 2984 times.
/testnvidia4.sh 0 1
[h264_nvenc @ 0x562ba7c98e80] DEBUG: Unmapped, need(?) to unregister
Last message repeated 2983 times13568kB time=00:00:47.01 bitrate=2364.1kbits/s speed=31.3x
frame= 2984 fps=1542 q=27.0 Lsize= 18214kB time=00:01:01.56 bitrate=2423.8kbits/s speed=31.8x
Obviously, when running multiple encodes, making thousands of calls to nvEncUnregisterResource creates the performance issue.
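FFmpeg's logger folds consecutive identical messages into a "Last message repeated N times" line, so the total (1 + 2983 = 2984 here) has to be reconstructed from the log. A small sketch of that arithmetic, assuming the log is fed on stdin and that the message and the repeat notice each sit on their own line; `count_log_message` is a hypothetical helper, not an FFmpeg tool:

```shell
#!/bin/bash
# Hypothetical helper: count how often the message given in $1 was logged,
# including repeats folded into "Last message repeated N times" lines.
count_log_message() {
    awk -v msg="$1" '
        index($0, msg) > 0 { total++; last_matched = 1; next }
        # A repeat notice only counts if it follows the message being tracked.
        /Last message repeated [0-9]+ times/ {
            if (last_matched) {
                line = $0
                sub(/.*Last message repeated /, "", line)
                total += line + 0    # awk takes the leading number of "2983 times..."
            }
            next
        }
        { last_matched = 0 }
        END { print total + 0 }
    '
}
```

It could be used as `ffmpeg ... 2>&1 | tr '\r' '\n' | count_log_message 'to unregister'`; the `tr` step is an assumption to split the carriage-return-separated progress lines that `-stats` writes.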
comment:22 by , 6 years ago
With the above patch applied, nvEncRegisterResource is called 5 times at start and nvEncUnregisterResource is called 5 times at end of process. Without it (current git code) nvEncRegisterResource is called 2984 times and nvEncUnregisterResource is also called 2984 times (for the test input file of 2984 frames). So not calling nvEncUnregisterResource at this specific code location does not leave any garbage since nvEncRegisterResource and nvEncUnregisterResource are matching, it just creates really unnecessary overhead registering and unregistering on every frame. Please fix.
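The counts in this comment can be reproduced with a toy model. The sketch below is not the nvenc.c logic: it assumes the encoder receives frames from a fixed pool of surfaces in round-robin order (5 surfaces here, inferred from the 5 registrations reported above) and compares a policy that unregisters after every frame with one that keeps registrations until the encoder is closed. `count_register_calls` and its mode names are hypothetical.

```shell
#!/bin/bash
# Toy model, NOT the real nvenc code: count register/unregister calls for one stream.
#   $1 = number of frames
#   $2 = size of the surface pool (assumed to be used round-robin)
#   $3 = "per-frame" (unregister after every frame) or "cached" (keep until close)
# Prints "<register calls> <unregister calls>".
count_register_calls() {
    awk -v frames="$1" -v pool="$2" -v mode="$3" 'BEGIN {
        for (i = 0; i < frames; i++) {
            surf = i % pool
            if (!(surf in registered)) {    # not registered yet: register it
                registered[surf] = 1
                registers++
            }
            if (mode == "per-frame") {      # drop the registration right away
                delete registered[surf]
                unregisters++
            }
        }
        # At encoder close, anything still registered is unregistered once.
        for (surf in registered) unregisters++
        print registers + 0, unregisters + 0
    }'
}
```

With 2984 frames and a pool of 5, the per-frame mode yields 2984 calls of each kind and the cached mode yields 5 of each. In both modes the register and unregister counts match, which is the point made above: nothing leaks either way, only the number of calls differs.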
comment:23 by , 6 years ago
Replying to oromit:
The patch you claim introduced the regression fixes a leak and potential crash. It seems very unlikely to me that it would introduce performance issues.
As I showed above, it does not. There is no leak. It just creates really unnecessary overhead registering and unregistering on every processed frame.
comment:24 by , 6 years ago
Commit 23ed147e8fc2b6b51a88af66b40f99049e5fa0d8 fixes the issue. Thank you very much, although I feel a bit disappointed about the wording used in the commit description. This was not a "super rare edge case"; it was a typical workload with many transcodes on multiple GPUs. The original commit (which is now reverted) mentioned a "blew up" which was never experienced. The original commit only made the code keep unregistering and registering on every processed frame. So I would expect a "sorry, we screwed up the code originally, so we are reverting now", not an attempt to say that this was a "super rare edge case".
comment:25 by , 6 years ago
Component: | undetermined → avcodec |
---|---|
Keywords: | nvenc regression added |
Resolution: | → fixed |
Status: | reopened → closed |
Version: | unspecified → git-master |
Please test current FFmpeg git head and provide a simplified command line (if possible without network input or output) including the complete, uncut console output to make this a valid ticket. If you see a crash, please provide backtrace, disassembly and register dump as explained on https://ffmpeg.org/bugreports.html