#9082 closed defect (fixed)
Memory leak while transcoding on GPU
Reported by: | misko | Owned by: | |
---|---|---|---|
Priority: | important | Component: | avcodec |
Version: | git-master | Keywords: | cuda regression |
Cc: | | Blocked By: | |
Blocking: | | Reproduced by developer: | no |
Analyzed by developer: | no | | |
Description (last modified by )
Summary of the bug:
While transcoding a live SPTS with H.264/AVC video to multiple H.264/AVC output profiles, and MPEG-1 Layer 2 audio to AAC, using NVENC on an NVIDIA (Pascal) GPU, all ffmpeg versions since 022a12b306ab2096e6ac9fc9b149828a849d65b2 leak memory. Note: brief testing with a single output profile did not show the leak, but with at least 2 profiles output simultaneously the leak definitely occurs.
How to reproduce:
/opt/ffmpeg/bin/ffmpeg -hide_banner -nostats -loglevel verbose -vsync -1 \
  -hwaccel cuvid -hwaccel_output_format cuda -hwaccel_device 1 \
  -init_hw_device cuda=cuda:1 -filter_hw_device cuda -threads 1 \
  -fflags +discardcorrupt -c:v h264_cuvid -f mpegts \
  -i 'udp://239.232.9.1:1234?sources=10.128.0.9&fifo_size=100000&timeout=2000000' \
  -filter_complex '[0:v:0]yadif_cuda=mode=send_frame:parity=auto:deint=all,split=3[s_p1][s_p2][s_p3];[s_p1]scale_npp=-1:1080:interp_algo=lanczos[v_p1];[s_p2]scale_npp=-1:720:interp_algo=lanczos[v_p2];[s_p3]scale_npp=-1:576:interp_algo=lanczos[v_p3]' \
  -map '[v_p1]' -map '#0xc86' \
  -c:v h264_nvenc -preset:v p4 -rc:v vbr -profile:v high -forced-idr:v 1 \
  -force_key_frames:v 'expr:gte(t,n_forced*2)' \
  -b:v 5000000 -maxrate:v 5000000 -bufsize:v 2500000 \
  -spatial-aq:v 1 -aq-strength:v 15 -bf:v 3 -b_ref_mode:v middle \
  -no-scenecut:v 1 -rc-lookahead:v 32 -coder:v cabac \
  -c:a:0 libfdk_aac -ac:a:0 2 -r:a:0 48000 -b:a:0 96k \
  -f mpegts -mpegts_service_type advanced_codec_digital_hdtv -mpegts_flags system_b \
  'udp://239.232.229.61:10001?pkt_size=1316&fifo_size=100000' \
  -map '[v_p2]' -map '#0xc86' \
  -c:v h264_nvenc -preset:v p4 -rc:v vbr -profile:v high -forced-idr:v 1 \
  -force_key_frames:v 'expr:gte(t,n_forced*2)' \
  -b:v 2500000 -maxrate:v 2500000 -bufsize:v 1250000 \
  -spatial-aq:v 1 -aq-strength:v 15 -bf:v 3 -b_ref_mode:v middle \
  -no-scenecut:v 1 -rc-lookahead:v 32 -coder:v cabac \
  -c:a:0 libfdk_aac -ac:a:0 2 -r:a:0 48000 -b:a:0 96k \
  -f mpegts -mpegts_service_type advanced_codec_digital_hdtv -mpegts_flags system_b \
  'udp://239.232.229.61:10002?pkt_size=1316&fifo_size=100000' \
  -map '[v_p3]' -map '#0xc86' \
  -c:v h264_nvenc -preset:v p4 -rc:v vbr -profile:v main -forced-idr:v 1 \
  -force_key_frames:v 'expr:gte(t,n_forced*2)' \
  -b:v 1250000 -maxrate:v 1250000 -bufsize:v 625000 \
  -spatial-aq:v 1 -aq-strength:v 15 -bf:v 3 -b_ref_mode:v middle \
  -no-scenecut:v 1 -rc-lookahead:v 32 -coder:v cabac \
  -c:a:0 libfdk_aac -ac:a:0 2 -r:a:0 48000 -b:a:0 96k \
  -f mpegts -mpegts_service_type advanced_codec_digital_hdtv -mpegts_flags system_b \
  'udp://239.232.229.61:10003?pkt_size=1316&fifo_size=100000'
Attachments (3)
Change History (26)
comment:1 by , 4 years ago
Priority: | normal → critical |
---|
follow-up: 3 comment:2 by , 4 years ago
Priority: | critical → normal |
---|
comment:3 by , 4 years ago
Replying to cehoyos:
Please provide valgrind output for all memory leak reports, please always provide the command line you tested together with the complete, uncut console output and please do not use hide_banner when reporting issues.
Unfortunately, I can't provide any valgrind output: running ffmpeg under valgrind makes it so slow that it can't keep up with the processing in real time, and I get lots of errors that don't occur in normal operation.
I'll attach full output and the whole command line.
follow-ups: 7 9 comment:5 by , 4 years ago
Replying to mkver:
Can you test with ASAN? It's way faster than Valgrind.
I can try. Is it enough just to enable --toolchain=gcc-asan configure flag?
comment:6 by , 4 years ago
Well ... with toolchain=gcc-asan enabled it doesn't work at all. I get this weird CUDA error:
ffmpeg version N-99167-g022a12b306 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 8 (Debian 8.3.0-6)
  configuration: --prefix=/opt/ffmpeg --enable-cuvid --enable-nvenc --enable-libfdk-aac --enable-nonfree --enable-gpl --enable-libnpp --enable-cuda --enable-cuda-nvcc --extra-cflags=-I/usr/local/cuda/include --extra-ldflags=-L/usr/local/cuda/lib64 --nvccflags='-gencode arch=compute_60,code=sm_60 -O2' --toolchain=gcc-asan
  libavutil 56. 59.100 / 56. 59.100
  libavcodec 58.106.100 / 58.106.100
  libavformat 58. 56.100 / 58. 56.100
  libavdevice 58. 11.102 / 58. 11.102
  libavfilter 7. 87.100 / 7. 87.100
  libswscale 5. 8.100 / 5. 8.100
  libswresample 3. 8.100 / 3. 8.100
  libpostproc 55. 8.100 / 55. 8.100
[AVHWDeviceContext @ 0x6090000004c0] cu->cuInit(0) failed -> CUDA_ERROR_OUT_OF_MEMORY: out of memory
Device creation failed: -1313558101.
Failed to set value 'cuda=cuda:1' for option 'init_hw_device': Unknown error occurred
Error parsing global options: Unknown error occurred
=================================================================
==8292==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 65536 byte(s) in 1 object(s) allocated from:
    #0 0x7fa5b4afb720 in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0xe9720)
    #1 0x7fa5acc9ecff (/opt/ffmpeg-asan/ffmpeg_g+0xd0f0cff)
Direct leak of 264 byte(s) in 1 object(s) allocated from:
    #0 0x7fa5b4afb518 in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0xe9518)
    #1 0x7fa5ace058ba (/opt/ffmpeg-asan/ffmpeg_g+0xd2578ba)
Direct leak of 128 byte(s) in 1 object(s) allocated from:
    #0 0x7fa5b4afb518 in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0xe9518)
    #1 0x7fa5acc9f12f (/opt/ffmpeg-asan/ffmpeg_g+0xd0f112f)
Direct leak of 64 byte(s) in 2 object(s) allocated from:
    #0 0x7fa5b4afb330 in __interceptor_malloc (/lib/x86_64-linux-gnu/libasan.so.5+0xe9330)
    #1 0x7fa5acc8b081 (/opt/ffmpeg-asan/ffmpeg_g+0xd0dd081)
Direct leak of 8 byte(s) in 1 object(s) allocated from:
    #0 0x7fa5b4afb518 in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0xe9518)
    #1 0x7fa5acc9f0bf (/opt/ffmpeg-asan/ffmpeg_g+0xd0f10bf)
Indirect leak of 640 byte(s) in 2 object(s) allocated from:
    #0 0x7fa5b4afb518 in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0xe9518)
    #1 0x7fa5acc8b09c (/opt/ffmpeg-asan/ffmpeg_g+0xd0dd09c)
SUMMARY: AddressSanitizer: 66640 byte(s) leaked in 8 allocation(s).
I don't know if I'm doing this the proper way.
follow-up: 8 comment:7 by , 4 years ago
comment:8 by , 4 years ago
Replying to Balling:
Replying to misko:
Replying to mkver:
Can you test with ASAN? It's way faster than Valgrind.
I can try. Is it enough just to enable --toolchain=gcc-asan configure flag?
No, you need to use -fsanitize=address compiler flags (both CFLAGS and CXXFLAGS) and linker flags (LDFLAGS).
In the configure script I found this case statement:
case "$toolchain" in
    *-asan)
        cc_default="${toolchain%-asan}"
        add_cflags -fsanitize=address
        add_ldflags -fsanitize=address
    ;;
So specifying --toolchain=gcc-asan does exactly what you said. It adds -fsanitize=address to both CFLAGS and LDFLAGS.
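To make the suffix handling in that case statement concrete, here is a minimal standalone sketch. The function name `asan_toolchain` and its output format are hypothetical and are not part of configure; only the `*-asan` pattern, the `${...%-asan}` suffix stripping and the `-fsanitize=address` flag come from the snippet quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not part of configure): for a "<cc>-asan" toolchain,
# print the compiler that the "-asan" suffix is stripped down to and the
# flag that would be added to both CFLAGS and LDFLAGS.
asan_toolchain() {
    case "$1" in
        *-asan)
            # ${1%-asan} removes the shortest trailing match of "-asan"
            echo "cc=${1%-asan} flags=-fsanitize=address"
            ;;
        *)
            # any other toolchain value is outside the scope of this sketch
            return 1
            ;;
    esac
}
```

For example, `asan_toolchain gcc-asan` prints `cc=gcc flags=-fsanitize=address`, which matches the observation that the toolchain option alone adds the sanitizer flag on both the compile and link side.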
comment:9 by , 4 years ago
comment:10 by , 4 years ago
The ASAN_OPTIONS=protect_shadow_gap=0 environment variable is needed for the program to run under ASAN with CUDA. And yes, I used ffmpeg_g instead of ffmpeg. I'll post results later.
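A small wrapper can keep that setting from being forgotten between runs. This is only a sketch: the function name `run_under_asan` is hypothetical, and it assumes that any options already present in `ASAN_OPTIONS` should be kept and joined with a colon, the usual separator for sanitizer options.

```shell
#!/bin/sh
# Hypothetical wrapper: run the given command with protect_shadow_gap=0
# appended to whatever ASAN_OPTIONS is already set in the environment.
run_under_asan() {
    # ${VAR:+...} expands to "<old value>:" only when ASAN_OPTIONS is non-empty
    env ASAN_OPTIONS="${ASAN_OPTIONS:+$ASAN_OPTIONS:}protect_shadow_gap=0" "$@"
}
```

Usage would look like `run_under_asan /opt/ffmpeg-asan/ffmpeg_g ...` followed by the same arguments as the normal command line.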
comment:11 by , 4 years ago
ASAN revealed nothing at all. It looks like some internal buffer grows over time and is deallocated as a whole when ffmpeg quits. This may be the reason why ASAN sees no leaks at exit. Here is the tail of the output while running under ASAN:
frame=69503 fps= 25 q=33.0 Lq=33.0 q=36.0 size= 1776788kB time=00:46:22.12 bitrate=5231.8kbits/s speed= 1x
video:2953609kB audio:97801kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
Input file #0 (udp://239.232.122.8:1234?sources=10.128.0.122&fifo_size=100000&timeout=2000000):
  Input stream #0:0 (video): 139091 packets read (1384717898 bytes); 69505 frames decoded;
  Input stream #0:1 (audio): 115908 packets read (55635840 bytes); 115908 frames decoded (133526016 samples);
  Input stream #0:2 (audio): 204 packets read (313344 bytes);
  Input stream #0:3 ((null)): 0 packets read (0 bytes);
  Input stream #0:4 (subtitle): 0 packets read (0 bytes);
  Input stream #0:5 ((null)): 0 packets read (0 bytes);
  Input stream #0:6 (data): 32 packets read (49057 bytes);
  Total: 255235 packets (1440716139 bytes) demuxed
Output file #0 (udp://239.232.229.254:10001?pkt_size=1316&fifo_size=100000):
  Output stream #0:0 (video): 69503 frames encoded; 69503 packets muxed (1728018257 bytes);
  Output stream #0:1 (audio): 130396 frames encoded (133525504 samples); 130398 packets muxed (33382685 bytes);
  Total: 199901 packets (1761400942 bytes) muxed
Output file #1 (udp://239.232.229.254:10002?pkt_size=1316&fifo_size=100000):
  Output stream #1:0 (video): 69503 frames encoded; 69503 packets muxed (864014249 bytes);
  Output stream #1:1 (audio): 130396 frames encoded (133525504 samples); 130398 packets muxed (33382685 bytes);
  Total: 199901 packets (897396934 bytes) muxed
Output file #2 (udp://239.232.229.254:10003?pkt_size=1316&fifo_size=100000):
  Output stream #2:0 (video): 69503 frames encoded; 69503 packets muxed (432462723 bytes);
  Output stream #2:1 (audio): 130396 frames encoded (133525504 samples); 130398 packets muxed (33382685 bytes);
  Total: 199901 packets (465845408 bytes) muxed
[AVIOContext @ 0x613000004800] Statistics: 0 seeks, 1421075 writeouts
[AVIOContext @ 0x6130000049c0] Statistics: 0 seeks, 750353 writeouts
[AVIOContext @ 0x613000004b80] Statistics: 0 seeks, 415249 writeouts
[h264_nvenc @ 0x619000076c80] Nvenc unloaded
[h264_nvenc @ 0x619000078a80] Nvenc unloaded
[h264_nvenc @ 0x61900007a880] Nvenc unloaded
[AVIOContext @ 0x613000003840] Statistics: 1678790932 bytes read, 0 seeks
=================================================================
==17126==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 20856 byte(s) in 3 object(s) allocated from:
    #0 0x7fe137ead330 in __interceptor_malloc (/lib/x86_64-linux-gnu/libasan.so.5+0xe9330)
    #1 0x7fe106f6551a (<unknown module>)
Direct leak of 32 byte(s) in 1 object(s) allocated from:
    #0 0x7fe137ead330 in __interceptor_malloc (/lib/x86_64-linux-gnu/libasan.so.5+0xe9330)
    #1 0x7fe0e0bdaf3d (<unknown module>)
SUMMARY: AddressSanitizer: 20888 byte(s) leaked in 4 allocation(s).
I attached plots of RSS over time: one for commit 022a12b306 (the leaky one) and one for commit 8a81820624 (the good one). Each measurement ran for 3 hours straight, sampling the RSS of the process every 300 seconds.
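For anyone who wants to repeat this kind of measurement, a sampler along the following lines should be enough. It is a sketch, not the script used for the attached plots: the function names and the `epoch_seconds rss_kib` output format are assumptions, and it relies on a `ps` that supports `-o rss= -p PID` (procps on Linux does), which reports RSS in KiB.

```shell
#!/bin/sh
# Print the resident set size (RSS) of process $1 in KiB; fail if it is gone.
rss_kib() {
    out=$(ps -o rss= -p "$1") || return 1
    # ps pads the number with spaces; strip them
    echo "$out" | tr -d ' '
}

# Print up to COUNT lines of "epoch_seconds rss_kib" for PID,
# one sample every INTERVAL seconds.
sample_rss() {
    pid=$1
    interval=$2
    count=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        rss=$(rss_kib "$pid") || break   # stop once the process has exited
        echo "$(date +%s) $rss"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

A 3-hour run at 300-second intervals corresponds to `sample_rss "$FFMPEG_PID" 300 36 > rss.log`, where `FFMPEG_PID` is a placeholder for the PID of the ffmpeg process being watched.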
comment:12 by , 4 years ago
Did another test with current master (a00ff56321). Problem still persists.
follow-up: 14 comment:13 by , 4 years ago
Keywords: | cuda added |
---|
Please run git bisect to find the change introducing the regression you see.
comment:14 by , 4 years ago
Replying to cehoyos:
Please run git bisect to find the change introducing the regression you see.
I mentioned this in my first post. This issue was introduced in commit 022a12b306ab2096e6ac9fc9b149828a849d65b2.
comment:15 by , 4 years ago
Description: | modified (diff) |
---|---|
Keywords: | regression added |
Priority: | normal → important |
comment:18 by , 3 years ago
Replying to Balling:
Can you retest again?
Hi. Leak is still present in 8bcce5673a267ed371140bf3228ffb420ca2f69b.
comment:19 by , 3 years ago
Hi,
I have been struggling with the exact same issue for the last couple of days.
In my case, I have a batch process running 75 parallel transcodes with cuvid.
Memory leaks at a rate of 1.6 GB per hour across the 75 transcoding processes. I ran the test against the latest master branch and the 4.4 and 4.3 releases. All of these have the same issue.
I am using the following command, similar to the one mentioned in this thread:
./ffmpeg_g -re -hwaccel cuvid -hwaccel_output_format cuda -c:v h264_cuvid -gpu 0 -deint adaptive -resize 1280x720 -drop_second_field 1 -i espnu.ts -filter_complex "[0:0]split=2[v0][v1];[v1]scale_npp=640:360[v2]" -map [v0] -map 0:1 -map [v2] -map 0:1 -c:v:0 h264_nvenc -b:v:0 2200k -gpu:v:0 0 -preset:v:0 hp -profile:v:0 high -level:v:0 4.1 -a53cc:v:0 1 -no-scenecut:v:0 1 -forced-idr:v:0 1 -strict_gop:v:0 1 -c:v:1 h264_nvenc -b:v:1 800k -gpu:v:1 0 -preset:v:1 hp -profile:v:1 high -level:v:1 3.1 -a53cc:v:1 1 -no-scenecut:v:1 1 -forced-idr:v:1 1 -strict_gop:v:1 1 -c:a libfdk_aac -b:a 128k -ac 2 -var_stream_map "v:0,a:0 v:1,a:1" -f hls -hls_time 4 -hls_list_size 4 -hls_flags delete_segments -hls_delete_threshold 2 -forced-idr 1 -force_key_frames "expr:gte(t,n_forced*2)" -sc_threshold 0 -hls_segment_filename "stream/v%v/fileSequence%d.ts" -master_pl_name playlist.m3u8 stream/v%v/prog_index.m3u8
I tried valgrind as well to see where the memory is leaking, but couldn't get any useful information.
I even tried without the libnpp filters and sent the output to /dev/null to isolate the leak. After these experiments I found this thread, which says commit 022a12b306 introduced the issue. So I created a new branch from just before that commit, and this time there is no leak.
Let me know if any patches are available for testing. I will be happy to test them, as I would like to bring the transcoder up to date with the latest ffmpeg release.
follow-up: 21 comment:20 by , 3 years ago
Can you test the following?
diff --git a/libavcodec/decode.c b/libavcodec/decode.c
index 75bc7ad98e..ed49d14fab 100644
--- a/libavcodec/decode.c
+++ b/libavcodec/decode.c
@@ -533,6 +533,11 @@ static int decode_receive_frame_internal(AVCodecContext *avctx, AVFrame *frame)
     if (ret == AVERROR_EOF)
         avci->draining_done = 1;
 
+    if (IS_EMPTY(avci->last_pkt_props)
+        && av_fifo_size(avctx->internal->pkt_props) >= sizeof(*avci->last_pkt_props))
+        av_fifo_generic_read(avctx->internal->pkt_props,
+                             avci->last_pkt_props, sizeof(*avci->last_pkt_props), NULL);
+
     if (!ret) {
         frame->best_effort_timestamp = guess_correct_pts(avctx,
                                                          frame->pts,
@@ -1490,10 +1495,6 @@ int ff_decode_frame_props(AVCodecContext *avctx, AVFrame *frame)
         { AV_PKT_DATA_S12M_TIMECODE, AV_FRAME_DATA_S12M_TIMECODE },
     };
 
-    if (IS_EMPTY(pkt) && av_fifo_size(avctx->internal->pkt_props) >= sizeof(*pkt))
-        av_fifo_generic_read(avctx->internal->pkt_props,
-                             pkt, sizeof(*pkt), NULL);
-
     frame->pts = pkt->pts;
     frame->pkt_pos = pkt->pos;
     frame->pkt_duration = pkt->duration;
comment:21 by , 3 years ago
Replying to James:
Can you test the following?
I applied the patch to current master (fcb80aa289a5339353ca9b1f5b2591d0e6cc5f19). Unfortunately, it doesn't fix the issue. I have 19 encoding processes, 17 of which leak approximately 30-40 MiB per hour each. Two processes don't leak at all. The only difference between them is that those 2 use rtp as the source, while the remaining 17 use udp.
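A per-hour figure like the one above can be derived from periodic RSS samples. The sketch below assumes a hypothetical log format of one `epoch_seconds rss_kib` pair per line and uses only the first and last sample, so it gives an average growth rate rather than a fitted slope. The function name is made up for illustration.

```shell
#!/bin/sh
# Read "epoch_seconds rss_kib" lines on stdin and print the average RSS
# growth in MiB per hour between the first and the last sample.
leak_rate_mib_per_hour() {
    awk '
        NR == 1 { t0 = $1; r0 = $2 }   # remember the first sample
                { t1 = $1; r1 = $2 }   # keep overwriting with the latest one
        END {
            # need at least two samples taken at different times
            if (NR < 2 || t1 == t0) exit 1
            printf "%.1f\n", (r1 - r0) / 1024 / ((t1 - t0) / 3600)
        }
    '
}
```

For instance, two samples one hour apart with RSS values of 1000 KiB and 36840 KiB give 35.0, which would fall in the 30-40 MiB per hour range reported for the leaking processes.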
follow-up: 23 comment:22 by , 3 years ago
Resolution: | → fixed |
---|---|
Status: | open → closed |
Should be fixed in 6b4805686c9991fbb474e9f3488b76a91bf4cd22. Will backport to 4.4 later.
4.3 does not have 022a12b306ab2096e6ac9fc9b149828a849d65b2 so it's not affected.
comment:23 by , 3 years ago
Replying to James:
Should be fixed in 6b4805686c9991fbb474e9f3488b76a91bf4cd22. Will backport to 4.4 later.
4.3 does not have 022a12b306ab2096e6ac9fc9b149828a849d65b2 so it's not affected.
Tested with ec8e95296ec069ddf29f479b62accb49ac18e8a8 which goes after 6b4805686c9991fbb474e9f3488b76a91bf4cd22. No leaks anymore.
Thanks.