Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#9082 closed defect (fixed)

Memory leak while transcoding on GPU

Reported by: misko
Owned by:
Priority: important
Component: avcodec
Version: git-master
Keywords: cuda regression
Cc:
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description (last modified by Carl Eugen Hoyos)

Summary of the bug:

While transcoding a live SPTS carrying H.264/AVC video and MPEG-1 Layer II audio into multiple H.264/AVC output profiles with AAC audio on an NVIDIA GPU (Pascal) using NVENC, all ffmpeg versions since 022a12b306ab2096e6ac9fc9b149828a849d65b2 leak memory. Note: in brief testing with a single profile, no leak was detected, but with at least two profiles output simultaneously, memory definitely leaks.

How to reproduce:

/opt/ffmpeg/bin/ffmpeg -hide_banner -nostats -loglevel verbose -vsync -1 -hwaccel cuvid -hwaccel_output_format cuda -hwaccel_device 1 -init_hw_device cuda=cuda:1 -filter_hw_device cuda -threads 1 -fflags +discardcorrupt -c:v h264_cuvid -f mpegts -i 'udp://239.232.9.1:1234?sources=10.128.0.9&fifo_size=100000&timeout=2000000' -filter_complex '[0:v:0]yadif_cuda=mode=send_frame:parity=auto:deint=all,split=3[s_p1][s_p2][s_p3];[s_p1]scale_npp=-1:1080:interp_algo=lanczos[v_p1];[s_p2]scale_npp=-1:720:interp_algo=lanczos[v_p2];[s_p3]scale_npp=-1:576:interp_algo=lanczos[v_p3]' -map '[v_p1]' -map '#0xc86' -c:v h264_nvenc -preset:v p4 -rc:v vbr -profile:v high -forced-idr:v 1 -force_key_frames:v 'expr:gte(t,n_forced*2)' -b:v 5000000 -maxrate:v 5000000 -bufsize:v 2500000 -spatial-aq:v 1 -aq-strength:v 15 -bf:v 3 -b_ref_mode:v middle -no-scenecut:v 1 -rc-lookahead:v 32 -coder:v cabac -c:a:0 libfdk_aac -ac:a:0 2 -r:a:0 48000 -b:a:0 96k -f mpegts -mpegts_service_type advanced_codec_digital_hdtv -mpegts_flags system_b 'udp://239.232.229.61:10001?pkt_size=1316&fifo_size=100000' -map '[v_p2]' -map '#0xc86' -c:v h264_nvenc -preset:v p4 -rc:v vbr -profile:v high -forced-idr:v 1 -force_key_frames:v 'expr:gte(t,n_forced*2)' -b:v 2500000 -maxrate:v 2500000 -bufsize:v 1250000 -spatial-aq:v 1 -aq-strength:v 15 -bf:v 3 -b_ref_mode:v middle -no-scenecut:v 1 -rc-lookahead:v 32 -coder:v cabac -c:a:0 libfdk_aac -ac:a:0 2 -r:a:0 48000 -b:a:0 96k -f mpegts -mpegts_service_type advanced_codec_digital_hdtv -mpegts_flags system_b 'udp://239.232.229.61:10002?pkt_size=1316&fifo_size=100000' -map '[v_p3]' -map '#0xc86' -c:v h264_nvenc -preset:v p4 -rc:v vbr -profile:v main -forced-idr:v 1 -force_key_frames:v 'expr:gte(t,n_forced*2)' -b:v 1250000 -maxrate:v 1250000 -bufsize:v 625000 -spatial-aq:v 1 -aq-strength:v 15 -bf:v 3 -b_ref_mode:v middle -no-scenecut:v 1 -rc-lookahead:v 32 -coder:v cabac -c:a:0 libfdk_aac -ac:a:0 2 -r:a:0 48000 -b:a:0 96k -f mpegts -mpegts_service_type advanced_codec_digital_hdtv -mpegts_flags system_b 'udp://239.232.229.61:10003?pkt_size=1316&fifo_size=100000'

Attachments (3)

ffmpeg_output.txt (12.1 KB) - added by misko 3 years ago.
full output
ffmpeg_cmd.txt (2.0 KB) - added by misko 3 years ago.
cmnd
plot.png (20.4 KB) - added by misko 3 years ago.
RSS_plot


Change History (26)

comment:1 by misko, 3 years ago

Priority: normal → critical

comment:2 by Carl Eugen Hoyos, 3 years ago

Priority: critical → normal

Please provide valgrind output for all memory leak reports, please always provide the command line you tested together with the complete, uncut console output and please do not use hide_banner when reporting issues.

in reply to:  2 comment:3 by misko, 3 years ago

Replying to cehoyos:

> Please provide valgrind output for all memory leak reports, please always provide the command line you tested together with the complete, uncut console output and please do not use hide_banner when reporting issues.

Unfortunately, I can't provide any valgrind output: under valgrind, ffmpeg runs so slowly that it can't keep up with all that processing in time, and I get lots of errors that don't occur in normal operation.

I'll attach the full output and the whole command line.

by misko, 3 years ago

Attachment: ffmpeg_output.txt added

full output

by misko, 3 years ago

Attachment: ffmpeg_cmd.txt added

cmnd

comment:4 by mkver, 3 years ago

Can you test with ASAN? It's way faster than Valgrind.

in reply to:  4 ; comment:5 by misko, 3 years ago

Replying to mkver:

> Can you test with ASAN? It's way faster than Valgrind.

I can try. Is it enough just to enable --toolchain=gcc-asan configure flag?

comment:6 by misko, 3 years ago

Well... with --toolchain=gcc-asan enabled it doesn't work at all. I get this strange CUDA error:

ffmpeg version N-99167-g022a12b306 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 8 (Debian 8.3.0-6)
  configuration: --prefix=/opt/ffmpeg --enable-cuvid --enable-nvenc --enable-libfdk-aac --enable-nonfree --enable-gpl --enable-libnpp --enable-cuda --enable-cuda-nvcc --extra-cflags=-I/usr/local/cuda/include --extra-ldflags=-L/usr/local/cuda/lib64 --nvccflags='-gencode arch=compute_60,code=sm_60 -O2' --toolchain=gcc-asan
  libavutil      56. 59.100 / 56. 59.100
  libavcodec     58.106.100 / 58.106.100
  libavformat    58. 56.100 / 58. 56.100
  libavdevice    58. 11.102 / 58. 11.102
  libavfilter     7. 87.100 /  7. 87.100
  libswscale      5.  8.100 /  5.  8.100
  libswresample   3.  8.100 /  3.  8.100
  libpostproc    55.  8.100 / 55.  8.100
[AVHWDeviceContext @ 0x6090000004c0] cu->cuInit(0) failed -> CUDA_ERROR_OUT_OF_MEMORY: out of memory
Device creation failed: -1313558101.
Failed to set value 'cuda=cuda:1' for option 'init_hw_device': Unknown error occurred
Error parsing global options: Unknown error occurred

=================================================================
==8292==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 65536 byte(s) in 1 object(s) allocated from:
    #0 0x7fa5b4afb720 in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0xe9720)
    #1 0x7fa5acc9ecff  (/opt/ffmpeg-asan/ffmpeg_g+0xd0f0cff)

Direct leak of 264 byte(s) in 1 object(s) allocated from:
    #0 0x7fa5b4afb518 in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0xe9518)
    #1 0x7fa5ace058ba  (/opt/ffmpeg-asan/ffmpeg_g+0xd2578ba)

Direct leak of 128 byte(s) in 1 object(s) allocated from:
    #0 0x7fa5b4afb518 in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0xe9518)
    #1 0x7fa5acc9f12f  (/opt/ffmpeg-asan/ffmpeg_g+0xd0f112f)

Direct leak of 64 byte(s) in 2 object(s) allocated from:
    #0 0x7fa5b4afb330 in __interceptor_malloc (/lib/x86_64-linux-gnu/libasan.so.5+0xe9330)
    #1 0x7fa5acc8b081  (/opt/ffmpeg-asan/ffmpeg_g+0xd0dd081)

Direct leak of 8 byte(s) in 1 object(s) allocated from:
    #0 0x7fa5b4afb518 in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0xe9518)
    #1 0x7fa5acc9f0bf  (/opt/ffmpeg-asan/ffmpeg_g+0xd0f10bf)

Indirect leak of 640 byte(s) in 2 object(s) allocated from:
    #0 0x7fa5b4afb518 in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0xe9518)
    #1 0x7fa5acc8b09c  (/opt/ffmpeg-asan/ffmpeg_g+0xd0dd09c)

SUMMARY: AddressSanitizer: 66640 byte(s) leaked in 8 allocation(s).

I don't know whether I'm doing this properly.

in reply to:  5 ; comment:7 by Balling, 3 years ago

Replying to misko:

> Replying to mkver:
>
>> Can you test with ASAN? It's way faster than Valgrind.
>
> I can try. Is it enough just to enable --toolchain=gcc-asan configure flag?

No, you need to use -fsanitize=address compiler flags (both CFLAGS and CXXFLAGS) and linker flags (LDFLAGS).

in reply to:  7 comment:8 by misko, 3 years ago

Replying to Balling:

> Replying to misko:
>
>> Replying to mkver:
>>
>>> Can you test with ASAN? It's way faster than Valgrind.
>>
>> I can try. Is it enough just to enable --toolchain=gcc-asan configure flag?
>
> No, you need to use -fsanitize=address compiler flags (both CFLAGS and CXXFLAGS) and linker flags (LDFLAGS).

In the configure script I found this case statement:

case "$toolchain" in
    *-asan)
        cc_default="${toolchain%-asan}"
        add_cflags  -fsanitize=address
        add_ldflags -fsanitize=address
    ;;

So specifying --toolchain=gcc-asan does exactly what you said. It adds -fsanitize=address to both CFLAGS and LDFLAGS.

in reply to:  5 comment:9 by Carl Eugen Hoyos, 3 years ago

Replying to misko:

> Replying to mkver:
>
>> Can you test with ASAN? It's way faster than Valgrind.
>
> I can try. Is it enough just to enable --toolchain=gcc-asan configure flag?

Yes, it is (and it is the only supported way).
Please confirm that you used the ffmpeg_g binary for testing.

comment:10 by misko, 3 years ago

The environment variable ASAN_OPTIONS=protect_shadow_gap=0 is needed for the program to run under ASAN with CUDA. And yes, I used ffmpeg_g instead of ffmpeg. I'll post the results later.
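A possible alternative to exporting the variable in every shell: AddressSanitizer also reads default options from a `__asan_default_options()` function if the program defines one, and options given in `ASAN_OPTIONS` still take precedence over it. The sketch below assumes you are willing to add this hook to your own ASAN build (it is not part of FFmpeg); `protect_shadow_gap=0` is the setting reported above as necessary, and the `cuInit(0) failed -> CUDA_ERROR_OUT_OF_MEMORY` error in comment:6 is what it appears to avoid.

```c
/* Hypothetical addition to a locally built ASAN binary; not FFmpeg code.
 * When compiled with -fsanitize=address, the ASan runtime calls this hook
 * at startup and parses the returned string like ASAN_OPTIONS. Options set
 * in the ASAN_OPTIONS environment variable still override these defaults.
 * Without -fsanitize=address the function is simply never called. */
#if defined(__GNUC__)
__attribute__((used, visibility("default")))
#endif
const char *__asan_default_options(void)
{
    /* Per the report above, CUDA initialisation fails under ASan unless
     * the shadow gap is left unprotected. */
    return "protect_shadow_gap=0";
}
```

Built with `--toolchain=gcc-asan`, this should behave like running with `ASAN_OPTIONS=protect_shadow_gap=0`, without relying on the environment being set correctly.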

comment:11 by misko, 3 years ago

ASAN revealed nothing at all. It looks as if some internal buffer grows over time and is then deallocated as a whole when ffmpeg quits. This may be the reason why ASAN sees no leaks at exit. Here is the tail of the output from the run under ASAN:

frame=69503 fps= 25 q=33.0 Lq=33.0 q=36.0 size= 1776788kB time=00:46:22.12 bitrate=5231.8kbits/s speed=   1x    
video:2953609kB audio:97801kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
Input file #0 (udp://239.232.122.8:1234?sources=10.128.0.122&fifo_size=100000&timeout=2000000):
  Input stream #0:0 (video): 139091 packets read (1384717898 bytes); 69505 frames decoded; 
  Input stream #0:1 (audio): 115908 packets read (55635840 bytes); 115908 frames decoded (133526016 samples); 
  Input stream #0:2 (audio): 204 packets read (313344 bytes); 
  Input stream #0:3 ((null)): 0 packets read (0 bytes); 
  Input stream #0:4 (subtitle): 0 packets read (0 bytes); 
  Input stream #0:5 ((null)): 0 packets read (0 bytes); 
  Input stream #0:6 (data): 32 packets read (49057 bytes); 
  Total: 255235 packets (1440716139 bytes) demuxed
Output file #0 (udp://239.232.229.254:10001?pkt_size=1316&fifo_size=100000):
  Output stream #0:0 (video): 69503 frames encoded; 69503 packets muxed (1728018257 bytes); 
  Output stream #0:1 (audio): 130396 frames encoded (133525504 samples); 130398 packets muxed (33382685 bytes); 
  Total: 199901 packets (1761400942 bytes) muxed
Output file #1 (udp://239.232.229.254:10002?pkt_size=1316&fifo_size=100000):
  Output stream #1:0 (video): 69503 frames encoded; 69503 packets muxed (864014249 bytes); 
  Output stream #1:1 (audio): 130396 frames encoded (133525504 samples); 130398 packets muxed (33382685 bytes); 
  Total: 199901 packets (897396934 bytes) muxed
Output file #2 (udp://239.232.229.254:10003?pkt_size=1316&fifo_size=100000):
  Output stream #2:0 (video): 69503 frames encoded; 69503 packets muxed (432462723 bytes); 
  Output stream #2:1 (audio): 130396 frames encoded (133525504 samples); 130398 packets muxed (33382685 bytes); 
  Total: 199901 packets (465845408 bytes) muxed
[AVIOContext @ 0x613000004800] Statistics: 0 seeks, 1421075 writeouts
[AVIOContext @ 0x6130000049c0] Statistics: 0 seeks, 750353 writeouts
[AVIOContext @ 0x613000004b80] Statistics: 0 seeks, 415249 writeouts
[h264_nvenc @ 0x619000076c80] Nvenc unloaded
[h264_nvenc @ 0x619000078a80] Nvenc unloaded
[h264_nvenc @ 0x61900007a880] Nvenc unloaded
[AVIOContext @ 0x613000003840] Statistics: 1678790932 bytes read, 0 seeks

=================================================================
==17126==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 20856 byte(s) in 3 object(s) allocated from:
    #0 0x7fe137ead330 in __interceptor_malloc (/lib/x86_64-linux-gnu/libasan.so.5+0xe9330)
    #1 0x7fe106f6551a  (<unknown module>)

Direct leak of 32 byte(s) in 1 object(s) allocated from:
    #0 0x7fe137ead330 in __interceptor_malloc (/lib/x86_64-linux-gnu/libasan.so.5+0xe9330)
    #1 0x7fe0e0bdaf3d  (<unknown module>)

SUMMARY: AddressSanitizer: 20888 byte(s) leaked in 4 allocation(s).
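A buffer that grows for as long as the process runs but is released in full at shutdown is invisible to LeakSanitizer, which only reports memory that is unreachable at exit; that fits the summary above, where only about 20 KB is reported after more than 46 minutes of transcoding. The sketch below models this hypothesis with a hypothetical FIFO of per-packet records, loosely inspired by the `pkt_props` FIFO touched by the patch in comment:20 but not FFmpeg code: one record is pushed per packet and one is popped per decoded frame. The assumption of two packets per frame is only suggested by the statistics above (139091 video packets read vs. 69505 frames decoded), as would happen if each field of the interlaced input arrived in its own packet.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-packet record standing in for "packet properties". */
typedef struct PropRecord {
    long long pts;
    long long pos;
} PropRecord;

/* Minimal growable ring buffer of PropRecord (not FFmpeg's AVFifoBuffer). */
typedef struct PropFifo {
    PropRecord *buf;
    size_t head;  /* index of the oldest queued record */
    size_t count; /* number of queued records */
    size_t cap;   /* capacity in records */
} PropFifo;

/* Append a record, doubling the capacity when full. Returns 0 or -1. */
static int fifo_push(PropFifo *f, PropRecord r)
{
    if (f->count == f->cap) {
        size_t ncap = f->cap ? f->cap * 2 : 16;
        PropRecord *nbuf = malloc(ncap * sizeof(*nbuf));
        if (!nbuf)
            return -1;
        /* Copy queued records oldest-first so the new head is index 0. */
        for (size_t i = 0; i < f->count; i++)
            nbuf[i] = f->buf[(f->head + i) % f->cap];
        free(f->buf);
        f->buf  = nbuf;
        f->head = 0;
        f->cap  = ncap;
    }
    f->buf[(f->head + f->count) % f->cap] = r;
    f->count++;
    return 0;
}

/* Remove the oldest record if there is one. Returns 1 if a record was read. */
static int fifo_pop(PropFifo *f, PropRecord *out)
{
    if (f->count == 0)
        return 0;
    *out    = f->buf[f->head];
    f->head = (f->head + 1) % f->cap;
    f->count--;
    return 1;
}

/* Release everything: after this, LeakSanitizer has nothing to report. */
static void fifo_free(PropFifo *f)
{
    free(f->buf);
    *f = (PropFifo){ 0 };
}

/* Push packets_per_frame records per frame but pop only one per frame,
 * then return how many records were still queued before the final free.
 * Returns (size_t)-1 on allocation failure. */
size_t simulate_props_fifo(size_t frames, size_t packets_per_frame)
{
    PropFifo f = { 0 };
    PropRecord out;
    size_t queued;

    for (size_t n = 0; n < frames; n++) {
        for (size_t p = 0; p < packets_per_frame; p++) {
            PropRecord in = { (long long)n, (long long)(n * packets_per_frame + p) };
            if (fifo_push(&f, in) < 0) {
                fifo_free(&f);
                return (size_t)-1;
            }
        }
        fifo_pop(&f, &out); /* one record consumed per decoded frame */
    }
    queued = f.count;
    fifo_free(&f);
    return queued;
}
```

With one packet per frame, `simulate_props_fifo(100, 1)` drains to 0; with two, `simulate_props_fifo(100, 2)` leaves 100 records queued, so on a live input the queue grows without bound even though nothing leaks in the LeakSanitizer sense. This only illustrates the symptom; it is not a description of what 022a12b306 or the eventual fix actually changed.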

I attached a plot of RSS over time with two curves: one for commit 022a12b306 (the leaky one) and one for commit 8a81820624 (the good one). The measurement ran for 3 hours straight, sampling the RSS of each process every 300 seconds.
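To collect similar data on Linux, one option is to read the `VmRSS:` line of `/proc/<pid>/status`, which reports the resident set size in kB. A minimal sketch, assuming Linux procfs; the function names are hypothetical and the 300-second sampling loop is left to the caller:

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Parse the VmRSS value (in kB) out of the text of /proc/<pid>/status.
 * Returns -1 if the field is missing or malformed. */
long parse_vmrss_kb(const char *status_text)
{
    const char *p = strstr(status_text, "VmRSS:");
    long kb;

    /* "%ld" skips the tab/space padding that follows the field name. */
    if (!p || sscanf(p + strlen("VmRSS:"), "%ld", &kb) != 1)
        return -1;
    return kb;
}

/* Read /proc/<pid>/status and return the current RSS in kB, or -1. */
long read_rss_kb(pid_t pid)
{
    char path[64];
    char buf[8192];
    size_t n;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%ld/status", (long)pid);
    f = fopen(path, "r");
    if (!f)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_vmrss_kb(buf);
}
```

Calling `read_rss_kb()` on the ffmpeg PID every 300 seconds and logging each value with a timestamp yields the kind of series shown in the attached plot.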

by misko, 3 years ago

Attachment: plot.png added

RSS_plot

comment:12 by misko, 3 years ago

I did another test with the current master (a00ff56321). The problem still persists.

comment:13 by Carl Eugen Hoyos, 3 years ago

Keywords: cuda added

Please run git bisect to find the change introducing the regression you see.

in reply to:  13 comment:14 by misko, 3 years ago

Replying to cehoyos:

> Please run git bisect to find the change introducing the regression you see.

I mentioned this in my first post. This issue was introduced in commit 022a12b306ab2096e6ac9fc9b149828a849d65b2.

Last edited 3 years ago by Carl Eugen Hoyos

comment:15 by Carl Eugen Hoyos, 3 years ago

Description: modified
Keywords: regression added
Priority: normal → important

comment:16 by misko, 3 years ago

Hi,

is there any news regarding this issue?

comment:17 by Balling, 3 years ago

Status: new → open

Can you retest again?

in reply to:  17 comment:18 by misko, 3 years ago

Replying to Balling:

> Can you retest again?

Hi. The leak is still present in 8bcce5673a267ed371140bf3228ffb420ca2f69b.

comment:19 by Dhanish Vijayan, 3 years ago

Hi,
I have been struggling with exactly the same issue for the last couple of days. In my case, I have a batch process running 75 parallel transcodes with cuvid, and memory leaks at 1.6 GB per hour across the 75 transcoding processes. I ran the test against the latest master branch and the 4.4 and 4.3 releases; all of them have the same issue.
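For scale, spreading the reported 1.6 GB per hour evenly over the 75 processes (taking 1 GB as 1000 MB) gives roughly 21 MB per process per hour, or about half a gigabyte per process per day:

$$\frac{1600\ \text{MB/h}}{75} \approx 21.3\ \text{MB/h per process}, \qquad \frac{1600\ \text{MB/h} \times 24\ \text{h}}{75} = 512\ \text{MB per process per day}$$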

I am using the following command, similar to the one mentioned in this thread:

./ffmpeg_g -re -hwaccel cuvid -hwaccel_output_format cuda  -c:v h264_cuvid  -gpu 0 -deint adaptive -resize 1280x720 -drop_second_field 1 -i espnu.ts -filter_complex "[0:0]split=2[v0][v1];[v1]scale_npp=640:360[v2]" -map [v0] -map 0:1 -map [v2] -map 0:1 -c:v:0 h264_nvenc -b:v:0  2200k -gpu:v:0 0 -preset:v:0 hp -profile:v:0 high -level:v:0 4.1 -a53cc:v:0 1 -no-scenecut:v:0 1 -forced-idr:v:0 1 -strict_gop:v:0 1 -c:v:1 h264_nvenc -b:v:1  800k  -gpu:v:1 0 -preset:v:1 hp -profile:v:1 high -level:v:1 3.1 -a53cc:v:1 1 -no-scenecut:v:1 1 -forced-idr:v:1 1 -strict_gop:v:1 1 -c:a  libfdk_aac -b:a 128k -ac 2   -var_stream_map "v:0,a:0 v:1,a:1"  -f hls -hls_time 4 -hls_list_size 4 -hls_flags delete_segments -hls_delete_threshold 2  -forced-idr 1 -force_key_frames "expr:gte(t,n_forced*2)" -sc_threshold 0 -hls_segment_filename "stream/v%v/fileSequence%d.ts"  -master_pl_name playlist.m3u8   stream/v%v/prog_index.m3u8

I also tried valgrind to see where the memory is leaking, but couldn't get any useful information.

I even tried without the libnpp filters and sent the output to /dev/null to isolate the leak. After these experiments I found this thread, which identifies commit 022a12b306 as the one that introduced the issue. So I created a new branch just before commit 022a12b306, and with it there is no leak.

Let me know if any patches are available for testing. I will be happy to test them, as I would like to bring the transcoder up to date with the latest ffmpeg release.

Last edited 3 years ago by Dhanish Vijayan

comment:20 by James, 3 years ago

Can you test the following?

diff --git a/libavcodec/decode.c b/libavcodec/decode.c
index 75bc7ad98e..ed49d14fab 100644
--- a/libavcodec/decode.c
+++ b/libavcodec/decode.c
@@ -533,6 +533,11 @@ static int decode_receive_frame_internal(AVCodecContext *avctx, AVFrame *frame)
     if (ret == AVERROR_EOF)
         avci->draining_done = 1;

+    if (IS_EMPTY(avci->last_pkt_props)
+        && av_fifo_size(avctx->internal->pkt_props) >= sizeof(*avci->last_pkt_props))
+        av_fifo_generic_read(avctx->internal->pkt_props,
+                             avci->last_pkt_props, sizeof(*avci->last_pkt_props), NULL);
+
     if (!ret) {
         frame->best_effort_timestamp = guess_correct_pts(avctx,
                                                          frame->pts,
@@ -1490,10 +1495,6 @@ int ff_decode_frame_props(AVCodecContext *avctx, AVFrame *frame)
         { AV_PKT_DATA_S12M_TIMECODE,              AV_FRAME_DATA_S12M_TIMECODE },
     };

-    if (IS_EMPTY(pkt) && av_fifo_size(avctx->internal->pkt_props) >= sizeof(*pkt))
-        av_fifo_generic_read(avctx->internal->pkt_props,
-                             pkt, sizeof(*pkt), NULL);
-
     frame->pts = pkt->pts;
     frame->pkt_pos      = pkt->pos;
     frame->pkt_duration = pkt->duration;

in reply to:  20 comment:21 by misko, 3 years ago

Replying to James:

> Can you test the following?

I applied the patch to the current master (fcb80aa289a5339353ca9b1f5b2591d0e6cc5f19). Unfortunately, it doesn't fix the issue. I have 19 encoding processes, 17 of which leak approximately 30-40 MiB per hour each. The other two don't leak at all; the only difference is that those two use RTP as their source, while the remaining 17 use UDP.
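For scale, simple arithmetic on the figures just quoted (not a separate measurement): the 17 leaking processes together lose 510 to 680 MiB per hour, and each one grows by 720 to 960 MiB per day:

$$17 \times 30\ \text{MiB/h} = 510\ \text{MiB/h}, \qquad 17 \times 40\ \text{MiB/h} = 680\ \text{MiB/h}$$

$$30\ \text{MiB/h} \times 24\ \text{h} = 720\ \text{MiB}, \qquad 40\ \text{MiB/h} \times 24\ \text{h} = 960\ \text{MiB}$$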

Last edited 3 years ago by misko

comment:22 by James, 3 years ago

Resolution: fixed
Status: open → closed

Should be fixed in 6b4805686c9991fbb474e9f3488b76a91bf4cd22. Will backport to 4.4 later.
4.3 does not have 022a12b306ab2096e6ac9fc9b149828a849d65b2 so it's not affected.

in reply to:  22 comment:23 by misko, 3 years ago

Replying to James:

> Should be fixed in 6b4805686c9991fbb474e9f3488b76a91bf4cd22. Will backport to 4.4 later.
> 4.3 does not have 022a12b306ab2096e6ac9fc9b149828a849d65b2 so it's not affected.

Tested with ec8e95296ec069ddf29f479b62accb49ac18e8a8, which comes after 6b4805686c9991fbb474e9f3488b76a91bf4cd22. No more leaks.

Thanks.
