Opened 3 years ago

Last modified 2 years ago

#5224 open defect

Excessive memory use in H.264 decoder with threading enabled

Reported by: jkqxz
Owned by:
Priority: normal
Component: avcodec
Version: unspecified
Keywords: h264
Cc:
Blocked By:
Blocking:
Reproduced by developer: yes
Analyzed by developer: no

Description

Given a stream with gaps in frame_num, the threaded decoder may allocate many more frames than it should. (Up to thread count * num_ref_frames whole frame buffers.)

See attached stream.

This has parameters:

  • Baseline profile
  • num_ref_frames = 16
  • log2_max_frame_num_minus4 = 12
  • gaps_in_frame_num_value_allowed_flag = 1

The stream is then a single IDR frame of black, followed by all-skip P frames with frame_num decreasing by one each time (65535, 65534, ...).
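
To make the size of each gap concrete, here is a minimal sketch of the frame_num arithmetic. It assumes every frame in the stream is a reference frame and that the gap is counted modulo MaxFrameNum, as the H.264 frame_num gap handling does. The function names are hypothetical and are not taken from libavcodec; in particular, `dummy_frames_needed` only models the assumption that no more than num_ref_frames placeholder frames survive the sliding window.

```c
#include <assert.h>

/* MaxFrameNum = 2^(log2_max_frame_num_minus4 + 4), per the H.264 SPS semantics. */
static unsigned max_frame_num(unsigned log2_max_frame_num_minus4)
{
    assert(log2_max_frame_num_minus4 <= 12); /* range permitted by the spec */
    return 1u << (log2_max_frame_num_minus4 + 4);
}

/* Number of frame_num values skipped between two consecutive reference
 * frames, counted modulo MaxFrameNum. Returns 0 when there is no gap. */
static unsigned frame_num_gap(unsigned prev_frame_num, unsigned frame_num,
                              unsigned max_fn)
{
    assert(prev_frame_num < max_fn && frame_num < max_fn);
    return (frame_num + max_fn - prev_frame_num - 1) % max_fn;
}

/* Hypothetical model: only the last num_ref_frames placeholder frames
 * can remain referenced, so at most that many need a buffer. */
static unsigned dummy_frames_needed(unsigned gap, unsigned num_ref_frames)
{
    return gap < num_ref_frames ? gap : num_ref_frames;
}
```

With the parameters above, MaxFrameNum is 65536, so stepping from 65535 to 65534 (or from the IDR frame's 0 to 65535) skips 65534 values. Under the stated assumption, that translates into 16 placeholder frames for every coded frame.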

Decode this stream with:

% ffmpeg -v 55 -vsync 0 -threads 8 -thread_type frame+slice -i large_frame_num_gaps.264 -f null -

Virtual memory use is much higher than expected, though this is rather hard to see. (Since the frames are never actually touched, the real memory use is not excessive.)

To see the problem more effectively, apply the following patch to instrument malloc/free:

diff --git a/libavutil/mem.c b/libavutil/mem.c
index 8dfaad8..bddb0d1 100644
--- a/libavutil/mem.c
+++ b/libavutil/mem.c
@@ -69,6 +69,7 @@ void  free(void *ptr);
  * Note that this will cost performance. */
 
 static size_t max_alloc_size= INT_MAX;
+static void *big_mem_list[100];
 
 void av_max_alloc(size_t max){
     max_alloc_size = max;
@@ -139,6 +140,18 @@ void *av_malloc(size_t size)
     if (ptr)
         memset(ptr, FF_MEMORY_POISON, size);
 #endif
+
+#if 1
+    if(size > 1000000) {
+        int i;
+        av_log(0, AV_LOG_DEBUG, "malloc(%zu) = %p\n", size, ptr);
+        for(i = 0; i < FF_ARRAY_ELEMS(big_mem_list) && big_mem_list[i]; i++);
+        if(i >= FF_ARRAY_ELEMS(big_mem_list))
+            av_assert0(0 && "Too many big allocations.");
+        big_mem_list[i] = ptr;
+    }
+#endif
+
     return ptr;
 }
 
@@ -227,6 +240,19 @@ int av_reallocp_array(void *ptr, size_t nmemb, size_t size)
 
 void av_free(void *ptr)
 {
+#if 1
+    if(ptr) {
+        int i;
+        for(i = 0; i < FF_ARRAY_ELEMS(big_mem_list); i++) {
+            if(big_mem_list[i] == ptr) {
+                av_log(0, AV_LOG_DEBUG, "free(%p)\n", ptr);
+                big_mem_list[i] = 0;
+                break;
+            }
+        }
+    }
+#endif
+
 #if CONFIG_MEMALIGN_HACK
     if (ptr) {
         int v= ((char *)ptr)[-1];

With this patch applied, the command above aborts with threads = 8 (but not with threads = 1), because more than 100 frame buffers are allocated at the same time.
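
As a rough check of why the 100-entry table overflows only in the threaded case, the sketch below computes the upper bound given in the description (thread count * num_ref_frames) and compares it with the table size. It assumes one tracked allocation per frame buffer and ignores the buffers for the real decoded frames, so it is an estimate rather than an exact count. The helper names are hypothetical.

```c
#include <assert.h>

#define BIG_MEM_LIST_SIZE 100u /* number of slots in big_mem_list[] in the patch */

/* Upper bound from the ticket description: every thread may hold its
 * own set of num_ref_frames placeholder frame buffers. */
static unsigned worst_case_dummy_frames(unsigned threads, unsigned num_ref_frames)
{
    assert(threads > 0);
    return threads * num_ref_frames;
}

/* Nonzero if the estimated number of simultaneously live frame buffers
 * exceeds the tracking table (assumes one allocation per buffer). */
static int overflows_tracking_table(unsigned threads, unsigned num_ref_frames)
{
    return worst_case_dummy_frames(threads, num_ref_frames) > BIG_MEM_LIST_SIZE;
}
```

For the attached stream, 8 threads give a bound of 128 buffers, which exceeds the 100 slots, while a single thread gives 16, which fits.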

Attachments (4)

large_frame_num_gaps.264 (6.4 KB) - added by jkqxz 3 years ago.
Raw H.264 stream to use as input
vanilla_output (19.6 KB) - added by jkqxz 3 years ago.
Output with vanilla ffmpeg
modified_output (16.5 KB) - added by jkqxz 3 years ago.
Output when modified with given patch
modified_onethread (21.0 KB) - added by jkqxz 3 years ago.
Output when modified and with only one thread


Change History (10)

comment:1 in reply to: ↑ description ; follow-up: Changed 3 years ago by cehoyos

  • Keywords h264 added

Replying to jkqxz:

> Given a stream with gaps in frame_num, the threaded decoder may allocate many more frames than it should. (Up to thread count * num_ref_frames whole frame buffers.)

I believe you are simply describing how multi-threaded decoding works or what do I miss?

comment:2 in reply to: ↑ 1 ; follow-up: Changed 3 years ago by kierank

Replying to cehoyos:

> Replying to jkqxz:
>
> > Given a stream with gaps in frame_num, the threaded decoder may allocate many more frames than it should. (Up to thread count * num_ref_frames whole frame buffers.)
>
> I believe you are simply describing how multi-threaded decoding works or what do I miss?

He/she has exactly explained the problem, you clearly are missing something.

comment:3 in reply to: ↑ 2 ; follow-up: Changed 3 years ago by cehoyos

Replying to kierank:

> Replying to cehoyos:
>
> > Replying to jkqxz:
> >
> > > Given a stream with gaps in frame_num, the threaded decoder may allocate many more frames than it should. (Up to thread count * num_ref_frames whole frame buffers.)
> >
> > I believe you are simply describing how multi-threaded decoding works or what do I miss?
>
> He/she has exactly explained the problem, you clearly are missing something.

The OP is unhappy that FFmpeg is allocating 8*16 frames when decoding an H.264 video with 16 reference frames using eight threads. I suspect that this is how multithreaded H.264 decoding works: What do I miss?

comment:4 in reply to: ↑ 3 Changed 3 years ago by heleppkes

Replying to cehoyos:

> Replying to kierank:
>
> > Replying to cehoyos:
> >
> > > Replying to jkqxz:
> > >
> > > > Given a stream with gaps in frame_num, the threaded decoder may allocate many more frames than it should. (Up to thread count * num_ref_frames whole frame buffers.)
> > >
> > > I believe you are simply describing how multi-threaded decoding works or what do I miss?
> >
> > He/she has exactly explained the problem, you clearly are missing something.
>
> The OP is unhappy that FFmpeg is allocating 8*16 frames when decoding an H.264 video with 16 reference frames using eight threads. I suspect that this is how multithreaded H.264 decoding works: What do I miss?

What you are missing is that a "normal" file would not do this, since the references are shared. These 16 "dummy" frames per thread are never accessed and are discarded again shortly afterwards (although only after all threads have already allocated them).

comment:5 Changed 2 years ago by cehoyos

I still don't understand this ticket: If it describes an issue that can be fixed in FFmpeg, please set to Open / Reproduced.

comment:6 Changed 2 years ago by heleppkes

  • Reproduced by developer set
  • Status changed from new to open

Whether it can realistically be fixed, or whether any potential fix would be extremely complex, is up in the air. But considering that it was reported by a developer and can easily be reproduced, keeping it open for further analysis is certainly the right thing to do.
