#7687 closed defect (fixed)

h264_omx can stall due to not handling fragmented frames correctly

Reported by: 6by9 Owned by: tmm1
Priority: normal Component: avcodec
Version: git-master Keywords: h264_omx, omx
Cc: ffmpeg@tmm1.net Blocked By:
Blocking: Reproduced by developer: no
Analyzed by developer: no


Summary of the bug:
Encoder h264_omx has a race condition in trying to collect all the buffers when the encoded frame is split over multiple IL buffers. This condition is more prevalent at high bitrates as frames are inherently larger.

More details in https://github.com/raspberrypi/firmware/issues/1087

Whilst running and having frames fed in, omx_encode_frame output buffer handling goes through the summarised loop:

while((!*got_packet && ret == 0 && !s->got_eos) {
  buffer = get_buffer(DON'T WAIT)
  if (!buffer)
  <Pass data from IL buffer to ffmpeg buffer>
  if (buffer->nFlags & OMX_BUFFERFLAG_ENDOFFRAME) {
    *got_packet = 1;

The default on the Pi is 1 buffer of 64kB, so if the frame is over 64kB there is race from submitting the buffer with OMX_FillThisBuffer to getting back to the top and calling get_buffer. If IL hasn't filled and returned the buffer in that time, then you get the first fragment returned as the complete encoded frame.
The remaining fragment will be returned on the next call, but that's now left the next frame in the codec's output FIFO. Repeat a few times and you end up with the codec output FIFO totally full, and the codec stalls. The codec can't process an input ends up not being able to process an input buffer and therefore holds on to it, at which point FFmpeg stalls waiting for an input buffer.

On older firmware this issue didn't get triggered for reasons that aren't 100% clear, but a performance improvement made to the firmware in July 2018 has exposed this error in h264_omx

How to reproduce:

ffmpeg -c:v mpeg2_mmal -i sample.ts -c:v h264_omx -b:v 15M sample.mkv

ffmpeg git-master
built on Raspberry Pi

sample.ts is referenced in the Pi Github issue as https://www.dropbox.com/s/w7n9nn4eeeyq7qc/sample.ts?dl=1, but most 1080P clips will trigger the issue if encoded at this sort of bitrate.

The issue has also been reported when encoding a 1080P webcam through h264_enc at 2fps and 1.4Mbit/s.

Attachments (1)

0001-avcodec-omx-Fix-handling-of-fragmented-buffers.patch (2.5 KB) - added by 6by9 18 months ago.
The attached patch solves the issue by adopting waiting in get_buffer if we've had a partial frame.

Download all attachments as: .zip

Change History (7)

comment:1 Changed 18 months ago by 6by9

Options for solving this:

  • Fix the OMX_BUFFERFLAG_ENDOFFRAME handling such that get_buffer waits if it has had one buffer through without OMX_BUFFERFLAG_ENDOFFRAME set. GStreamer has the same issue (https://gitlab.freedesktop.org/gstreamer/gst-omx/issues/4) and they have been reluctant to adopt that approach as some implementations don't handle the flag correctly. Does FFmpeg care about other IL implementations, or should this be hidden behind the existing CONFIG_OMX_RPI config flag? I do have a fairly simple patch to implement this, but may not understand all the ramifications.
  • Ask for more buffers from IL. The client is entitled to set nBufferCount to a number greater than nBufferCountMin, therefore ask for more buffers and there is a reasonable expectation that the codec will have filled them in time. Again, does this need to be conditional on being on a Pi (the bug in theory will affect all IL implementations).

Any views? I see that h264_omx hasn't had any significant development for a good while, so does anyone still act as maintainer?

comment:2 Changed 18 months ago by cehoyos

Isn’t this a duplicate of ticket #6586?

comment:3 Changed 18 months ago by 6by9

Isn’t this a duplicate of ticket #6586?

No. #6586 is the zerocopy path of h264_omx passing a pointer into IL that has issues in VCHIQ
The buffer is allocated by V4L2, mmaped into userspace results with flags VM_IO and/or VM_PFNMAP. VCHIQ then tries to pass that to the GPU on behalf of IL and get_user_pages fails. https://github.com/raspberrypi/firmware/issues/851
I'm 99.9% certain that is a VCHIQ kernel driver issue - FFmpeg is in the clear.

This issue is FFmpeg mishandling fragmented buffers on the output side.

Changed 18 months ago by 6by9

The attached patch solves the issue by adopting waiting in get_buffer if we've had a partial frame.

comment:4 Changed 18 months ago by 6by9

Increasing the number of buffers is even simpler with something like

diff --git a/libavcodec/omx.c b/libavcodec/omx.c
index e813362..1f99917 100644
--- a/libavcodec/omx.c
+++ b/libavcodec/omx.c
@@ -501,7 +501,9 @@ static av_cold int omx_component_init(AVCodecContext *avctx, const char *role)
         out_port_params.format.video.eCompressionFormat = OMX_VIDEO_CodingMPEG4;
     else if (avctx->codec->id == AV_CODEC_ID_H264)
         out_port_params.format.video.eCompressionFormat = OMX_VIDEO_CodingAVC;
+    out_port_params.nBufferCountActual = 3;

     err = OMX_SetParameter(s->handle, OMX_IndexParamPortDefinition, &out_port_params);
     err = OMX_GetParameter(s->handle, OMX_IndexParamPortDefinition, &out_port_params);

but that only reduces the chances of hitting the issue (admittedly to very very slim) rather than actually solving it.

comment:5 Changed 11 months ago by tmm1

  • Cc ffmpeg@tmm1.net added
  • Owner set to tmm1
  • Status changed from new to open

Both patches look reasonable. I will add them to my local patchset and send to ffmpeg-devel for review.

comment:6 Changed 11 months ago by cehoyos

  • Resolution set to fixed
  • Status changed from open to closed
Note: See TracTickets for help on using tickets.