Opened 4 years ago

Last modified 4 years ago

#3354 new enhancement

enhancement: Zero latency av_read_frame()

Reported by: pjw
Owned by:
Priority: wish
Component: avcodec
Version: git-master
Keywords: mpegts, latency
Cc:
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no



I'm using FFmpeg to set up a (very) low-latency video streaming application (in C++). FFmpeg is used on both the server and the client side. The video is encoded as H.264 (i.e. "libx264") and transported in an MPEG transport stream (mpegts) over UDP.

So far I've been able to reduce the latency to encoding + transport + decoding + one frame time. I'd like it to be just encoding + transport + decoding.

The problem seems to be that av_read_frame() always holds back one video frame, i.e. frame n is returned only when av_read_frame() for n+1 is called. I'd like av_read_frame() to return a frame as soon as possible, without any delay.

How to test

Besides the video stream, I've added an extra data stream. It carries a timestamp alongside each video frame, and the timestamp records the moment the data is sent. Each data packet has the same PTS as its video packet, so in the client code I can match data-stream packets to video-stream packets by PTS. I then know when the data packet and the video frame were sent and when they arrived, which lets me calculate the transport delay. This only holds when the server and client use the same clock, so in my tests I ran the server and client code on the same host.
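The PTS matching described above can be sketched as a small lookup keyed by PTS. This is a minimal illustration, not the code used in this ticket: the class name PtsDelayTracker and its methods are hypothetical, times are assumed to be seconds on a clock shared by both ends, and entries older than the matched PTS are discarded.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>

// Hypothetical helper (not part of FFmpeg): stores the send time carried by
// each data packet, keyed by PTS, and computes the delay of the video packet
// with the same PTS. Assumes both ends share one clock (same-host test).
class PtsDelayTracker {
public:
    // Called for each data packet; tSent is the timestamp from its payload.
    void onDataPacket(int64_t pts, double tSent) { sent_[pts] = tSent; }

    // Called for each video packet; returns (tRecv - tSent) in milliseconds,
    // or std::nullopt if no data packet with this PTS has been seen.
    std::optional<double> onVideoPacket(int64_t pts, double tRecv) {
        auto it = sent_.find(pts);
        if (it == sent_.end()) return std::nullopt;
        double delayMs = (tRecv - it->second) * 1e3;
        // Drop the matched entry and all older ones; they can no longer match.
        sent_.erase(sent_.begin(), std::next(it));
        return delayMs;
    }

    // Number of data packets still waiting for their video packet.
    std::size_t pending() const { return sent_.size(); }

private:
    std::map<int64_t, double> sent_;  // PTS -> send time in seconds
};
```

A std::map keeps entries ordered by PTS, so stale entries (data packets whose video packet was lost) are removed with a single erase once a later PTS matches.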

The transport delay of the data packet is below one millisecond, as expected. However, the delay of the video frame is ~25 ms (presumably 20 ms = one frame time at 50 Hz, plus ~5 ms for encoding/decoding). I expected it to be just ~5 ms, i.e. the encoding/decoding time.
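The arithmetic behind the ~25 ms figure can be written out as a small model. The function names are hypothetical, and the 5 ms codec time is the estimate given above rather than a measured constant.

```cpp
// Latency model in milliseconds: codec time + transport time + the number of
// frames the reader holds back times the frame period. Hypothetical helpers.
constexpr double framePeriodMs(double fps) { return 1000.0 / fps; }

constexpr double expectedLatencyMs(double codecMs, double transportMs,
                                   int heldBackFrames, double fps) {
    return codecMs + transportMs + heldBackFrames * framePeriodMs(fps);
}

// Numbers from this ticket: 50 Hz video, ~5 ms encode+decode, ~0 ms transport.
static_assert(framePeriodMs(50.0) == 20.0, "one frame time at 50 Hz");
static_assert(expectedLatencyMs(5.0, 0.0, 1, 50.0) == 25.0, "observed: one frame held back");
static_assert(expectedLatencyMs(5.0, 0.0, 0, 50.0) == 5.0, "desired: no frame held back");
```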

Wireshark shows that the video packets and the private data packet are sent at the same time, so the one-frame delay (20 ms) is introduced by the client code. I think it is caused by (a combination of) the mpegts demuxer and h264_parser.
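One plausible way a parser ends up holding back a frame can be shown with a simplified model. The sketch below is not FFmpeg's h264_parser; it assumes an Annex B byte stream in which every 00 00 01 start code begins a new frame. Since the byte stream carries no end-of-frame marker, this splitter can only hand out frame n once the start code of frame n+1 has arrived.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified model (NOT FFmpeg's h264_parser): splits an Annex B byte stream
// at 00 00 01 start codes, assuming each start code begins a new frame.
// A frame is emitted only when the NEXT start code is seen.
class StartCodeSplitter {
public:
    // Feed a chunk of bytes; returns the frames completed by this chunk.
    std::vector<std::vector<uint8_t>> feed(const std::vector<uint8_t>& chunk) {
        std::vector<std::vector<uint8_t>> out;
        for (uint8_t b : chunk) {
            buf_.push_back(b);
            std::size_t n = buf_.size();
            bool startCode = n >= 3 && buf_[n - 3] == 0 && buf_[n - 2] == 0 &&
                             buf_[n - 1] == 1;
            if (!startCode) continue;
            // Everything before this start code is the previous frame
            // (bytes before the very first start code are discarded).
            if (started_ && n > 3) out.emplace_back(buf_.begin(), buf_.end() - 3);
            started_ = true;
            buf_.assign({0, 0, 1});  // keep the start code of the new frame
        }
        return out;
    }

    // Size of the frame currently being held back.
    std::size_t pendingBytes() const { return buf_.size(); }

private:
    std::vector<uint8_t> buf_;
    bool started_ = false;
};
```

Feeding it the bytes of one frame returns nothing; feeding it the start of the next frame releases the previous one, which mirrors the one-frame delay reported here.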

Server code

This is what I did on the server side. Note that this is not a working example; it has been stripped down to keep it short(-ish):

avformat_alloc_output_context2(&mFormatContext, nullptr,
   "mpegts", "udp://");

// These flags don't seem to help.
// mFormatContext->avio_flags |= AVIO_FLAG_DIRECT;
// mFormatContext->flags |= AVFMT_FLAG_FLUSH_PACKETS;

// Add video stream:
mCodec = avcodec_find_encoder_by_name("libx264");
mVidStream = avformat_new_stream(mFormatContext, mCodec);
mVidStream->id = mFormatContext->nb_streams - 1;
mCodecContext = mVidStream->codec;
mCodecContext->codec_id = mCodec->id;
mCodecContext->bit_rate = mBitrate;
mCodecContext->width    = 1280;
mCodecContext->height   = 720;
mCodecContext->gop_size = 1;
mCodecContext->pix_fmt  = AV_PIX_FMT_YUV420P;
mCodecContext->time_base.den = 50; // 50 Hz
mCodecContext->time_base.num = 1;
mCodecContext->max_b_frames  = 0;
mCodecContext->thread_count  = mThreads; // tried 1 - 4. All have same effect.
mCodecContext->thread_type   = FF_THREAD_SLICE;

// These options also don't have the desired effect.
// mCodecContext->flags |= CODEC_FLAG_LOW_DELAY;
// mCodecContext->flags2 |= CODEC_FLAG2_FAST;

// Add (private) data stream. Will be used to send a packet containing a timestamp
// along-side each video frame.
mDataStream = avformat_new_stream(mFormatContext, nullptr);
mDataStream->id                = mFormatContext->nb_streams - 1;
mDataStream->codec             = avcodec_alloc_context3(nullptr);
mDataStream->codec->codec_type = AVMEDIA_TYPE_DATA;
mDataStream->codec->codec_id   = AV_CODEC_ID_SMPTE_KLV;
mDataCodecContext              = mDataStream->codec;

if (mFormatContext->oformat->flags & AVFMT_GLOBALHEADER) {
  mCodecContext->flags     |= CODEC_FLAG_GLOBAL_HEADER;
  mDataCodecContext->flags |= CODEC_FLAG_GLOBAL_HEADER;
}

// No delay please! :-)
av_opt_set(mCodecContext->priv_data, "preset", "ultrafast", 0);
av_opt_set(mCodecContext->priv_data, "tune", "zerolatency", 0);

avcodec_open2(mCodecContext, mCodec, nullptr);
avformat_write_header(mFormatContext, nullptr);

// encode loop:

fill_yuv_image(&mDstPicture, frame_count++,
   mCodecContext->width, mCodecContext->height);

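AVPacket pkt;
av_init_packet(&pkt);
pkt.data = nullptr; // let the encoder allocate the packet buffer
pkt.size = 0;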

int got_packet;
int err = avcodec_encode_video2(mCodecContext, &pkt, mFrame, &got_packet);
if (err < 0) return;
if (!err && got_packet && pkt.size) {
  pkt.stream_index = mVidStream->index;
  pkt.duration     = 0;

  // Send a data packet alongside each video frame.
  AVPacket dataPkt;
  av_init_packet(&dataPkt);
  dataPkt.stream_index = mDataStream->index;
  dataPkt.pts          = pkt.pts; // Same as video
  dataPkt.dts          = pkt.dts;
  double tNow = get_system_time_since_epoch_in_seconds();
  dataPkt.data = (unsigned char*) &tNow;
  dataPkt.size = sizeof(double);

  // Write side data.
  av_write_frame(mFormatContext, &dataPkt);
  // Flush; probably not necessary but should not hurt.
  av_write_frame(mFormatContext, nullptr);

  // Write video frame.
  err = av_write_frame(mFormatContext, &pkt);
  // Flush again; probably not necessary but should not hurt.
  err = av_write_frame(mFormatContext, nullptr);

  // Release the packet allocated by the encoder.
  av_free_packet(&pkt);
}

// PTS for next frame
mFrame->pts += av_rescale_q(1, mVidStream->codec->time_base, mVidStream->time_base);
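The server code sends the raw bytes of a double, and the client reads them back by casting pkt.data to double*. That works when both ends run on the same architecture, as in the same-host test. If the two ends could differ, a fixed byte order avoids depending on the host representation. A minimal sketch, assuming a hypothetical payload format of microseconds since the epoch stored as an 8-byte big-endian integer:

```cpp
#include <array>
#include <cstdint>

// Hypothetical payload format for the timestamp data packet: microseconds
// since the epoch as an unsigned 64-bit big-endian integer. This does not
// depend on host byte order or on the alignment of the received buffer.
std::array<uint8_t, 8> encodeTimestampUs(uint64_t us) {
    std::array<uint8_t, 8> out{};
    for (int i = 7; i >= 0; --i) {
        out[i] = static_cast<uint8_t>(us & 0xFF);  // least significant byte last
        us >>= 8;
    }
    return out;
}

// Caller must provide at least 8 readable bytes.
uint64_t decodeTimestampUs(const uint8_t* data) {
    uint64_t us = 0;
    for (int i = 0; i < 8; ++i) us = (us << 8) | data[i];
    return us;
}
```

On the sender, dataPkt.data and dataPkt.size would point at the returned array, which must stay alive until av_write_frame() returns; on the receiver, decodeTimestampUs(pkt.data) replaces the pointer cast.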

Client code

This is the client code. Also stripped in an attempt to keep it short:

avformat_open_input(&mFormatContext, "udp://", nullptr, nullptr);

// These flags work! But they defeat the purpose of av_read_frame(), since it
// is then no longer guaranteed to return exactly one frame.
// mFormatContext->flags |= AVFMT_FLAG_NOPARSE | AVFMT_FLAG_NOFILLIN;

// These flags seem to have no effect.
// mFormatContext->flags |= AVFMT_FLAG_NOBUFFER;
// mFormatContext->flags |= AVFMT_FLAG_FLUSH_PACKETS;
// mFormatContext->avio_flags |= AVIO_FLAG_DIRECT;

avformat_find_stream_info(mFormatContext, nullptr);

// Video stream.
mVideoStreamIdx = av_find_best_stream(mFormatContext, AVMEDIA_TYPE_VIDEO, -1, -1, nullptr, 0);

AVStream* st = mFormatContext->streams[mVideoStreamIdx];
// This works too! But again, av_read_frame() is then no longer guaranteed
// to return exactly one frame.
// st->need_parsing = AVSTREAM_PARSE_NONE;

// find decoder for the stream
AVCodecContext* dec_ctx = st->codec;
AVCodec* dec = avcodec_find_decoder(dec_ctx->codec_id);
if (dec->capabilities & CODEC_CAP_TRUNCATED)
  dec_ctx->flags |= CODEC_FLAG_TRUNCATED;
dec_ctx->thread_type  = FF_THREAD_SLICE;
dec_ctx->thread_count = mThreads;

// These don't have the desired effect:
//    dec_ctx->flags |= CODEC_FLAG_LOW_DELAY;
//    dec_ctx->flags2 |= CODEC_FLAG2_FAST;
//    dec_ctx->flags2 |= CODEC_FLAG2_CHUNKS;
//    dec_ctx->refcounted_frames = 1;

avcodec_open2(dec_ctx, dec, nullptr);
mVideoStream        = mFormatContext->streams[mVideoStreamIdx];
mVideoDecodeContext = mVideoStream->codec;

mDataStreamIdx = av_find_best_stream(mFormatContext, AVMEDIA_TYPE_DATA, -1, -1, nullptr, 0);

// Decoding loop:

static std::queue<std::pair<int64_t, double> > ptsDb;

AVPacket pkt;
av_init_packet(&pkt);
pkt.data = nullptr;
pkt.size = 0;

// wait for data.
if (av_read_frame(mFormatContext, &pkt) < 0)
  return; // read error or end of stream

double tRecv = get_system_time_since_epoch_in_seconds();
if (pkt.stream_index == mDataStreamIdx) {
  double tData = *(double*) (pkt.data);
  printf("DAT PTS %li\trecv'd @ %.2lf [ms], trans delay %.4lf [ms]\n", pkt.pts, tRecv * 1e3, (tRecv - tData) * 1e3);

  ptsDb.emplace(std::make_pair(pkt.pts, tRecv));
} else if (pkt.stream_index == mVideoStreamIdx) {
  // Quick hack to sync data stream packets with video packets.
  std::pair<int64_t, double> elem {0, 0};
  while (!ptsDb.empty()) {
    elem = ptsDb.front();
    if (elem.first < pkt.pts) {
      ptsDb.pop();      // stale data packet: discard and keep looking
      continue;
    }
    if (elem.first == pkt.pts) {
      ptsDb.pop();      // matching data packet found
      break;
    }
    if (elem.first > pkt.pts) {
      elem.second = 0;  // no data packet for this video packet
      break;
    }
  }

  double tData = elem.second;
  printf("VID PTS %li\trecv'd @ %.2lf [ms], delta with data %.2lf [ms] (%zu)\n", pkt.pts, tRecv * 1e3, (tRecv - tData) * 1e3, ptsDb.size());

  // decode video frame
  int got_frame = 0;
  mFrame = avcodec_alloc_frame();
  double t1  = get_system_time_since_epoch_in_seconds();
  avcodec_decode_video2(mVideoDecodeContext, mFrame, &got_frame, &pkt);
  double t2  = get_system_time_since_epoch_in_seconds();
  if (got_frame)
    printf("[DEBUG] Got frame VID PTS %lli\tdecoding time %.2lf [ms]\n", mFrame->pkt_pts, (t2 - t1) * 1e3);
}

// Release the packet returned by av_read_frame().
av_free_packet(&pkt);


As noted in the code above, the flags AVFMT_FLAG_NOPARSE | AVFMT_FLAG_NOFILLIN
and/or AVSTREAM_PARSE_NONE seem to do (almost) what I want. My understanding is that these flags
essentially disable the functionality that ensures a complete frame is available. So once
they are set, there is no way of knowing when a frame is ready to be decoded, which makes them unusable for my purpose.
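One way a demuxer can know that a packet is complete without waiting for the next one is an explicit length. The sketch below is a simplified model of MPEG-TS PES framing, not FFmpeg's mpegts demuxer: PES_packet_length (bytes 4 and 5 of the PES header) counts the bytes that follow it, so a bounded PES packet is complete after 6 + length bytes. A length of 0, which MPEG-2 Systems permits for video in transport streams, makes the packet unbounded, and its end is then only known when the next PES packet starts. Whether a given muxer writes a nonzero length for video is an assumption to check, not something this ticket establishes.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Simplified model of PES packet completion (not FFmpeg's mpegts.c).
// Returns the total PES size (6 header bytes + PES_packet_length) if it is
// known from the bytes received so far, or std::nullopt if the header is
// incomplete, invalid, or the packet is unbounded (length field == 0).
std::optional<std::size_t> pesTotalSize(const uint8_t* hdr, std::size_t available) {
    if (available < 6) return std::nullopt;                              // header incomplete
    if (hdr[0] != 0 || hdr[1] != 0 || hdr[2] != 1) return std::nullopt;  // no start code prefix
    std::size_t len = (std::size_t(hdr[4]) << 8) | hdr[5];
    if (len == 0) return std::nullopt;  // unbounded: must wait for the next PES packet
    return 6 + len;
}

// True if the bytes received so far form a complete, bounded PES packet,
// i.e. the packet could be handed on without waiting for the next one.
bool pesComplete(const uint8_t* data, std::size_t received) {
    auto total = pesTotalSize(data, received);
    return total && received >= *total;
}
```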

Change History (1)

comment:1 Changed 4 years ago by cehoyos

  • Priority changed from normal to wish