Opened 5 years ago

Closed 4 years ago

#3842 closed enhancement (fixed)

AAC (mp4a) DASH takes abnormally long to detect (~4 - 6s)

Reported by: viperfx
Owned by:
Priority: wish
Component: avformat
Version: git-master
Keywords: mov
Cc: nfxjfg@googlemail.com
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

Summary of the bug:
It takes roughly 4s for FFmpeg to detect the codec of an mp4a audio stream in a DASH container when the audio is about 3 minutes long. The audio stream is from YouTube. I have checked the location of the moov atom, and it appears to be at the front of the file. I have spoken with developers on #ffmpeg-devel; they confirmed the issue and suggested that detection should not take this long.

Here is the output using boxdumper: http://tny.cz/571eb4af

How to reproduce:

Obtain the YouTube audio stream link using youtube-dl (https://github.com/rg3/youtube-dl):
% youtube-dl -f 140 -g YOUTUBE_URL
where YOUTUBE_URL is the URL of any YouTube video. Pick a music video 3-4 minutes long so that the ~4s delay is noticeable.
The youtube-dl command prints a long URL string that you can pass to ffmpeg or ffplay to observe the issue:
% ffmpeg -i "STREAM_URL"
or
% ffplay "STREAM_URL"

My FFmpeg info:
ffmpeg version 2.2.1 Copyright (c) 2000-2014 the FFmpeg developers
  built on Aug  9 2014 10:03:55 with Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/2.2.1 --enable-shared --enable-pthreads --enable-gpl --enable-version3 --enable-nonfree --enable-hardcoded-tables --enable-avresample --enable-vda --cc=clang --host-cflags= --host-ldflags= --enable-libx264 --enable-libfaac --enable-libmp3lame --enable-libxvid --enable-ffplay --enable-libfdk-aac --enable-openssl
  libavutil      52. 66.100 / 52. 66.100
  libavcodec     55. 52.102 / 55. 52.102
  libavformat    55. 33.100 / 55. 33.100
  libavdevice    55. 10.100 / 55. 10.100
  libavfilter     4.  2.100 /  4.  2.100
  libavresample   1.  2.  0 /  1.  2.  0
  libswscale      2.  5.102 /  2.  5.102
  libswresample   0. 18.100 /  0. 18.100
  libpostproc    52.  3.100 / 52.  3.100

Output of a typical audio stream:
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'STREAM_URL':
  Metadata:
    major_brand     : dash
    minor_version   : 0
    compatible_brands: iso6mp41
    creation_time   : 2014-03-08 01:30:29
  Duration: 00:04:00.70, start: 0.000000, bitrate: 127 kb/s
    Stream #0:0(und): Audio: aac (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 125 kb/s (default)
    Metadata:
      creation_time   : 2014-03-08 01:30:29
      handler_name    : SoundHandler

Change History (16)

comment:1 Changed 5 years ago by cehoyos

  • Keywords mov added; DASH mp4a aac removed

Did you upload a sample?
Is the problem reproducible with current FFmpeg git head?

Please do not use external resources (except for large sample files), always post all necessary information here in the ticket.

comment:2 Changed 5 years ago by viperfx

Here is a sample audio file (it was too large for the attachment): https://www.sendspace.com/file/nmzcbi
Yes, the problem is reproducible with the current FFmpeg master.

comment:3 Changed 5 years ago by cehoyos

I tested the following:

$ time ffmpeg -i videoplayback.mp4
ffmpeg version N-65479-g7117547 Copyright (c) 2000-2014 the FFmpeg developers
  built on Aug 10 2014 13:36:46 with gcc 4.7 (SUSE Linux)
  configuration: --enable-gpl
  libavutil      54.  0.100 / 54.  0.100
  libavcodec     56.  0.100 / 56.  0.100
  libavformat    56.  0.100 / 56.  0.100
  libavdevice    56.  0.100 / 56.  0.100
  libavfilter     5.  0.100 /  5.  0.100
  libswscale      3.  0.100 /  3.  0.100
  libswresample   1.  0.100 /  1.  0.100
  libpostproc    53.  0.100 / 53.  0.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'videoplayback.mp4':
  Metadata:
    major_brand     : dash
    minor_version   : 0
    compatible_brands: iso6mp41
    creation_time   : 2014-04-25 14:50:56
  Duration: 00:03:35.03, start: 0.000000, bitrate: 128 kb/s
    Stream #0:0(und): Audio: aac (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 125 kb/s (default)
    Metadata:
      creation_time   : 2014-04-25 14:50:56
      handler_name    : SoundHandler
At least one output file must be specified

real    0m0.011s
user    0m0.007s
sys     0m0.003s

The time spent on analyzing the mov file seems reasonable to me.
Playback with ffplay starts immediately afaict (see how the process time matches the duration):

$ time ffplay -i videoplayback.mp4 -autoexit
ffplay version N-65479-g7117547 Copyright (c) 2003-2014 the FFmpeg developers
  built on Aug 10 2014 13:36:46 with gcc 4.7 (SUSE Linux)
  configuration: --enable-gpl
  libavutil      54.  0.100 / 54.  0.100
  libavcodec     56.  0.100 / 56.  0.100
  libavformat    56.  0.100 / 56.  0.100
  libavdevice    56.  0.100 / 56.  0.100
  libavfilter     5.  0.100 /  5.  0.100
  libswscale      3.  0.100 /  3.  0.100
  libswresample   1.  0.100 /  1.  0.100
  libpostproc    53.  0.100 / 53.  0.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'videoplayback.mp4':
  Metadata:
    major_brand     : dash
    minor_version   : 0
    compatible_brands: iso6mp41
    creation_time   : 2014-04-25 14:50:56
  Duration: 00:03:35.03, start: 0.000000, bitrate: 128 kb/s
    Stream #0:0(und): Audio: aac (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 125 kb/s (default)
    Metadata:
      creation_time   : 2014-04-25 14:50:56
      handler_name    : SoundHandler
 215.02 M-A:  0.000 fd=   0 aq=    0KB vq=    0KB sq=    0B f=0/0

real    3m35.104s
user    0m4.030s
sys     0m2.981s

How can I reproduce the issue?

comment:4 Changed 5 years ago by gjdfgh

Check how much data it reads when opening the file.

The issue is that it seems to read the whole file, which ruins network performance.

comment:5 in reply to: ↑ description Changed 5 years ago by cehoyos

Replying to viperfx:

  configuration: --prefix=/usr/local/Cellar/ffmpeg/2.2.1 --enable-shared --enable-pthreads --enable-gpl --enable-version3 --enable-nonfree --enable-hardcoded-tables --enable-avresample --enable-vda --cc=clang --host-cflags= --host-ldflags= --enable-libx264 --enable-libfaac --enable-libmp3lame --enable-libxvid --enable-ffplay --enable-libfdk-aac --enable-openssl

Unrelated: --enable-vda and --enable-ffplay do not work the way you seem to believe, please remove them. configure silently ignores those options if SDL or vda are not available; if they are available, ffplay and vda are enabled automatically. I also believe it is better to remove --host-cflags= --host-ldflags= since they might lead to confusion.
Is --cc=clang necessary? It looks redundant to me.

Most users prefer libfdk over libfaac, is there a particular reason why you enable libfaac?

I am curious: Did you test --enable-hardcoded-tables? (I never did.) Does it have any measurable effect?

comment:6 Changed 5 years ago by cehoyos

FFmpeg claims it did not read the whole file (less than one quarter). How can I show that this claim is not correct?

$ ffmpeg -loglevel debug -i videoplayback.mp4
ffmpeg version N-65479-g7117547 Copyright (c) 2000-2014 the FFmpeg developers
  built on Aug 10 2014 13:36:46 with gcc 4.7 (SUSE Linux)
  configuration: --enable-gpl
  libavutil      54.  0.100 / 54.  0.100
  libavcodec     56.  0.100 / 56.  0.100
  libavformat    56.  0.100 / 56.  0.100
  libavdevice    56.  0.100 / 56.  0.100
  libavfilter     5.  0.100 /  5.  0.100
  libswscale      3.  0.100 /  3.  0.100
  libswresample   1.  0.100 /  1.  0.100
  libpostproc    53.  0.100 / 53.  0.100
Splitting the commandline.
Reading option '-loglevel' ... matched as option 'loglevel' (set logging level) with argument 'debug'.
Reading option '-i' ... matched as input file with argument 'videoplayback.mp4'.
Finished splitting the commandline.
Parsing a group of options: global .
Applying option loglevel (set logging level) with argument debug.
Successfully parsed a group of options.
Parsing a group of options: input file videoplayback.mp4.
Successfully parsed a group of options.
Opening an input file: videoplayback.mp4.
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x33f4b20] Format mov,mp4,m4a,3gp,3g2,mj2 probed with size=2048 and score=100
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x33f4b20] ISO: File Type Major Brand: dash
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x33f4b20] Before avformat_find_stream_info() pos: 3374213 bytes read:720896 seeks:21
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x33f4b20] All info found
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x33f4b20] After avformat_find_stream_info() pos: 4796 bytes read:753664 seeks:22 frames:1
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'videoplayback.mp4':
  Metadata:
    major_brand     : dash
    minor_version   : 0
    compatible_brands: iso6mp41
    creation_time   : 2014-04-25 14:50:56
  Duration: 00:03:35.03, start: 0.000000, bitrate: 128 kb/s
    Stream #0:0(und), 1, 1/44100: Audio: aac (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 125 kb/s (default)
    Metadata:
      creation_time   : 2014-04-25 14:50:56
      handler_name    : SoundHandler
Successfully opened the file.
At least one output file must be specified
[AVIOContext @ 0x33f3be0] Statistics: 753664 bytes read, 22 seeks
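
The counters in the debug output above can be pulled out with a short script to see where the reads and seeks happen. This is a minimal Python sketch, not part of FFmpeg; the helper name io_stats and the regular expression are assumptions written only against the two log lines shown in this comment.

```python
import re

# Two lines copied from the debug log above.
LOG = """\
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x33f4b20] Before avformat_find_stream_info() pos: 3374213 bytes read:720896 seeks:21
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x33f4b20] After avformat_find_stream_info() pos: 4796 bytes read:753664 seeks:22 frames:1
"""

# Hypothetical pattern, matching only the log format shown in this ticket.
PATTERN = re.compile(
    r"(Before|After) avformat_find_stream_info\(\) "
    r"pos: (\d+) bytes read:(\d+) seeks:(\d+)"
)

def io_stats(log_text):
    """Return {'Before': {...}, 'After': {...}} with integer counters."""
    stats = {}
    for phase, pos, bytes_read, seeks in PATTERN.findall(log_text):
        stats[phase] = {
            "pos": int(pos),
            "bytes_read": int(bytes_read),
            "seeks": int(seeks),
        }
    return stats

stats = io_stats(LOG)
before, after = stats["Before"], stats["After"]
# Counters reported before stream probing started.
print("before stream info:", before["bytes_read"], "bytes,", before["seeks"], "seeks")
# Difference between the two log lines, i.e. the probing step itself.
print("during stream info:", after["bytes_read"] - before["bytes_read"], "bytes,",
      after["seeks"] - before["seeks"], "seeks")
```

On this log, 21 of the 22 seeks and 720896 of the 753664 bytes are counted before avformat_find_stream_info() is called, which points at the open step rather than at stream probing.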

comment:7 Changed 5 years ago by gjdfgh

Note that loading gets fast with fflags=+ignidx.

According to Paranoialmaniac on IRC, we need to support the "sidx" atom, which apparently contains an index.

comment:8 Changed 5 years ago by cehoyos

Or is the issue that FFmpeg seeks to the end when opening the file?
(Sorry but imo this isn't obvious at all from the original report.)

comment:9 Changed 5 years ago by gjdfgh

22 seeks

There's your answer. These translate to 23 HTTP connects.
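
As a rough illustration of why the seek count matters over HTTP, the sketch below turns a seek count into a connection count and a startup delay. It assumes that every seek opens a new HTTP connection in addition to the initial one, as stated above. The per-connection cost of 0.2 s is a made-up figure, not a measurement, and estimate_open_delay is a hypothetical helper.

```python
def estimate_open_delay(seeks, seconds_per_connect):
    """Return (connects, delay), assuming one HTTP connect per seek plus the initial one."""
    connects = seeks + 1
    return connects, connects * seconds_per_connect

# 22 seeks is the value from the AVIOContext statistics; 0.2 s is an assumed cost.
connects, delay = estimate_open_delay(seeks=22, seconds_per_connect=0.2)
print(f"{connects} connects, about {delay:.1f} s spent on connection setup")
```

With these assumed numbers the estimate falls inside the 4 - 6s range from the ticket title, so the seek count alone could plausibly account for the delay.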

comment:10 follow-up: Changed 5 years ago by viperfx

I should note that I am compiling FFmpeg for iOS, so the flags I posted are slightly different from what I have configured for the iOS app.

Snippet of the configure flags.

        ./configure --disable-programs --disable-shared --enable-static --enable-pic --enable-small --enable-openssl ${DEBUG_CONFIG_ARGS} \
        --disable-decoders --enable-decoder=aac --enable-decoder=h264 --enable-decoder=vorbis \
        --disable-encoders --enable-encoder=aac \
        --disable-demuxers --enable-demuxer=aac --enable-demuxer=mov --enable-demuxer=matroska --enable-demuxer=h264 \
        --disable-muxers --enable-muxer=mov --enable-muxer=mp4 --enable-muxer=hls --enable-muxer=h264 \
        --disable-filters --disable-doc

As gjdfgh said, I received a suggestion to disable indexing to make opening faster. It does indeed make the file open faster, almost instantly over HTTP in fact. However, FFmpeg then reports the duration of the file as 10s; it does not detect the full duration. Therefore, from a UI perspective I cannot seek in the audio track, since I do not know the full duration of the file. The good thing is that the audio plays till the end of the file.

cehoyos: The issue is that it does seek to the end of the file when opening. This is evident when using ffmpeg -i to test with an HTTP URL rather than a local file. The avformat_open_input call alone takes roughly 4s or more.

comment:11 in reply to: ↑ 10 ; follow-ups: Changed 5 years ago by cehoyos

Replying to viperfx:

./configure --disable-programs --disable-shared --enable-static --enable-pic 

Unrelated: Do you know if --enable-pic has any advantages or disadvantages on iOS?
(We have discussed this internally and no clear consensus was reached iirc.)

--disable-shared --enable-static is the default, you may remove it to get a shorter configure line.

--disable-decoders --enable-decoder=aac --enable-decoder=h264 --enable-decoder=vorbis

You can use --enable-decoder=aac,h264,vorbis to make your configure line more readable (same for encoders and demuxers and muxers)

comment:12 in reply to: ↑ 11 Changed 5 years ago by viperfx

Replying to cehoyos:

Replying to viperfx:

./configure --disable-programs --disable-shared --enable-static --enable-pic 

Unrelated: Do you know if --enable-pic has any advantages or disadvantages on iOS?
(We have discussed this internally and no clear consensus was reached iirc.)

--disable-shared --enable-static is the default, you may remove it to get a shorter configure line.

--disable-decoders --enable-decoder=aac --enable-decoder=h264 --enable-decoder=vorbis

You can use --enable-decoder=aac,h264,vorbis to make your configure line more readable (same for encoders and demuxers and muxers)

Ah okay, thanks. No, I am afraid I am fairly new to FFmpeg and to statically compiling libraries for iOS in general. I am the one who should be asking you for tips, it seems :)

comment:13 in reply to: ↑ 11 Changed 5 years ago by cehoyos

Replying to cehoyos:

Unrelated: Do you know if --enable-pic has any advantages or disadvantages on iOS?
(We have discussed this internally and no clear consensus was reached iirc.)

It is not completely impossible that --enable-pic has a huge performance impact without having any advantage at all. But I don't know if anybody tested this.

comment:14 Changed 5 years ago by cehoyos

  • Priority changed from normal to wish
  • Status changed from new to open
  • Type changed from defect to enhancement
  • Version changed from 2.2.4 to git-master

Afaiu, this is an enhancement request to support reading the sidx box as defined in 8.16.3 of ISO 14496-12.
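
For readers unfamiliar with the box, here is a minimal Python sketch of how a sidx box can be read, following the field layout in ISO 14496-12. It is not the FFmpeg implementation; parse_sidx, total_duration_seconds and the synthetic test box are illustrative assumptions, and error handling is omitted.

```python
import struct

def parse_sidx(box):
    """Parse one complete 'sidx' box (32-bit size header included)."""
    _size, box_type = struct.unpack_from(">I4s", box, 0)
    if box_type != b"sidx":
        raise ValueError("not a sidx box")
    version = box[8]                     # FullBox: 1 byte version, 3 bytes flags
    pos = 12
    reference_id, timescale = struct.unpack_from(">II", box, pos)
    pos += 8
    # Version 0 stores 32-bit time/offset fields, version 1 stores 64-bit ones.
    fmt = ">II" if version == 0 else ">QQ"
    earliest_pts, first_offset = struct.unpack_from(fmt, box, pos)
    pos += struct.calcsize(fmt)
    _reserved, reference_count = struct.unpack_from(">HH", box, pos)
    pos += 4
    references = []
    for _ in range(reference_count):
        a, duration, b = struct.unpack_from(">III", box, pos)
        pos += 12
        references.append({
            "is_sidx": bool(a >> 31),          # reference_type bit
            "size": a & 0x7FFFFFFF,            # referenced_size in bytes
            "duration": duration,              # subsegment_duration, timescale units
            "starts_with_sap": bool(b >> 31),
        })
    return {"reference_id": reference_id, "timescale": timescale,
            "earliest_pts": earliest_pts, "first_offset": first_offset,
            "references": references}

def total_duration_seconds(sidx):
    """Sum the subsegment durations; no media data has to be read for this."""
    return sum(r["duration"] for r in sidx["references"]) / sidx["timescale"]

# Synthetic example: two 10 s subsegments at a 44100 Hz timescale (made-up sizes).
payload = struct.pack(">B3xIIIIHH", 0, 1, 44100, 0, 0, 0, 2)
for seg_size in (160000, 158000):
    payload += struct.pack(">III", seg_size, 441000, 0x90000000)
box = struct.pack(">I4s", 8 + len(payload), b"sidx") + payload

sidx = parse_sidx(box)
print(total_duration_seconds(sidx))
```

Because the box lists the size and duration of every referenced subsegment, a reader can in principle derive the total duration and the byte range of each subsegment from this one box, without seeking through the rest of the file.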

comment:15 Changed 5 years ago by gjdfgh

  • Cc nfxjfg@googlemail.com added

Making this bug report a feature request is pushing it a bit...

comment:16 Changed 4 years ago by cehoyos

  • Resolution set to fixed
  • Status changed from open to closed

Implemented by Rodger Combs in 4ab56667594842283dc5ae07f0daba2a2cb4d3af
