Opened 4 years ago

Last modified 15 months ago

#2325 new defect

MP4 AAC Audio is delayed by 2ms when converted to PCM

Reported by: brchapman
Owned by:
Priority: important
Component: undetermined
Version: git-master
Keywords: aac mov regression
Cc:
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

Summary of the bug:

When using ffmpeg to convert an aac audio stream from an mp4 to pcm, the result is out of sync by about 2ms. By adding -ss 00:00:00.02 after the input, the output is correctly aligned.

How to reproduce:

% ffmpeg -i test100.mp4 -c:a pcm_s16le test100_audio.wav
ffmpeg version 1.1.git
git revision: faa0068
built on Mar  4 2013 11:40:27
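
The workaround mentioned in the summary can be sketched as a small helper that builds the argument list. This is only an illustration, assuming the 0.02 s value from the report; the function name `build_wav_command` is hypothetical. Because `-ss` comes after `-i`, it acts as an output option, so ffmpeg decodes the input and discards samples until that timestamp.

```python
def build_wav_command(src, dst, skip_seconds=0.02):
    """Build an ffmpeg argument list that decodes audio to 16-bit PCM.

    -ss is placed after -i on purpose: as an output option it makes
    ffmpeg decode and drop everything before `skip_seconds`.
    The default of 0.02 s is the value from this report, not a constant
    that is known to be right for other files.
    """
    return [
        "ffmpeg",
        "-i", src,
        "-ss", f"{skip_seconds:.3f}",
        "-c:a", "pcm_s16le",
        dst,
    ]


command = build_wav_command("test100.mp4", "test100_audio.wav")
```

The list can be passed to `subprocess.run(command, check=True)` on a machine where ffmpeg is installed.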

Attachments (2)

test100.mp4 (433.1 KB) - added by brchapman 4 years ago.
Screen Shot 2013-03-04 at 8.05.40 PM.png (44.1 KB) - added by brchapman 4 years ago.
Sample Screenshot of waveform


Change History (18)


comment:1 Changed 4 years ago by brchapman

  • Version changed from unspecified to git-master

comment:2 Changed 4 years ago by cehoyos

  • Keywords mov added; mp4 pcm removed

Please provide your failing command line together with complete, uncut console output to make this a valid ticket.

comment:3 Changed 4 years ago by brchapman

The command doesn't fail, but here's the output

% ffmpeg -i test100.mp4 -c:a pcm_s16le test100_audio.wav
ffmpeg version 1.1.git Copyright (c) 2000-2013 the FFmpeg developers
  built on Mar  4 2013 11:40:27 with Apple clang version 4.1 (tags/Apple/clang-421.11.66) (based on LLVM 3.1svn)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/HEAD --enable-shared --enable-pthreads --enable-gpl --enable-version3 --enable-nonfree --enable-hardcoded-tables --enable-avresample --cc=cc --host-cflags= --host-ldflags= --enable-libx264 --enable-libfaac --enable-libmp3lame --enable-libxvid --enable-libfreetype --enable-ffplay
  libavutil      52. 17.103 / 52. 17.103
  libavcodec     54. 92.100 / 54. 92.100
  libavformat    54. 63.102 / 54. 63.102
  libavdevice    54.  3.103 / 54.  3.103
  libavfilter     3. 41.100 /  3. 41.100
  libswscale      2.  2.100 /  2.  2.100
  libswresample   0. 17.102 /  0. 17.102
  libpostproc    52.  2.100 / 52.  2.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'test100.mp4':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: mp42mp41
    creation_time   : 2013-03-04 21:40:01
  Duration: 00:00:12.50, start: 0.000000, bitrate: 283 kb/s
    Stream #0:0(eng): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 85 kb/s, 24 fps, 24 tbr, 24k tbn, 48 tbc
    Metadata:
      creation_time   : 2013-03-04 21:40:01
      handler_name    : Mainconcept MP4 Video Media Handler
    Stream #0:1(eng): Audio: aac (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 189 kb/s
    Metadata:
      creation_time   : 2013-03-04 21:40:01
      handler_name    : Mainconcept MP4 Sound Media Handler
File 'test100_audio.wav' already exists. Overwrite ? [y/N] y
Output #0, wav, to 'test100_audio.wav':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: mp42mp41
    ISFT            : Lavf54.63.102
    Stream #0:0(eng): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, stereo, s16, 1536 kb/s
    Metadata:
      creation_time   : 2013-03-04 21:40:01
      handler_name    : Mainconcept MP4 Sound Media Handler
Stream mapping:
  Stream #0:1 -> #0:0 (aac -> pcm_s16le)
Press [q] to stop, [?] for help
size=    2344kB time=00:00:12.50 bitrate=1536.1kbits/s    
video:0kB audio:2344kB subtitle:0 global headers:0kB muxing overhead 0.003333%

comment:4 follow-up: Changed 4 years ago by cehoyos

  • Keywords regression added
  • Priority changed from normal to important

If there is an issue, it is a regression since 1edea05

Could you explain how you know that the delay is a bug? What is the other application you are testing?
Do you see the same problem if you extract the audio stream from the mov file with "ffmpeg -i test100.mp4 -acodec copy out.aac" ? Ie, is the problem in any way related to the container or only to aac? I see the same delay when decoding the extracted aac file with FFmpeg, but does your reference application also decode the extracted aac file without this delay?

comment:5 Changed 4 years ago by cehoyos

The nero binary decoder shows the same delay as FFmpeg.

comment:6 in reply to: ↑ 4 ; follow-up: Changed 4 years ago by brchapman

Replying to cehoyos:

If there is an issue, it is a regression since 1edea05

Just tried pulling that revision, and building. I still get the same delay.

Could you explain how you know that the delay is a bug?

When converting through After Effects, I don't get this delay. Everything lines up exactly in the output wave file.

What is the other application you are testing?

Adobe After Effects

Do you see the same problem if you extract the audio stream from the mov file with "ffmpeg -i test100.mp4 -acodec copy out.aac" ? Ie, is the problem in any way related to the container or only to aac?

No, I don't get the delay. It lines up perfectly.

I see the same delay when decoding the extracted aac file with FFmpeg, but does your reference application also decode the extracted aac file without this delay?

After Effects decodes it without the delay.

However, I just noticed that when exporting a wave file from the mp4 using QuickTime 7 Pro, its audio delay was reversed: QuickTime's audio is 2ms ahead of the source, whereas ffmpeg's audio is 2ms behind the source.

If there is an alternate solution using command line flags, I'd be up for that as well. Just trying to figure out how to make sure the source and output audio lines up exactly.


comment:7 in reply to: ↑ 6 ; follow-up: Changed 4 years ago by cehoyos

Replying to brchapman:

Replying to cehoyos:

If there is an issue, it is a regression since 1edea05

Just tried pulling that revision, and building. I still get the same delay.

Since it is a regression since that revision, you will have to test an earlier version;-)

Could you explain how you know that the delay is a bug?

When converting through After Effects, I don't get this delay. Everything lines up exactly in the output wave file.

How does it "line up"? (I don't understand how the numbers should relate to the sound. I am certainly not claiming there is no issue - I don't know - but since we know already of three - very different - applications that decode the sample differently from After Effects, I wonder how you can be sure that it is correct.)

Do you see the same problem if you extract the audio stream from the mov file with "ffmpeg -i test100.mp4 -acodec copy out.aac" ? Ie, is the problem in any way related to the container or only to aac?

No, I don't get the delay. It lines up perfectly.

You mean you get the same delay if you use FFmpeg but no delay with After Effects if you try with the aac file - or do I misunderstand?

comment:8 in reply to: ↑ 7 ; follow-up: Changed 4 years ago by brchapman

Replying to cehoyos:

Replying to brchapman:

Replying to cehoyos:

If there is an issue, it is a regression since 1edea05

Just tried pulling that revision, and building. I still get the same delay.

Since it is a regression since that revision, you will have to test an earlier version;-)

Just tried pulling 0332324, and everything lines up great! There's no 2ms delay, and the other ticket I posted about a duplicate first frame, #2324, is also fixed!

Could you explain how you know that the delay is a bug?

When converting through After Effects, I don't get this delay. Everything lines up exactly in the output wave file.

How does it "line up"? (I don't understand how the numbers should relate to the sound. I am certainly not claiming there is no issue - I don't know - but since we know already of three - very different - applications that decode the sample differently from After Effects, I wonder how you can be sure that it is correct.)

I'm defining correct as placing the source mp4 and output wave file in a timeline together and checking if the waveforms match between them. This is shown in the attached screenshot. Since I'm not doing anything other than just reading a file in and encoding it to a different format, I would expect the input and output sound to line up exactly. Is this how you would expect it to work?

Do you see the same problem if you extract the audio stream from the mov file with "ffmpeg -i test100.mp4 -acodec copy out.aac" ? Ie, is the problem in any way related to the container or only to aac?

No, I don't get the delay. It lines up perfectly.

You mean you get the same delay if you use FFmpeg but no delay with After Effects if you try with the aac file - or do I misunderstand?

When I run the command "ffmpeg -i test100.mp4 -acodec copy out.aac", the out.aac file audio matches the source mp4's audio exactly, without any delay.
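
The "line up in a timeline" check described above can also be expressed as a number. The following is a minimal sketch, assuming both signals are already available as lists of samples at the same sample rate; it estimates the offset with a brute-force cross-correlation. The names `estimate_lag` and `lag_to_ms` are hypothetical, and this is not how After Effects or FFmpeg compare audio.

```python
def estimate_lag(reference, candidate, max_lag):
    """Return the shift, in samples, that best aligns `candidate` with
    `reference`. A positive result means `candidate` is late."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, ref in enumerate(reference):
            j = i + lag
            if 0 <= j < len(candidate):
                score += ref * candidate[j]
        # Strict comparison keeps the first lag that reaches the maximum.
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def lag_to_ms(lag_samples, sample_rate):
    """Convert a lag in samples to milliseconds."""
    return lag_samples * 1000 / sample_rate


# Synthetic example: a short click, then the same click three samples later.
click = [0.0] * 10 + [1.0, -1.0, 0.5] + [0.0] * 10
late_click = [0.0] * 3 + click[:-3]
lag = estimate_lag(click, late_click, max_lag=5)
```

With real files the samples would first have to be decoded, for example by reading the two wave files; the quadratic loop is only practical for short excerpts and small `max_lag` values.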

comment:9 in reply to: ↑ 8 ; follow-up: Changed 4 years ago by cehoyos

Replying to brchapman:

Just tried pulling 0332324, and everything lines up great! there's no 2ms delay,

and the other ticket about a duplicate first frame I posted #2324 is also fixed!

No, the output file is not valid.
(You can easily change the FFmpeg source to allow writing VFR mov files, but they are not conforming to any specification.)

Could you explain how you know that the delay is a bug?

When converting through After Effects, I don't get this delay. Everything lines up exactly in the output wave file.

How does it "line up"? (I don't understand how the numbers should relate to the sound. I am certainly not claiming there is no issue - I don't know - but since we know already of three - very different - applications that decode the sample differently from After Effects, I wonder how you can be sure that it is correct.)

I'm defining correct as placing the source mp4 and output wave file in a timeline together and checking if the waveforms match between them. This is shown in the attached screenshot. Since I'm not doing anything other than just reading a file in and encoding it to a different format, I would expect the input and output sound to line up exactly. Is this how you would expect it to work?

Your reasoning basically assumes that After Effects is right and FFmpeg, nero and QuickTime are wrong. While I am not saying this isn't the case, it is no proof imo.
(There is a mov sample from a camera somewhere on this tracker that shows a "visible" noise (knocking on a table iirc), it would be interesting to test that sample with all applications, I unfortunately fail to find it atm.)

Do you see the same problem if you extract the audio stream from the mov file with "ffmpeg -i test100.mp4 -acodec copy out.aac" ? Ie, is the problem in any way related to the container or only to aac?

No, I don't get the delay. It lines up perfectly.

You mean you get the same delay if you use FFmpeg but no delay with After Effects if you try with the aac file - or do I misunderstand?

When I run the command "ffmpeg -i test100.mp4 -acodec copy out.aac", the out.aac file audio matches the source mp4's audio exactly, without any delay.

And if you transcode the out.aac file with FFmpeg and compare it in After Effects, you see the same delay as when transcoding the original mp4 file, or am I wrong?

comment:10 in reply to: ↑ 9 ; follow-up: Changed 4 years ago by brchapman

Replying to cehoyos:

Replying to brchapman:

Just tried pulling 0332324, and everything lines up great! there's no 2ms delay,

and the other ticket about a duplicate first frame I posted #2324 is also fixed!

No, the output file is not valid.
(You can easily change the FFmpeg source to allow writing VFR mov files, but they are not conforming to any specification.)

Could you explain how you know that the delay is a bug?

When converting through After Effects, I don't get this delay. Everything lines up exactly in the output wave file.

How does it "line up"? (I don't understand how the numbers should relate to the sound. I am certainly not claiming there is no issue - I don't know - but since we know already of three - very different - applications that decode the sample differently from After Effects, I wonder how you can be sure that it is correct.)

I'm defining correct as placing the source mp4 and output wave file in a timeline together and checking if the waveforms match between them. This is shown in the attached screenshot. Since I'm not doing anything other than just reading a file in and encoding it to a different format, I would expect the input and output sound to line up exactly. Is this how you would expect it to work?

Your reasoning basically assumes that After Effects is right and FFmpeg, nero and QuickTime are wrong. While I am not saying this isn't the case, it is no proof imo.
(There is a mov sample from a camera somewhere on this tracker that shows a "visible" noise (knocking on a table iirc), it would be interesting to test that sample with all applications, I unfortunately fail to find it atm.)

Do you see the same problem if you extract the audio stream from the mov file with "ffmpeg -i test100.mp4 -acodec copy out.aac" ? Ie, is the problem in any way related to the container or only to aac?

No, I don't get the delay. It lines up perfectly.

You mean you get the same delay if you use FFmpeg but no delay with After Effects if you try with the aac file - or do I misunderstand?

When I run the command "ffmpeg -i test100.mp4 -acodec copy out.aac", the out.aac file audio matches the source mp4's audio exactly, without any delay.

And if you transcode the out.aac file with FFmpeg and compare it in After Effects, you see the same delay as when transcoding the original mp4 file, or am I wrong?

Yes. If I first extract the audio stream from the original:

% ffmpeg -i test100.mp4 -c:a copy test100.aac

then:

% ffmpeg -i test100.aac -c:a pcm_s16le test100_audio.wav

test100_audio.wav is delayed.

Also, if I encode test100.mp4 without the aac audio stream (ie with no audio) and then convert it:

% ffmpeg -i test100_no_aac.mp4 -c:v prores test100_ffmpeg.mov

I don't get the duplicate first frame bug from #2324.

Based on this, I would guess that this would work:

ffmpeg -i test100.mp4 -c:v prores -an test100_ffmpeg.mov

However, it doesn't. The first frame is still duplicated.

comment:11 in reply to: ↑ 10 Changed 4 years ago by cehoyos

Replying to brchapman:

Also, if I encode test100.mp4 without the aac audio stream (ie with no audio) and then convert it:

% ffmpeg -i test100_no_aac.mp4 -c:v prores test100_ffmpeg.mov

I don't get the duplicate first frame bug in #2324

You are using a different input file that is cfr, your original sample has a longer first frame (that needs to be duplicated to get cfr output).

Based on this I would guess that this would work:

ffmpeg -i test100.mp4 -c:v prores -an test100_ffmpeg.mov

However, it doesn't. The first frame is still duplicated.

Because the timestamps still require a duplication (they do not change just because you don't encode the audio). Use -vsync 0 to ignore the timestamps so that no frame duplication happens.
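
Why dropping the audio does not change the result can be illustrated with a toy model of constant-frame-rate output. This is only a sketch under stated assumptions: the timestamps below are hypothetical (a first frame that lasts two frame periods, in the 1/24000 time base shown in the log), and `cfr_schedule` is an illustrative function, not FFmpeg's actual vsync code.

```python
def cfr_schedule(pts_ticks, ticks_per_frame):
    """For every constant-rate output slot, return the index of the most
    recent input frame whose timestamp is not later than the slot time.

    `pts_ticks` must be non-empty, sorted, and start at 0.
    """
    schedule = []
    src = 0
    slot = 0
    while slot * ticks_per_frame <= pts_ticks[-1]:
        slot_time = slot * ticks_per_frame
        # Advance to the newest input frame that has already started.
        while src + 1 < len(pts_ticks) and pts_ticks[src + 1] <= slot_time:
            src += 1
        schedule.append(src)
        slot += 1
    return schedule


# Hypothetical video timestamps: 24 fps is 1000 ticks per frame at 1/24000,
# and the first frame is twice as long as the others.
video_pts = [0, 2000, 3000, 4000]
slots = cfr_schedule(video_pts, ticks_per_frame=1000)
```

In this model the first input frame fills two output slots. The schedule depends only on the video timestamps, so removing the audio stream leaves it unchanged.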

comment:12 follow-up: Changed 4 years ago by cehoyos

I tried different players and re-encoded the original sample and FFmpeg's behaviour is consistent afaict. (It may of course be wrong.)

At what frames are the gongs supposed to play? Ie, which numbers should be shown on screen at the time each gong starts?

comment:13 in reply to: ↑ 12 ; follow-up: Changed 4 years ago by brchapman

Replying to cehoyos:

Replying to brchapman:

Also, if I encode test100.mp4 without the aac audio stream (ie with no audio) and then convert it:

% ffmpeg -i test100_no_aac.mp4 -c:v prores test100_ffmpeg.mov

I don't get the duplicate first frame bug in #2324

You are using a different input file that is cfr, your original sample has a longer first frame (that needs to be duplicated to get cfr output).

Based on this I would guess that this would work:

ffmpeg -i test100.mp4 -c:v prores -an test100_ffmpeg.mov

However, it doesn't. The first frame is still duplicated.

Because the timestamps still require a duplication (they do not change just because you don't encode the audio). Use -vsync 0 to ignore the timestamps so that no frame duplication happens.

So when I use -vsync 0, the first frame isn't duplicated, but rather the first frame is now completely black. Any other flags I can use to get rid of this?

I'd use -ss to skip past the first frame, which works on its own. However, if I try to use it with -filter_complex overlay and an image sequence that's overlaid on top of the source video, the sequence doesn't end up starting until frame 2 (frame 1 on screen). Here's that command:

% ffmpeg -y -ss 00:00:00.042 -i test100.mp4 -vsync 0 -f image2 -force_fps -r 24 -start_number 1 -i test100_hu

ffmpeg version N-37747-g058e1f8 Copyright (c) 2000-2013 the FFmpeg developers
  built on Mar  5 2013 19:38:09 with llvm-gcc 4.2.1 (LLVM build 2336.11.00)
  configuration: --prefix=/usr/local/ --enable-shared --enable-pthreads --enable-gpl
  libavutil      52. 17.103 / 52. 17.103
  libavcodec     54. 92.100 / 54. 92.100
  libavformat    54. 63.103 / 54. 63.103
  libavdevice    54.  3.103 / 54.  3.103
  libavfilter     3. 42.103 /  3. 42.103
  libswscale      2.  2.100 /  2.  2.100
  libswresample   0. 17.102 /  0. 17.102
  libpostproc    52.  2.100 / 52.  2.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'test100.mp4':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: mp42mp41
    creation_time   : 2013-03-04 21:40:01
  Duration: 00:00:12.50, start: 0.000000, bitrate: 283 kb/s
    Stream #0:0(eng): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 85 kb/s, 24 fps, 24 tbr, 24k tbn, 48 tbc
    Metadata:
      creation_time   : 2013-03-04 21:40:01
      handler_name    : Mainconcept MP4 Video Media Handler
    Stream #0:1(eng): Audio: aac (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 189 kb/s
    Metadata:
      creation_time   : 2013-03-04 21:40:01
      handler_name    : Mainconcept MP4 Sound Media Handler
[image2 @ 0x7fe4d9033c00] max_analyze_duration 5000000 reached at 5000000 microseconds
Input #1, image2, from 'test100_hud/test100_transcoder%05d.png':
  Duration: 00:00:12.50, start: 0.000000, bitrate: N/A
    Stream #1:0: Video: png, rgba, 1280x720, 24 fps, 24 tbr, 24 tbn, 24 tbc
[prores @ 0x7fe4d9466400] encoding with ProRes standard (apcn) profile
[prores @ 0x7fe4d946a800] encoding with ProRes standard (apcn) profile
[prores @ 0x7fe4d946d000] encoding with ProRes standard (apcn) profile
[prores @ 0x7fe4d946f800] encoding with ProRes standard (apcn) profile
[prores @ 0x7fe4d9038000] encoding with ProRes standard (apcn) profile
Output #0, mov, to 'test100_ffmpeg.mov':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: mp42mp41
    encoder         : Lavf54.63.103
    Stream #0:0: Video: prores (apcn) (apcn / 0x6E637061), yuv422p10le, 1280x720 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 12288 tbn, 24 tbc
Stream mapping:
  Stream #0:0 (h264) -> overlay:main
  Stream #1:0 (png) -> overlay:overlay
  overlay -> Stream #0:0 (prores)
Press [q] to stop, [?] for help
frame=  300 fps= 34 q=0.0 Lsize=    8944kB time=00:00:12.50 bitrate=5861.8kbits/s    
video:8942kB audio:0kB subtitle:0 global headers:0kB muxing overhead 0.021776%
ffmpeg -y -ss 00:00:00.042 -i test100.mp4 -vsync 0 -f image2 -force_fps -r 24  15.18s user 0.24s system 174% cpu 8.838 total

Looking at mediainfo for test100.mp4, I can see that the video track is cfr, whereas the audio is variable, which I'm guessing causes the "Overall bit rate mode" to become variable. Is this what you're talking about?

% mediainfo test100.mp4                                                                                 
General
Complete name                            : test100.mp4
Format                                   : MPEG-4
Format profile                           : Base Media / Version 2
Codec ID                                 : mp42
File size                                : 433 KiB
Duration                                 : 12s 500ms
Overall bit rate mode                    : Variable
Overall bit rate                         : 284 Kbps
Encoded date                             : UTC 2013-03-04 21:40:01
Tagged date                              : UTC 2013-03-04 21:41:11
©TIM                                     : 00:00:00:00
©TSC                                     : 24
©TSZ                                     : 1

Video
ID                                       : 1
Format                                   : AVC
Format/Info                              : Advanced Video Codec
Format profile                           : Main@L5.1
Format settings, CABAC                   : Yes
Format settings, ReFrames                : 3 frames
Format settings, GOP                     : M=4, N=33
Codec ID                                 : avc1
Codec ID/Info                            : Advanced Video Coding
Duration                                 : 12s 500ms
Bit rate                                 : 85.5 Kbps
Width                                    : 1 280 pixels
Height                                   : 720 pixels
Display aspect ratio                     : 16:9
Frame rate mode                          : Constant
Frame rate                               : 24.000 fps
Standard                                 : NTSC
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Scan type                                : Progressive
Bits/(Pixel*Frame)                       : 0.004
Stream size                              : 130 KiB (30%)
Language                                 : English
Encoded date                             : UTC 2013-03-04 21:40:01
Tagged date                              : UTC 2013-03-04 21:40:01

Audio
ID                                       : 2
Format                                   : AAC
Format/Info                              : Advanced Audio Codec
Format profile                           : LC
Codec ID                                 : 40
Duration                                 : 12s 500ms
Source duration                          : 12s 501ms
Bit rate mode                            : Variable
Bit rate                                 : 192 Kbps
Maximum bit rate                         : 329 Kbps
Channel(s)                               : 2 channels
Channel positions                        : Front: L R
Sampling rate                            : 48.0 KHz
Compression mode                         : Lossy
Stream size                              : 289 KiB (67%)
Source stream size                       : 289 KiB (67%)
Language                                 : English
Encoded date                             : UTC 2013-03-04 21:40:01
Tagged date                              : UTC 2013-03-04 21:40:01

Replying to cehoyos:

I tried different players and re-encoded the original sample and FFmpeg's behaviour is consistent afaict. (It may of course be wrong.)

At what frames are the gongs supposed to play? Ie, which numbers should be shown on screen at the time each gong starts?

Gongs start on 0, 41, 84, 116, 167, 207, 246, 285

comment:14 in reply to: ↑ 13 Changed 4 years ago by cehoyos

Replying to brchapman:

At what frames are the gongs supposed to play? Ie, which numbers should be shown on screen at the time each gong starts?

Gongs start on 0, 41, 84, 116, 167, 207, 246, 285

This is exactly what I see here with different players, both with the original and with a sample re-encoded by FFmpeg. If FFmpeg cut the first 0.02 seconds of audio, this would become wrong, or am I missing something?

comment:15 Changed 4 years ago by Cigaes

0.02 seconds of audio is about half a video frame.

And 0.002 seconds, as stated in the title of this ticket, is the audio delay you get by lounging on the couch instead of sitting straight (70 cm more for the sound to travel).
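
Both figures can be checked with simple arithmetic. The sketch below assumes the 24 fps frame rate from the ffmpeg log and a speed of sound of about 343 m/s (dry air at roughly 20 °C, an assumption not stated in the ticket); the function names are illustrative.

```python
VIDEO_FPS = 24            # frame rate reported by ffmpeg for test100.mp4
SPEED_OF_SOUND_M_S = 343  # assumption: dry air at about 20 degrees Celsius


def offset_in_frames(offset_seconds, fps=VIDEO_FPS):
    """Express an audio offset as a fraction of one video frame."""
    return offset_seconds * fps


def sound_travel_m(offset_seconds, speed=SPEED_OF_SOUND_M_S):
    """Distance in metres that sound travels during the given time."""
    return offset_seconds * speed


frames_for_20ms = offset_in_frames(0.02)   # a little under half a frame
metres_for_2ms = sound_travel_m(0.002)     # a little under 0.7 m
```

So the 0.02 s passed to -ss and the 2ms in the ticket title differ by a factor of ten, which is the point being made here.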

comment:16 Changed 15 months ago by rmk

I have run into this as well. The way I see it, ffmpeg reads but does not write the custom mov metadata (udta/meta) "iTunSMPB". Check around line 3121 in mov.c, where priming is set based on the values from that metadata entry. If the mov muxer wrote that metadata, ffmpeg would at least be compatible with itself as far as priming is concerned (i.e. not decode an offset) and with Apple tools. That would be an improvement. I don't know if trailing samples are treated in any way in ffmpeg, but AFAIC the priming problem is the more serious one, as it leads to a/v desync when the delay is long enough (2112 samples is what Apple uses by default; that is a bit more than one 25 FPS video frame, which is noticeable/significant for some people).

For further explanations see also http://ffmpeg.org/pipermail/ffmpeg-devel/2012-July/127834.html
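
The comparison between the priming length and a video frame can be checked as follows. This is a small sketch: the 2112-sample figure and the 25 FPS frame rate come from the comment above, the 48000 Hz rate is the one reported for test100.mp4, and the helper names are hypothetical.

```python
def samples_to_ms(samples, sample_rate):
    """Duration of `samples` audio samples in milliseconds."""
    return samples * 1000 / sample_rate


def frame_ms(fps):
    """Duration of one video frame in milliseconds."""
    return 1000 / fps


PRIMING_SAMPLES = 2112  # default priming length quoted in the comment above

priming_48k = samples_to_ms(PRIMING_SAMPLES, 48000)  # 44 ms
priming_44k = samples_to_ms(PRIMING_SAMPLES, 44100)  # about 47.9 ms
one_frame_25fps = frame_ms(25)                        # 40 ms
```

At either sample rate the priming is slightly longer than one 25 FPS frame, which matches the statement in the comment.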
