Opened 8 years ago
Last modified 5 years ago
#2325 new defect
MP4 AAC Audio is delayed by 2ms when converted to PCM
| Reported by: | brchapman | Owned by: | |
|---|---|---|---|
| Priority: | important | Component: | undetermined |
| Version: | git-master | Keywords: | aac mov regression |
| Cc: | | Blocked By: | |
| Blocking: | | Reproduced by developer: | no |
| Analyzed by developer: | no | | |
Description
Summary of the bug:
When using ffmpeg to convert an aac audio stream from an mp4 to pcm, the result is out of sync by about 2ms. By adding -ss 00:00:00.02 after the input, the output is correctly aligned.
How to reproduce:
% ffmpeg -i test100.mp4 -c:a pcm_s16le test100_audio.wav
ffmpeg version 1.1.git
git revision: faa0068
built on Mar 4 2013 11:40:27
Attachments (2)
Change History (18)
Changed 8 years ago by brchapman
Changed 8 years ago by brchapman
comment:1 Changed 8 years ago by brchapman
- Version changed from unspecified to git-master
comment:2 Changed 8 years ago by cehoyos
- Keywords mov added; mp4 pcm removed
Please provide your failing command line together with complete, uncut console output to make this a valid ticket.
comment:3 Changed 8 years ago by brchapman
The command doesn't fail, but here's the output
% ffmpeg -i test100.mp4 -c:a pcm_s16le test100_audio.wav
ffmpeg version 1.1.git Copyright (c) 2000-2013 the FFmpeg developers
  built on Mar 4 2013 11:40:27 with Apple clang version 4.1 (tags/Apple/clang-421.11.66) (based on LLVM 3.1svn)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/HEAD --enable-shared --enable-pthreads --enable-gpl --enable-version3 --enable-nonfree --enable-hardcoded-tables --enable-avresample --cc=cc --host-cflags= --host-ldflags= --enable-libx264 --enable-libfaac --enable-libmp3lame --enable-libxvid --enable-libfreetype --enable-ffplay
  libavutil      52. 17.103 / 52. 17.103
  libavcodec     54. 92.100 / 54. 92.100
  libavformat    54. 63.102 / 54. 63.102
  libavdevice    54.  3.103 / 54.  3.103
  libavfilter     3. 41.100 /  3. 41.100
  libswscale      2.  2.100 /  2.  2.100
  libswresample   0. 17.102 /  0. 17.102
  libpostproc    52.  2.100 / 52.  2.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'test100.mp4':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: mp42mp41
    creation_time   : 2013-03-04 21:40:01
  Duration: 00:00:12.50, start: 0.000000, bitrate: 283 kb/s
    Stream #0:0(eng): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 85 kb/s, 24 fps, 24 tbr, 24k tbn, 48 tbc
    Metadata:
      creation_time   : 2013-03-04 21:40:01
      handler_name    : Mainconcept MP4 Video Media Handler
    Stream #0:1(eng): Audio: aac (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 189 kb/s
    Metadata:
      creation_time   : 2013-03-04 21:40:01
      handler_name    : Mainconcept MP4 Sound Media Handler
File 'test100_audio.wav' already exists. Overwrite ? [y/N] y
Output #0, wav, to 'test100_audio.wav':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: mp42mp41
    ISFT            : Lavf54.63.102
    Stream #0:0(eng): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, stereo, s16, 1536 kb/s
    Metadata:
      creation_time   : 2013-03-04 21:40:01
      handler_name    : Mainconcept MP4 Sound Media Handler
Stream mapping:
  Stream #0:1 -> #0:0 (aac -> pcm_s16le)
Press [q] to stop, [?] for help
size=    2344kB time=00:00:12.50 bitrate=1536.1kbits/s
video:0kB audio:2344kB subtitle:0 global headers:0kB muxing overhead 0.003333%
comment:4 follow-up: ↓ 6 Changed 8 years ago by cehoyos
- Keywords regression added
- Priority changed from normal to important
If there is an issue, it is a regression since 1edea05
Could you explain how you know that the delay is a bug? What is the other application you are testing?
Do you see the same problem if you extract the audio stream from the mov file with "ffmpeg -i test100.mp4 -acodec copy out.aac" ? Ie, is the problem in any way related to the container or only to aac? I see the same delay when decoding the extracted aac file with FFmpeg, but does your reference application also decode the extracted aac file without this delay?
comment:5 Changed 8 years ago by cehoyos
The nero binary decoder shows the same delay as FFmpeg.
comment:6 in reply to: ↑ 4 ; follow-up: ↓ 7 Changed 8 years ago by brchapman
Replying to cehoyos:
If there is an issue, it is a regression since 1edea05
Just tried pulling that revision, and building. I still get the same delay.
Could you explain how you know that the delay is a bug?
When converting through After Effects, I don't get this delay. Everything lines up exactly in the output wave file.
What is the other application you are testing?
Adobe After Effects
Do you see the same problem if you extract the audio stream from the mov file with "ffmpeg -i test100.mp4 -acodec copy out.aac" ? Ie, is the problem in any way related to the container or only to aac?
No, I don't get the delay. It lines up perfectly.
I see the same delay when decoding the extracted aac file with FFmpeg, but does your reference application also decode the extracted aac file without this delay?
After Effects decodes it without the delay.
However, I just noticed that when exporting a wave file from the mp4 using QuickTime 7 Pro, its audio delay was reversed: QuickTime's audio is 2ms ahead of the source, whereas ffmpeg's audio is 2ms behind the source.
If there is an alternate solution using command line flags, I'd be up for that as well. Just trying to figure out how to make sure the source and output audio lines up exactly.
comment:7 in reply to: ↑ 6 ; follow-up: ↓ 8 Changed 8 years ago by cehoyos
Replying to brchapman:
Replying to cehoyos:
If there is an issue, it is a regression since 1edea05
Just tried pulling that revision, and building. I still get the same delay.
Since it is a regression since that revision, you will have to test an earlier version;-)
Could you explain how you know that the delay is a bug?
When converting through After Effects, I don't get this delay. Everything lines up exactly in the output wave file.
How does it "line up"? (I don't understand how the numbers should relate to the sound. I am certainly not claiming there is no issue - I don't know - but since we know already of three - very different - applications that decode the sample differently from After Effects, I wonder how you can be sure that it is correct.)
Do you see the same problem if you extract the audio stream from the mov file with "ffmpeg -i test100.mp4 -acodec copy out.aac" ? Ie, is the problem in any way related to the container or only to aac?
No, I don't get the delay. It lines up perfectly.
You mean you get the same delay if you use FFmpeg but no delay with After Effects if you try with the aac file - or do I misunderstand?
comment:8 in reply to: ↑ 7 ; follow-up: ↓ 9 Changed 8 years ago by brchapman
Replying to cehoyos:
Replying to brchapman:
Replying to cehoyos:
If there is an issue, it is a regression since 1edea05
Just tried pulling that revision, and building. I still get the same delay.
Since it is a regression since that revision, you will have to test an earlier version;-)
Just tried pulling 0332324, and everything lines up great! There's no 2ms delay, and the other ticket I posted about a duplicate first frame, #2324, is also fixed!
Could you explain how you know that the delay is a bug?
When converting through After Effects, I don't get this delay. Everything lines up exactly in the output wave file.
How does it "line up"? (I don't understand how the numbers should relate to the sound. I am certainly not claiming there is no issue - I don't know - but since we know already of three - very different - applications that decode the sample differently from After Effects, I wonder how you can be sure that it is correct.)
I'm defining correct as placing the source mp4 and output wave file in a timeline together and checking if the waveforms match between them. This is shown in the attached screenshot. Since I'm not doing anything other than just reading a file in and encoding it to a different format, I would expect the input and output sound to line up exactly. Is this how you would expect it to work?
Do you see the same problem if you extract the audio stream from the mov file with "ffmpeg -i test100.mp4 -acodec copy out.aac" ? Ie, is the problem in any way related to the container or only to aac?
No, I don't get the delay. It lines up perfectly.
You mean you get the same delay if you use FFmpeg but no delay with After Effects if you try with the aac file - or do I misunderstand?
When I run the command "ffmpeg -i test100.mp4 -acodec copy out.aac", the out.aac file audio matches the source mp4's audio exactly, without any delay.
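The timeline comparison described in this comment can also be done numerically. The following is only a minimal sketch, not something used in this ticket: it assumes both tracks are already decoded to lists of samples at the same rate, and it uses a synthetic click plus a hypothetical 960-sample shift in place of the real test100 audio. The helper name `best_lag` is made up for illustration.

```python
def best_lag(reference, candidate, max_lag):
    """Return the lag (in samples) at which candidate best matches reference.

    A positive result means candidate is delayed relative to reference.
    Brute-force cross-correlation; fine for short clips only.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, ref_sample in enumerate(reference):
            j = i + lag
            if 0 <= j < len(candidate):
                score += ref_sample * candidate[j]
        if score > best_score:
            best, best_score = lag, score
    return best


SAMPLE_RATE = 48000  # Hz, as reported for the audio stream of test100.mp4

# Synthetic stand-in for a decoded track: silence with one short click.
reference = [0.0] * 1500
reference[100:110] = [1.0] * 10

# Hypothetical "delayed" decode: the same signal shifted by 960 samples.
delay = 960
candidate = [0.0] * delay + reference[:-delay]

lag = best_lag(reference, candidate, max_lag=1000)
lag_ms = 1000.0 * lag / SAMPLE_RATE
print(lag, "samples =", lag_ms, "ms")
```

A measured lag in samples is less ambiguous than reading an offset off a waveform display, which matters here because the ticket mentions both 2ms and 0.02 seconds.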
comment:9 in reply to: ↑ 8 ; follow-up: ↓ 10 Changed 8 years ago by cehoyos
Replying to brchapman:
Just tried pulling 0332324, and everything lines up great! there's no 2ms delay,
and the other ticket about a duplicate first frame I posted #2324 is also fixed!
No, the output file is not valid.
(You can easily change the FFmpeg source to allow writing VFR mov files, but they are not conforming to any specification.)
Could you explain how you know that the delay is a bug?
When converting through After Effects, I don't get this delay. Everything lines up exactly in the output wave file.
How does it "line up"? (I don't understand how the numbers should relate to the sound. I am certainly not claiming there is no issue - I don't know - but since we know already of three - very different - applications that decode the sample differently from After Effects, I wonder how you can be sure that it is correct.)
I'm defining correct as placing the source mp4 and output wave file in a timeline together and checking if the waveforms match between them. This is shown in the attached screenshot. Since I'm not doing anything other than just reading a file in and encoding it to a different format, I would expect the input and output sound to line up exactly. Is this how you would expect it to work?
Your reasoning basically assumes that After Effects is right and FFmpeg, nero and QuickTime are wrong. While I am not saying this isn't the case, it is no proof imo.
(There is a mov sample from a camera somewhere on this tracker that shows a "visible" noise (knocking on a table iirc), it would be interesting to test that sample with all applications, I unfortunately fail to find it atm.)
Do you see the same problem if you extract the audio stream from the mov file with "ffmpeg -i test100.mp4 -acodec copy out.aac" ? Ie, is the problem in any way related to the container or only to aac?
No, I don't get the delay. It lines up perfectly.
You mean you get the same delay if you use FFmpeg but no delay with After Effects if you try with the aac file - or do I misunderstand?
When I run the command "ffmpeg -i test100.mp4 -acodec copy out.aac", the out.aac file audio matches the source mp4's audio exactly, without any delay.
And if you transcode the out.aac file with FFmpeg and compare it in After Effects, you see the same delay as when transcoding the original mp4 file, or am I wrong?
comment:10 in reply to: ↑ 9 ; follow-up: ↓ 11 Changed 8 years ago by brchapman
Replying to cehoyos:
Replying to brchapman:
Just tried pulling 0332324, and everything lines up great! there's no 2ms delay,
and the other ticket about a duplicate first frame I posted #2324 is also fixed!
No, the output file is not valid.
(You can easily change the FFmpeg source to allow writing VFR mov files, but they are not conforming to any specification.)
Could you explain how you know that the delay is a bug?
When converting through After Effects, I don't get this delay. Everything lines up exactly in the output wave file.
How does it "line up"? (I don't understand how the numbers should relate to the sound. I am certainly not claiming there is no issue - I don't know - but since we know already of three - very different - applications that decode the sample differently from After Effects, I wonder how you can be sure that it is correct.)
I'm defining correct as placing the source mp4 and output wave file in a timeline together and checking if the waveforms match between them. This is shown in the attached screenshot. Since I'm not doing anything other than just reading a file in and encoding it to a different format, I would expect the input and output sound to line up exactly. Is this how you would expect it to work?
Your reasoning basically assumes that After Effects is right and FFmpeg, nero and QuickTime are wrong. While I am not saying this isn't the case, it is no proof imo.
(There is a mov sample from a camera somewhere on this tracker that shows a "visible" noise (knocking on a table iirc), it would be interesting to test that sample with all applications, I unfortunately fail to find it atm.)
Do you see the same problem if you extract the audio stream from the mov file with "ffmpeg -i test100.mp4 -acodec copy out.aac" ? Ie, is the problem in any way related to the container or only to aac?
No, I don't get the delay. It lines up perfectly.
You mean you get the same delay if you use FFmpeg but no delay with After Effects if you try with the aac file - or do I misunderstand?
When I run the command "ffmpeg -i test100.mp4 -acodec copy out.aac", the out.aac file audio matches the source mp4's audio exactly, without any delay.
And if you transcode the out.aac file with FFmpeg and compare it in After Effects, you see the same delay as when transcoding the original mp4 file, or am I wrong?
Yes, if I first extract the audio from the original:
% ffmpeg -i test100.mp4 -c:a copy test100.aac
then:
% ffmpeg -i test100.aac -c:a pcm_s16le test100_audio.wav
test100_audio.wav is delayed.
Also, if I encode test100.mp4 without the aac audio stream (ie with no audio) and then convert it:
% ffmpeg -i test100_no_aac.mp4 -c:v prores test100_ffmpeg.mov
I don't get the duplicate first frame bug in #2324
Based on this I would guess that this would work:
ffmpeg -i test100.mp4 -c:v prores -an test100_ffmpeg.mov
However, it doesn't. The first frame is still duplicated.
comment:11 in reply to: ↑ 10 Changed 8 years ago by cehoyos
Replying to brchapman:
Also, if I encode test100.mp4 without the aac audio stream (ie with no audio) and then convert it:
% ffmpeg -i test100_no_aac.mp4 -c:v prores test100_ffmpeg.mov
I don't get the duplicate first frame bug in #2324
You are using a different input file that is cfr, your original sample has a longer first frame (that needs to be duplicated to get cfr output).
Based on this I would guess that this would work:
ffmpeg -i test100.mp4 -c:v prores -an test100_ffmpeg.mov
However, it doesn't. The first frame is still duplicated.
Because the timestamps still require a duplication (they do not change just because you don't encode the audio). Use -vsync 0 to ignore the timestamps so that no frame duplication happens.
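To illustrate why a longer first frame forces a duplicate in constant-frame-rate output, here is a rough sketch. It is not FFmpeg's actual frame-sync code, the timestamps are hypothetical (the real timestamps of test100.mp4 are not listed in this ticket), and `cfr_slots` is a made-up helper name.

```python
from fractions import Fraction


def cfr_slots(pts_seconds, fps):
    """Map input frames onto a constant-frame-rate grid.

    Returns, for each output slot, the index of the input frame shown in it.
    A slot that has no new input frame repeats the previous frame.
    """
    slot_of = [round(t * fps) for t in pts_seconds]
    shown = []
    current = 0
    for slot in range(slot_of[-1] + 1):
        # Advance to the latest input frame that starts at or before this slot.
        while current + 1 < len(slot_of) and slot_of[current + 1] <= slot:
            current += 1
        shown.append(current)
    return shown


FPS = 24
# Hypothetical timestamps: the first frame lasts two frame durations.
pts = [Fraction(0, FPS), Fraction(2, FPS), Fraction(3, FPS), Fraction(4, FPS)]

cfr_output = cfr_slots(pts, FPS)            # timestamps honoured
passthrough_output = list(range(len(pts)))  # timestamps ignored, one output frame per input frame

print(cfr_output)
print(passthrough_output)
```

In this toy model, dropping the audio changes nothing because the video timestamps are the same either way, which matches the explanation above.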
comment:12 follow-up: ↓ 13 Changed 8 years ago by cehoyos
I tried different players and re-encoded the original sample and FFmpeg's behaviour is consistent afaict. (It may of course be wrong.)
At what frames are the gongs supposed to play? Ie, which numbers should be shown on screen at the time each gong starts?
comment:13 in reply to: ↑ 12 ; follow-up: ↓ 14 Changed 8 years ago by brchapman
Replying to cehoyos:
Replying to brchapman:
Also, if I encode test100.mp4 without the aac audio stream (ie with no audio) and then convert it:
% ffmpeg -i test100_no_aac.mp4 -c:v prores test100_ffmpeg.mov
I don't get the duplicate first frame bug in #2324
You are using a different input file that is cfr, your original sample has a longer first frame (that needs to be duplicated to get cfr output).
Based on this I would guess that this would work:
ffmpeg -i test100.mp4 -c:v prores -an test100_ffmpeg.mov
However, it doesn't. The first frame is still duplicated.
Because the timestamps still require a duplication (they do not change just because you don't encode the audio). Use -vsync 0 to ignore the timestamps so that no frame duplication happens.
So when I use -vsync 0, the first frame isn't duplicated, but rather the first frame is now completely black. Any other flags I can use to get rid of this?
I'd use -ss to skip past the first frame, which works on its own. However, if I try to use it with -filter_complex overlay and an image sequence that's overlaid on top of the source video, the sequence doesn't end up starting until frame 2 (frame 1 on screen). Here's that command:
% ffmpeg -y -ss 00:00:00.042 -i test100.mp4 -vsync 0 -f image2 -force_fps -r 24 -start_number 1 -i test100_hu
ffmpeg version N-37747-g058e1f8 Copyright (c) 2000-2013 the FFmpeg developers
  built on Mar 5 2013 19:38:09 with llvm-gcc 4.2.1 (LLVM build 2336.11.00)
  configuration: --prefix=/usr/local/ --enable-shared --enable-pthreads --enable-gpl
  libavutil      52. 17.103 / 52. 17.103
  libavcodec     54. 92.100 / 54. 92.100
  libavformat    54. 63.103 / 54. 63.103
  libavdevice    54.  3.103 / 54.  3.103
  libavfilter     3. 42.103 /  3. 42.103
  libswscale      2.  2.100 /  2.  2.100
  libswresample   0. 17.102 /  0. 17.102
  libpostproc    52.  2.100 / 52.  2.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'test100.mp4':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: mp42mp41
    creation_time   : 2013-03-04 21:40:01
  Duration: 00:00:12.50, start: 0.000000, bitrate: 283 kb/s
    Stream #0:0(eng): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 85 kb/s, 24 fps, 24 tbr, 24k tbn, 48 tbc
    Metadata:
      creation_time   : 2013-03-04 21:40:01
      handler_name    : Mainconcept MP4 Video Media Handler
    Stream #0:1(eng): Audio: aac (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 189 kb/s
    Metadata:
      creation_time   : 2013-03-04 21:40:01
      handler_name    : Mainconcept MP4 Sound Media Handler
[image2 @ 0x7fe4d9033c00] max_analyze_duration 5000000 reached at 5000000 microseconds
Input #1, image2, from 'test100_hud/test100_transcoder%05d.png':
  Duration: 00:00:12.50, start: 0.000000, bitrate: N/A
    Stream #1:0: Video: png, rgba, 1280x720, 24 fps, 24 tbr, 24 tbn, 24 tbc
[prores @ 0x7fe4d9466400] encoding with ProRes standard (apcn) profile
[prores @ 0x7fe4d946a800] encoding with ProRes standard (apcn) profile
[prores @ 0x7fe4d946d000] encoding with ProRes standard (apcn) profile
[prores @ 0x7fe4d946f800] encoding with ProRes standard (apcn) profile
[prores @ 0x7fe4d9038000] encoding with ProRes standard (apcn) profile
Output #0, mov, to 'test100_ffmpeg.mov':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: mp42mp41
    encoder         : Lavf54.63.103
    Stream #0:0: Video: prores (apcn) (apcn / 0x6E637061), yuv422p10le, 1280x720 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 12288 tbn, 24 tbc
Stream mapping:
  Stream #0:0 (h264) -> overlay:main
  Stream #1:0 (png) -> overlay:overlay
  overlay -> Stream #0:0 (prores)
Press [q] to stop, [?] for help
frame=  300 fps= 34 q=0.0 Lsize=    8944kB time=00:00:12.50 bitrate=5861.8kbits/s
video:8942kB audio:0kB subtitle:0 global headers:0kB muxing overhead 0.021776%
ffmpeg -y -ss 00:00:00.042 -i test100.mp4 -vsync 0 -f image2 -force_fps -r 24  15.18s user 0.24s system 174% cpu 8.838 total
Looking at mediainfo for test100.mp4, I can see that the video track is cfr, whereas the audio is variable, which I'm guessing causes the "Overall bit rate mode" to become variable. Is this what you're talking about?
% mediainfo test100.mp4
General
Complete name : test100.mp4
Format : MPEG-4
Format profile : Base Media / Version 2
Codec ID : mp42
File size : 433 KiB
Duration : 12s 500ms
Overall bit rate mode : Variable
Overall bit rate : 284 Kbps
Encoded date : UTC 2013-03-04 21:40:01
Tagged date : UTC 2013-03-04 21:41:11
?TIM : 00:00:00:00
?TSC : 24
?TSZ : 1

Video
ID : 1
Format : AVC
Format/Info : Advanced Video Codec
Format profile : Main@L5.1
Format settings, CABAC : Yes
Format settings, ReFrames : 3 frames
Format settings, GOP : M=4, N=33
Codec ID : avc1
Codec ID/Info : Advanced Video Coding
Duration : 12s 500ms
Bit rate : 85.5 Kbps
Width : 1 280 pixels
Height : 720 pixels
Display aspect ratio : 16:9
Frame rate mode : Constant
Frame rate : 24.000 fps
Standard : NTSC
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Bits/(Pixel*Frame) : 0.004
Stream size : 130 KiB (30%)
Language : English
Encoded date : UTC 2013-03-04 21:40:01
Tagged date : UTC 2013-03-04 21:40:01

Audio
ID : 2
Format : AAC
Format/Info : Advanced Audio Codec
Format profile : LC
Codec ID : 40
Duration : 12s 500ms
Source duration : 12s 501ms
Bit rate mode : Variable
Bit rate : 192 Kbps
Maximum bit rate : 329 Kbps
Channel(s) : 2 channels
Channel positions : Front: L R
Sampling rate : 48.0 KHz
Compression mode : Lossy
Stream size : 289 KiB (67%)
Source stream size : 289 KiB (67%)
Language : English
Encoded date : UTC 2013-03-04 21:40:01
Tagged date : UTC 2013-03-04 21:40:01
Replying to cehoyos:
I tried different players and re-encoded the original sample and FFmpeg's behaviour is consistent afaict. (It may of course be wrong.)
At what frames are the gongs supposed to play? Ie, which numbers should be shown on screen at the time each gong starts?
Gongs start on 0, 41, 84, 116, 167, 207, 246, 285
comment:14 in reply to: ↑ 13 Changed 8 years ago by cehoyos
Replying to brchapman:
At what frames are the gongs supposed to play? Ie, which numbers should be shown on screen at the time each gong starts?
Gongs start on 0, 41, 84, 116, 167, 207, 246, 285
This is exactly what I see here with different players, both with the original and with a sample re-encoded by FFmpeg. If FFmpeg cut the first 0.02 seconds of audio, this would become wrong, or am I missing something?
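The point about cutting 0.02 seconds can be checked with simple arithmetic. The sketch below uses the 24 fps and 48000 Hz figures from the console output and the gong frames listed above. It assumes that each gong starts exactly at the beginning of its frame, which the ticket does not state, and `frame_after_cut` is a made-up helper name.

```python
FPS = 24
SAMPLE_RATE = 48000
SAMPLES_PER_FRAME = SAMPLE_RATE // FPS  # audio samples per video frame

gong_frames = [0, 41, 84, 116, 167, 207, 246, 285]
cut_samples = round(0.02 * SAMPLE_RATE)  # samples removed by trimming 0.02 s


def frame_after_cut(frame, cut=cut_samples):
    """Video frame during which a gong would start once `cut` samples are trimmed."""
    start_sample = frame * SAMPLES_PER_FRAME
    shifted = max(start_sample - cut, 0)  # trimming moves the audio earlier
    return shifted // SAMPLES_PER_FRAME


for frame in gong_frames:
    print(frame, round(frame / FPS, 4), frame_after_cut(frame))
```

Under that assumption, every gong after the first would begin roughly halfway through the previous frame.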
comment:15 Changed 8 years ago by Cigaes
0.02 seconds of audio is about half a video frame.
And 0.002 seconds, as stated in the title of this ticket, is the audio delay you get by lounging on the couch instead of sitting straight (70 cm more for the sound to travel).
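Both figures in this comment can be reproduced with a few lines. This is a worked-arithmetic sketch only: the 24 fps value comes from the console output earlier in the ticket, and the speed of sound (343 m/s, dry air at about 20 °C) is an assumption that the comment does not state.

```python
FPS = 24
SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: dry air at about 20 degrees Celsius

frame_duration_s = 1.0 / FPS
fraction_of_frame = 0.02 / frame_duration_s   # share of one video frame covered by 0.02 s
distance_m = 0.002 * SPEED_OF_SOUND_M_PER_S   # distance sound travels in 0.002 s

print(f"0.02 s is {fraction_of_frame:.2f} of a {FPS} fps frame")
print(f"0.002 s corresponds to about {distance_m * 100:.0f} cm of sound travel")
```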
comment:16 Changed 5 years ago by rmk
I have run into this as well and the way I see it, ffmpeg reads but does not write the custom mov metadata (udta/meta) "iTunSMPB". Check around line 3121 in mov.c where priming is set based on the values from that metadata entry. If the mov muxer wrote that metadata, ffmpeg would at least be compatible with itself as far as priming is concerned (i.e. not decode an offset) and with Apple tools. That would be an improvement. I don't know if trailing samples are treated in any way in ffmpeg but AFAIC the priming problem is the more serious one as it leads to a/v desync when the delay is long enough (2112 samples is what Apple uses by default and that's a bit more than a 25 FPS video frame and that's noticeable/significant for some people).
For further explanations see also http://ffmpeg.org/pipermail/ffmpeg-devel/2012-July/127834.html
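As a rough illustration of the priming value discussed in this comment, the sketch below parses an "iTunSMPB" string and converts the priming count to milliseconds. The field layout (a zero field, then priming samples, padding samples and valid sample count, all hexadecimal) is the commonly described one and is an assumption here, not taken from this ticket. The example string is hypothetical and `parse_itunsmpb` is a made-up helper, not an FFmpeg function.

```python
def parse_itunsmpb(value):
    """Return (priming, padding, valid_samples) from an iTunSMPB-style string.

    Assumed layout: whitespace-separated hex fields, where field 1 is the
    encoder priming, field 2 the trailing padding and field 3 the number of
    valid samples.
    """
    fields = value.split()
    if len(fields) < 4:
        raise ValueError("expected at least four hexadecimal fields")
    priming = int(fields[1], 16)
    padding = int(fields[2], 16)
    valid_samples = int(fields[3], 16)
    return priming, padding, valid_samples


SAMPLE_RATE = 48000  # Hz, matching the audio stream in this ticket

# Hypothetical value: 2112 priming samples, 1024 padding samples,
# 600000 valid samples (12.5 s at 48 kHz).
example = " 00000000 00000840 00000400 00000000000927C0"

priming, padding, valid_samples = parse_itunsmpb(example)
priming_ms = 1000.0 * priming / SAMPLE_RATE
frame_ms_25fps = 1000.0 / 25

print(priming, padding, valid_samples)
print(f"priming = {priming_ms:.1f} ms, one 25 fps frame = {frame_ms_25fps:.1f} ms")
```

At 48 kHz, 2112 samples come to 44 ms against 40 ms for one frame at 25 fps, consistent with the "a bit more than a 25 FPS video frame" remark.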
Sample Screenshot of waveform