Opened 12 years ago

Last modified 6 months ago

#2325 open defect

AAC audio delayed ~ 20 ms after conversion to PCM

Reported by: BChap
Owned by:
Priority: important
Component: avcodec
Version: git-master
Keywords: aac regression
Cc: MasterQuestionable
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

Summary of the bug:

When using ffmpeg to convert an aac audio stream from an mp4 to pcm, the result is out of sync by about 2ms. By adding -ss 00:00:00.02 after the input, the output is correctly aligned.
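For scale, the 0.02 s passed to -ss can be expressed in audio samples. This is a minimal sketch, assuming the 48000 Hz sample rate that ffmpeg reports for test100.mp4. Comparing the offset with one 1024-sample AAC frame is only an assumption made here for illustration; the ticket does not establish that the delay is exactly one frame.

```python
SAMPLE_RATE = 48000        # Hz, as ffmpeg reports for the audio stream of test100.mp4
AAC_FRAME_SAMPLES = 1024   # assumption: size of one AAC-LC frame, used only for comparison


def seconds_to_samples(seconds, rate=SAMPLE_RATE):
    """Number of samples covered by a duration given in seconds."""
    return round(seconds * rate)


def samples_to_ms(samples, rate=SAMPLE_RATE):
    """Duration in milliseconds of a number of samples."""
    return samples * 1000.0 / rate


ss_offset_samples = seconds_to_samples(0.02)   # samples skipped by -ss 00:00:00.02
frame_ms = samples_to_ms(AAC_FRAME_SAMPLES)    # duration of one assumed AAC frame

print(ss_offset_samples, "samples skipped by the -ss workaround")
print(round(frame_ms, 2), "ms per 1024-sample frame")
```

Under these assumptions the -ss offset and one AAC frame are of the same order, which is why the 0.02 s figure is a plausible size for a decoder delay.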

How to reproduce:

% ffmpeg -i test100.mp4 -c:a pcm_s16le test100_audio.wav
ffmpeg version 1.1.git
git revision: faa0068
built on Mar  4 2013 11:40:27

Attachments (2)

test100.mp4 (433.1 KB ) - added by BChap 12 years ago.
Screen Shot 2013-03-04 at 8.05.40 PM.png (44.1 KB ) - added by BChap 12 years ago.
Sample Screenshot of waveform


Change History (38)

by BChap, 12 years ago

Attachment: test100.mp4 added

by BChap, 12 years ago

Sample Screenshot of waveform

comment:1 by BChap, 12 years ago

Version: unspecified → git-master

comment:2 by Carl Eugen Hoyos, 12 years ago

Keywords: mov added; mp4 pcm removed

Please provide your failing command line together with complete, uncut console output to make this a valid ticket.

comment:3 by BChap, 12 years ago

The command doesn't fail, but here's the output

% ffmpeg -i test100.mp4 -c:a pcm_s16le test100_audio.wav
ffmpeg version 1.1.git Copyright (c) 2000-2013 the FFmpeg developers
  built on Mar  4 2013 11:40:27 with Apple clang version 4.1 (tags/Apple/clang-421.11.66) (based on LLVM 3.1svn)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/HEAD --enable-shared --enable-pthreads --enable-gpl --enable-version3 --enable-nonfree --enable-hardcoded-tables --enable-avresample --cc=cc --host-cflags= --host-ldflags= --enable-libx264 --enable-libfaac --enable-libmp3lame --enable-libxvid --enable-libfreetype --enable-ffplay
  libavutil      52. 17.103 / 52. 17.103
  libavcodec     54. 92.100 / 54. 92.100
  libavformat    54. 63.102 / 54. 63.102
  libavdevice    54.  3.103 / 54.  3.103
  libavfilter     3. 41.100 /  3. 41.100
  libswscale      2.  2.100 /  2.  2.100
  libswresample   0. 17.102 /  0. 17.102
  libpostproc    52.  2.100 / 52.  2.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'test100.mp4':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: mp42mp41
    creation_time   : 2013-03-04 21:40:01
  Duration: 00:00:12.50, start: 0.000000, bitrate: 283 kb/s
    Stream #0:0(eng): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 85 kb/s, 24 fps, 24 tbr, 24k tbn, 48 tbc
    Metadata:
      creation_time   : 2013-03-04 21:40:01
      handler_name    : Mainconcept MP4 Video Media Handler
    Stream #0:1(eng): Audio: aac (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 189 kb/s
    Metadata:
      creation_time   : 2013-03-04 21:40:01
      handler_name    : Mainconcept MP4 Sound Media Handler
File 'test100_audio.wav' already exists. Overwrite ? [y/N] y
Output #0, wav, to 'test100_audio.wav':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: mp42mp41
    ISFT            : Lavf54.63.102
    Stream #0:0(eng): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, stereo, s16, 1536 kb/s
    Metadata:
      creation_time   : 2013-03-04 21:40:01
      handler_name    : Mainconcept MP4 Sound Media Handler
Stream mapping:
  Stream #0:1 -> #0:0 (aac -> pcm_s16le)
Press [q] to stop, [?] for help
size=    2344kB time=00:00:12.50 bitrate=1536.1kbits/s    
video:0kB audio:2344kB subtitle:0 global headers:0kB muxing overhead 0.003333%

comment:4 by Carl Eugen Hoyos, 12 years ago

Keywords: regression added
Priority: normal → important

If there is an issue, it is a regression since 1edea05

Could you explain how you know that the delay is a bug? What is the other application you are testing?
Do you see the same problem if you extract the audio stream from the mov file with "ffmpeg -i test100.mp4 -acodec copy out.aac" ? Ie, is the problem in any way related to the container or only to aac? I see the same delay when decoding the extracted aac file with FFmpeg, but does your reference application also decode the extracted aac file without this delay?

comment:5 by Carl Eugen Hoyos, 12 years ago

The nero binary decoder shows the same delay as FFmpeg.

in reply to:  4 ; comment:6 by BChap, 12 years ago

Replying to cehoyos:

If there is an issue, it is a regression since 1edea05

Just tried pulling that revision, and building. I still get the same delay.

Could you explain how you know that the delay is a bug?

When converting through After Effects, I don't get this delay. Everything lines up exactly in the output wave file.

What is the other application you are testing?

Adobe After Effects

Do you see the same problem if you extract the audio stream from the mov file with "ffmpeg -i test100.mp4 -acodec copy out.aac" ? Ie, is the problem in any way related to the container or only to aac?

No, I don't get the delay. It lines up perfectly.

I see the same delay when decoding the extracted aac file with FFmpeg, but does your reference application also decode the extracted aac file without this delay?

After Effects decodes it without the delay.
However, I just noticed that when exporting a wave file from the mp4 using QuickTime 7 Pro, its audio delay was reversed. QuickTime's audio is 2ms ahead of the source, whereas ffmpeg's audio is 2ms behind the source.

If there is an alternate solution using command line flags, I'd be up for that as well. Just trying to figure out how to make sure the source and output audio lines up exactly.

Last edited 12 years ago by BChap

in reply to:  6 ; comment:7 by Carl Eugen Hoyos, 12 years ago

Replying to brchapman:

Replying to cehoyos:

If there is an issue, it is a regression since 1edea05

Just tried pulling that revision, and building. I still get the same delay.

Since it is a regression since that revision, you will have to test an earlier version;-)

Could you explain how you know that the delay is a bug?

When converting through After Effects, I don't get this delay. Everything lines up exactly in the output wave file.

How does it "line up"? (I don't understand how the numbers should relate to the sound. I am certainly not claiming there is no issue - I don't know - but since we know already of three - very different - applications that decode the sample differently from After Effects, I wonder how you can be sure that it is correct.)

Do you see the same problem if you extract the audio stream from the mov file with "ffmpeg -i test100.mp4 -acodec copy out.aac" ? Ie, is the problem in any way related to the container or only to aac?

No, I don't get the delay. It lines up perfectly.

You mean you get the same delay if you use FFmpeg but no delay with After Effects if you try with the aac file - or do I misunderstand?

in reply to:  7 ; comment:8 by BChap, 12 years ago

Replying to cehoyos:

Replying to brchapman:

Replying to cehoyos:

If there is an issue, it is a regression since 1edea05

Just tried pulling that revision, and building. I still get the same delay.

Since it is a regression since that revision, you will have to test an earlier version;-)

Just tried pulling 0332324, and everything lines up great! there's no 2ms delay, and the other ticket about a duplicate first frame I posted #2324 is also fixed!

Could you explain how you know that the delay is a bug?

When converting through After Effects, I don't get this delay. Everything lines up exactly in the output wave file.

How does it "line up"? (I don't understand how the numbers should relate to the sound. I am certainly not claiming there is no issue - I don't know - but since we know already of three - very different - applications that decode the sample differently from After Effects, I wonder how you can be sure that it is correct.)

I'm defining correct as placing the source mp4 and output wave file in a timeline together and checking if the waveforms match between them. This is shown in the attached screenshot. Since I'm not doing anything other than just reading a file in and encoding it to a different format, I would expect the input and output sound to line up exactly. Is this how you would expect it to work?

Do you see the same problem if you extract the audio stream from the mov file with "ffmpeg -i test100.mp4 -acodec copy out.aac" ? Ie, is the problem in any way related to the container or only to aac?

No, I don't get the delay. It lines up perfectly.

You mean you get the same delay if you use FFmpeg but no delay with After Effects if you try with the aac file - or do I misunderstand?

When I run the command "ffmpeg -i test100.mp4 -acodec copy out.aac", the out.aac file audio matches the source mp4's audio exactly, without any delay.

in reply to:  8 ; comment:9 by Carl Eugen Hoyos, 12 years ago

Replying to brchapman:

Just tried pulling 0332324, and everything lines up great! there's no 2ms delay,

and the other ticket about a duplicate first frame I posted #2324 is also fixed!

No, the output file is not valid.
(You can easily change the FFmpeg source to allow writing VFR mov files, but they are not conforming to any specification.)

Could you explain how you know that the delay is a bug?

When converting through After Effects, I don't get this delay. Everything lines up exactly in the output wave file.

How does it "line up"? (I don't understand how the numbers should relate to the sound. I am certainly not claiming there is no issue - I don't know - but since we know already of three - very different - applications that decode the sample differently from After Effects, I wonder how you can be sure that it is correct.)

I'm defining correct as placing the source mp4 and output wave file in a timeline together and checking if the waveforms match between them. This is shown in the attached screenshot. Since I'm not doing anything other than just reading a file in and encoding it to a different format, I would expect the input and output sound to line up exactly. Is this how you would expect it to work?

Your reasoning basically assumes that After Effects is right and FFmpeg, nero and QuickTime are wrong. While I am not saying this isn't the case, it is no proof imo.
(There is a mov sample from a camera somewhere on this tracker that shows a "visible" noise (knocking on a table iirc), it would be interesting to test that sample with all applications, I unfortunately fail to find it atm.)

Do you see the same problem if you extract the audio stream from the mov file with "ffmpeg -i test100.mp4 -acodec copy out.aac" ? Ie, is the problem in any way related to the container or only to aac?

No, I don't get the delay. It lines up perfectly.

You mean you get the same delay if you use FFmpeg but no delay with After Effects if you try with the aac file - or do I misunderstand?

When I run the command "ffmpeg -i test100.mp4 -acodec copy out.aac", the out.aac file audio matches the source mp4's audio exactly, without any delay.

And if you transcode the out.aac file with FFmpeg and compare it in AfterEffects, you see the same delay as when transcoding the original mp4 file, or am I wrong?

in reply to:  9 ; comment:10 by BChap, 12 years ago

Replying to cehoyos:

Replying to brchapman:

Just tried pulling 0332324, and everything lines up great! there's no 2ms delay,

and the other ticket about a duplicate first frame I posted #2324 is also fixed!

No, the output file is not valid.
(You can easily change the FFmpeg source to allow writing VFR mov files, but they are not conforming to any specification.)

Could you explain how you know that the delay is a bug?

When converting through After Effects, I don't get this delay. Everything lines up exactly in the output wave file.

How does it "line up"? (I don't understand how the numbers should relate to the sound. I am certainly not claiming there is no issue - I don't know - but since we know already of three - very different - applications that decode the sample differently from After Effects, I wonder how you can be sure that it is correct.)

I'm defining correct as placing the source mp4 and output wave file in a timeline together and checking if the waveforms match between them. This is shown in the attached screenshot. Since I'm not doing anything other than just reading a file in and encoding it to a different format, I would expect the input and output sound to line up exactly. Is this how you would expect it to work?

Your reasoning basically assumes that After Effects is right and FFmpeg, nero and QuickTime are wrong. While I am not saying this isn't the case, it is no proof imo.
(There is a mov sample from a camera somewhere on this tracker that shows a "visible" noise (knocking on a table iirc), it would be interesting to test that sample with all applications, I unfortunately fail to find it atm.)

Do you see the same problem if you extract the audio stream from the mov file with "ffmpeg -i test100.mp4 -acodec copy out.aac" ? Ie, is the problem in any way related to the container or only to aac?

No, I don't get the delay. It lines up perfectly.

You mean you get the same delay if you use FFmpeg but no delay with After Effects if you try with the aac file - or do I misunderstand?

When I run the command "ffmpeg -i test100.mp4 -acodec copy out.aac", the out.aac file audio matches the source mp4's audio exactly, without any delay.

And if you transcode the out.aac file with FFmpeg and compare it in AfterEffects, you see the same delay as when transcoding the original mp4 file, or am I wrong?

Yes, if I first extract the audio stream from the original:

% ffmpeg -i test100.mp4 -c:a copy test100.aac

then:

% ffmpeg -i test100.aac -c:a pcm_s16le test100_audio.wav

test100_audio.wav is delayed.

Also, if I encode test100.mp4 without the aac audio stream (ie with no audio) and then convert it:

% ffmpeg -i test100_no_aac.mp4 -c:v prores test100_ffmpeg.mov

I don't get the duplicate first frame bug in #2324
Based on this I would guess that this would work:

ffmpeg -i test100.mp4 -c:v prores -an test100_ffmpeg.mov

However, it doesn't. The first frame is still duplicated.

in reply to:  10 comment:11 by Carl Eugen Hoyos, 12 years ago

Replying to brchapman:

Also, if I encode test100.mp4 without the aac audio stream (ie with no audio) and then convert it:

% ffmpeg -i test100_no_aac.mp4 -c:v prores test100_ffmpeg.mov

I don't get the duplicate first frame bug in #2324

You are using a different input file that is cfr, your original sample has a longer first frame (that needs to be duplicated to get cfr output).

Based on this I would guess that this would work:

ffmpeg -i test100.mp4 -c:v prores -an test100_ffmpeg.mov

However, it doesn't. The first frame is still duplicated.

Because the timestamps still require a duplication (they do not change just because you don't encode the audio). Use -vsync 0 to ignore the timestamps so that no frame duplication happens.
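To illustrate why a longer first frame ends up duplicated in constant-frame-rate output, here is a minimal sketch. It is not FFmpeg's actual frame-sync code, and the timestamps are hypothetical (the real timestamps of test100.mp4 are not listed in this ticket); it only shows the general idea of filling every output tick with the most recent input frame.

```python
from fractions import Fraction


def cfr_frame_choice(pts_list, fps, n_out):
    """For each constant-rate output tick, return the index of the latest
    input frame whose timestamp is not after that tick."""
    chosen = []
    idx = 0
    for k in range(n_out):
        tick = Fraction(k, fps)
        # Advance to the newest input frame that has started by this tick.
        while idx + 1 < len(pts_list) and pts_list[idx + 1] <= tick:
            idx += 1
        chosen.append(idx)
    return chosen


# Hypothetical input: the first frame lasts two frame periods at 24 fps.
pts = [Fraction(0), Fraction(2, 24), Fraction(3, 24), Fraction(4, 24)]
print(cfr_frame_choice(pts, 24, 5))  # frame 0 fills two output ticks
```

With these made-up timestamps the first input frame is chosen twice regardless of whether an audio stream is present, which matches the point that dropping the audio does not change the video timestamps.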

comment:12 by Carl Eugen Hoyos, 12 years ago

I tried different players and re-encoded the original sample and FFmpeg's behaviour is consistent afaict. (It may of course be wrong.)

At what frames are the gongs supposed to play? Ie, which numbers should be shown on screen at the time each gong starts?

in reply to:  12 ; comment:13 by BChap, 12 years ago

Replying to cehoyos:

Replying to brchapman:

Also, if I encode test100.mp4 without the aac audio stream (ie with no audio) and then convert it:

% ffmpeg -i test100_no_aac.mp4 -c:v prores test100_ffmpeg.mov

I don't get the duplicate first frame bug in #2324

You are using a different input file that is cfr, your original sample has a longer first frame (that needs to be duplicated to get cfr output).

Based on this I would guess that this would work:

ffmpeg -i test100.mp4 -c:v prores -an test100_ffmpeg.mov

However, it doesn't. The first frame is still duplicated.

Because the timestamps still require a duplication (they do not change just because you don't encode the audio). Use -vsync 0 to ignore the timestamps so that no frame duplication happens.

So when I use -vsync 0, the first frame isn't duplicated, but rather the first frame is now completely black. Any other flags I can use to get rid of this?

I'd use -ss to skip past the first frame, which works on its own. However, if I try to use it with -filter_complex overlay and an image sequence that's overlaid on top of the source video, the sequence doesn't end up starting until frame 2 (frame 1 on screen). Here's that command:

% ffmpeg -y -ss 00:00:00.042 -i test100.mp4 -vsync 0 -f image2 -force_fps -r 24 -start_number 1 -i test100_hu

ffmpeg version N-37747-g058e1f8 Copyright (c) 2000-2013 the FFmpeg developers
  built on Mar  5 2013 19:38:09 with llvm-gcc 4.2.1 (LLVM build 2336.11.00)
  configuration: --prefix=/usr/local/ --enable-shared --enable-pthreads --enable-gpl
  libavutil      52. 17.103 / 52. 17.103
  libavcodec     54. 92.100 / 54. 92.100
  libavformat    54. 63.103 / 54. 63.103
  libavdevice    54.  3.103 / 54.  3.103
  libavfilter     3. 42.103 /  3. 42.103
  libswscale      2.  2.100 /  2.  2.100
  libswresample   0. 17.102 /  0. 17.102
  libpostproc    52.  2.100 / 52.  2.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'test100.mp4':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: mp42mp41
    creation_time   : 2013-03-04 21:40:01
  Duration: 00:00:12.50, start: 0.000000, bitrate: 283 kb/s
    Stream #0:0(eng): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 85 kb/s, 24 fps, 24 tbr, 24k tbn, 48 tbc
    Metadata:
      creation_time   : 2013-03-04 21:40:01
      handler_name    : Mainconcept MP4 Video Media Handler
    Stream #0:1(eng): Audio: aac (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 189 kb/s
    Metadata:
      creation_time   : 2013-03-04 21:40:01
      handler_name    : Mainconcept MP4 Sound Media Handler
[image2 @ 0x7fe4d9033c00] max_analyze_duration 5000000 reached at 5000000 microseconds
Input #1, image2, from 'test100_hud/test100_transcoder%05d.png':
  Duration: 00:00:12.50, start: 0.000000, bitrate: N/A
    Stream #1:0: Video: png, rgba, 1280x720, 24 fps, 24 tbr, 24 tbn, 24 tbc
[prores @ 0x7fe4d9466400] encoding with ProRes standard (apcn) profile
[prores @ 0x7fe4d946a800] encoding with ProRes standard (apcn) profile
[prores @ 0x7fe4d946d000] encoding with ProRes standard (apcn) profile
[prores @ 0x7fe4d946f800] encoding with ProRes standard (apcn) profile
[prores @ 0x7fe4d9038000] encoding with ProRes standard (apcn) profile
Output #0, mov, to 'test100_ffmpeg.mov':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: mp42mp41
    encoder         : Lavf54.63.103
    Stream #0:0: Video: prores (apcn) (apcn / 0x6E637061), yuv422p10le, 1280x720 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 12288 tbn, 24 tbc
Stream mapping:
  Stream #0:0 (h264) -> overlay:main
  Stream #1:0 (png) -> overlay:overlay
  overlay -> Stream #0:0 (prores)
Press [q] to stop, [?] for help
frame=  300 fps= 34 q=0.0 Lsize=    8944kB time=00:00:12.50 bitrate=5861.8kbits/s    
video:8942kB audio:0kB subtitle:0 global headers:0kB muxing overhead 0.021776%
ffmpeg -y -ss 00:00:00.042 -i test100.mp4 -vsync 0 -f image2 -force_fps -r 24  15.18s user 0.24s system 174% cpu 8.838 total

Looking at mediainfo for test100.mp4, I can see that the video track is cfr, whereas the audio is variable, which I'm guessing causes the "Overall bit rate mode" to become variable. Is this what you're talking about?

% mediainfo test100.mp4                                                                                 
General
Complete name                            : test100.mp4
Format                                   : MPEG-4
Format profile                           : Base Media / Version 2
Codec ID                                 : mp42
File size                                : 433 KiB
Duration                                 : 12s 500ms
Overall bit rate mode                    : Variable
Overall bit rate                         : 284 Kbps
Encoded date                             : UTC 2013-03-04 21:40:01
Tagged date                              : UTC 2013-03-04 21:41:11
?TIM                                     : 00:00:00:00
?TSC                                     : 24
?TSZ                                     : 1

Video
ID                                       : 1
Format                                   : AVC
Format/Info                              : Advanced Video Codec
Format profile                           : Main@L5.1
Format settings, CABAC                   : Yes
Format settings, ReFrames                : 3 frames
Format settings, GOP                     : M=4, N=33
Codec ID                                 : avc1
Codec ID/Info                            : Advanced Video Coding
Duration                                 : 12s 500ms
Bit rate                                 : 85.5 Kbps
Width                                    : 1 280 pixels
Height                                   : 720 pixels
Display aspect ratio                     : 16:9
Frame rate mode                          : Constant
Frame rate                               : 24.000 fps
Standard                                 : NTSC
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Scan type                                : Progressive
Bits/(Pixel*Frame)                       : 0.004
Stream size                              : 130 KiB (30%)
Language                                 : English
Encoded date                             : UTC 2013-03-04 21:40:01
Tagged date                              : UTC 2013-03-04 21:40:01

Audio
ID                                       : 2
Format                                   : AAC
Format/Info                              : Advanced Audio Codec
Format profile                           : LC
Codec ID                                 : 40
Duration                                 : 12s 500ms
Source duration                          : 12s 501ms
Bit rate mode                            : Variable
Bit rate                                 : 192 Kbps
Maximum bit rate                         : 329 Kbps
Channel(s)                               : 2 channels
Channel positions                        : Front: L R
Sampling rate                            : 48.0 KHz
Compression mode                         : Lossy
Stream size                              : 289 KiB (67%)
Source stream size                       : 289 KiB (67%)
Language                                 : English
Encoded date                             : UTC 2013-03-04 21:40:01
Tagged date                              : UTC 2013-03-04 21:40:01

Replying to cehoyos:

I tried different players and re-encoded the original sample and FFmpeg's behaviour is consistent afaict. (It may of course be wrong.)

At what frames are the gongs supposed to play? Ie, which numbers should be shown on screen at the time each gong starts?

Gongs start on 0, 41, 84, 116, 167, 207, 246, 285

in reply to:  13 comment:14 by Carl Eugen Hoyos, 12 years ago

Replying to brchapman:

At what frames are the gongs supposed to play? Ie, which numbers should be shown on screen at the time each gong starts?

Gongs start on 0, 41, 84, 116, 167, 207, 246, 285

This is exactly what I see here with different players, both with the original and with a sample re-encoded by FFmpeg. If FFmpeg cut the first 0.02 seconds of audio, this would become wrong, or am I missing something?

comment:15 by Cigaes, 12 years ago

0.02 seconds of audio is about half a video frame.

And 0.002 seconds, as stated in the title of this ticket, is the audio delay you get by lounging on the couch instead of sitting straight (70 cm more for the sound to travel).
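Both comparisons can be checked with a few lines. This is a rough sketch, assuming the 24 fps frame rate that ffmpeg reports for test100.mp4 and a speed of sound of about 343 m/s in air (a textbook value, not taken from this ticket).

```python
VIDEO_FPS = 24            # frame rate ffmpeg reports for test100.mp4
SPEED_OF_SOUND = 343.0    # m/s in air at roughly 20 °C (assumed value)


def delay_in_video_frames(delay_s, fps=VIDEO_FPS):
    """Express an audio delay as a fraction of one video frame."""
    return delay_s * fps


def delay_as_distance_m(delay_s, speed=SPEED_OF_SOUND):
    """Distance sound travels through air during the given delay."""
    return delay_s * speed


print(delay_in_video_frames(0.02))   # roughly half a frame
print(delay_as_distance_m(0.002))    # roughly 0.7 m
```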

comment:16 by rmk, 9 years ago

I have run into this as well and the way I see it, ffmpeg reads but does not write the custom mov metadata (udta/meta) "iTunSMPB". Check around line 3121 in mov.c where priming is set based on the values from that metadata entry. If the mov muxer wrote that metadata, ffmpeg would at least be compatible with itself as far as priming is concerned (i.e. not decode an offset) and with Apple tools. That would be an improvement. I don't know if trailing samples are treated in any way in ffmpeg but AFAIC the priming problem is the more serious one as it leads to a/v desync when the delay is long enough (2112 samples is what Apple uses by default and that's a bit more than a 25 FPS video frame and that's noticeable/significant for some people).

For further explanations see also http://ffmpeg.org/pipermail/ffmpeg-devel/2012-July/127834.html
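The comparison between 2112 priming samples and a 25 FPS video frame can be worked out as follows. A small sketch; the two sample rates are assumed common values, since the comment does not say which rate it has in mind.

```python
def samples_to_ms(samples, sample_rate):
    """Duration in milliseconds of a number of audio samples."""
    return samples * 1000.0 / sample_rate


def frame_period_ms(fps):
    """Duration in milliseconds of one video frame."""
    return 1000.0 / fps


PRIMING_SAMPLES = 2112  # Apple's default priming, as stated above

for rate in (44100, 48000):  # assumed sample rates
    print(rate, "Hz:", round(samples_to_ms(PRIMING_SAMPLES, rate), 2), "ms of priming")
print("25 fps frame:", frame_period_ms(25), "ms")
```

At either rate the priming is slightly longer than the 40 ms frame period, consistent with "a bit more than a 25 FPS video frame".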

comment:17 by Balling, 3 years ago

Status: new → open

ffmpeg can write a pgap inside the track's udta via -metadata:s:a gapless_playback=X where X (P.S. wrong) is an 8-bit value.

P.S. No, that is just a flag. Oh, there is a Remainder in seamless playback (iTunSMPB). See https://bugs.chromium.org/p/chromium/issues/detail?id=668999 and https://stackoverflow.com/questions/31093736/removing-both-leading-and-trailing-silence-from-m4a-files-using-ffmpeg

And https://stackoverflow.com/questions/33401149/ffmpeg-adding-0-05-seconds-of-silence-to-transcoded-aac-file

Full description of iTunSMPB https://hydrogenaud.io/index.php?topic=48231.msg430949#msg430949
Spec from apple. https://developer.apple.com/library/archive/documentation/QuickTime/QTFF/QTFFAppenG/QTFFAppenG.html
"Additionally, the sixth value is the byte offset from the first audio frame to the 8th-from-last frame. This provides a resynchronization mechanism to restore a decoder's true sample number after a seek."

Apparently a real problem, wow. https://www.reddit.com/r/PrismMusic/comments/a9r7h5/comment/ecmou6i/

Last edited 3 years ago by Balling

comment:18 by Balling, 3 years ago

iTunSMPB is available for mp3 too! https://patchwork.ffmpeg.org/project/ffmpeg/list/?submitter=41

Not applied, yet:

For example exiftool on (yes, those are already in FATE, yet not in USE, LOL) https://fate-suite.ffmpeg.org/gapless/gapless-itunes.mp3

shows

Encoded By                      : iTunes 12.7.0.166
Comment                         : (iTunSMPB)  00000000 00000210 0000086A 0000000000066486 00000000 0002DA9D 00000000 00000000 00000000 00000000 00000000 00000000

and exiftool -v

Comment = (iTunPGAP) 0
  | EncodedBy = iTunes 12.7.0.166
  | Comment = (iTunNORM)  00000362 000004C0 0000308F 00003CC5 00000DAC 00000DAC 00007D1[snip]
  | Comment = (iTunSMPB)  00000000 00000210 0000086A 0000000000066486 00000000 0002DA9D[snip]

and ffmpeg itself:

    encoded_by      : iTunes 12.7.0.166
    iTunNORM        :  00000362 000004C0 0000308F 00003CC5 00000DAC 00000DAC 00007D14 00007AC9 000007C1 0000175E
    iTunSMPB        :  00000000 00000210 0000086A 0000000000066486 00000000 0002DA9D 00000000 00000000 00000000 00000000 00000000 00000000

Which means: 528 (0x210) samples of priming, 2154 (0x86A) Remainder samples, 418950 (0x66486) samples total (that obviously does not include the 528 + 2154 garbage samples) and 0x2DA9D is the byte offset from the first audio frame to the 8th-from-last frame. So altogether there are 421 632 samples. Audacity shows the same count when you open a WAV decoded by ffmpeg! Very bad.
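The decoding of the iTunSMPB string above can be reproduced with a short script. This is only a sketch based on the field positions described in this comment (second value priming, third remainder, fourth valid sample count, sixth byte offset); parse_itunsmpb and trim are hypothetical helper names, not FFmpeg functions.

```python
def parse_itunsmpb(value):
    """Split an iTunSMPB string into the fields discussed above.
    All values are hexadecimal and separated by whitespace."""
    fields = value.split()
    priming = int(fields[1], 16)
    remainder = int(fields[2], 16)
    valid = int(fields[3], 16)
    return {
        "priming": priming,
        "remainder": remainder,
        "valid_samples": valid,
        "resync_byte_offset": int(fields[5], 16),
        "decoded_total": priming + remainder + valid,
    }


def trim(samples, info):
    """Drop the priming samples at the start and the remainder at the end."""
    start = info["priming"]
    return samples[start:start + info["valid_samples"]]


SAMPLE = (" 00000000 00000210 0000086A 0000000000066486 00000000 0002DA9D"
          " 00000000 00000000 00000000 00000000 00000000 00000000")
info = parse_itunsmpb(SAMPLE)
print(info)
```

A decoder that honoured the tag would hand back only the samples kept by trim, rather than the full decoded total.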

What is even more funny is that it prints those warnings:

[mp3float @ 000001545cdc72c0] overread, skip -6 enddists: -4 -4A
[mp3float @ 000001545cdc72c0] overread, skip -5 enddists: -2 -2

Some implementation of it: https://git.rockbox.org/cgit/rockbox.git/log/?qt=grep&q=gapless

Last edited 3 years ago by Balling

comment:19 by Balling, 3 years ago

I just found a mp4/aac file that has crazy 10 116 priming samples!! WOW. It is "Advanced Audio Codec with Spectral Band Replication".

It has media time set to 5058, and you do x2 because it is HE-AAC since mdhd_TimeScale (?). Nice, just nice, I suppose you should look into that too. File Abba - Don't Shut Me Down.mp4 from Tidal. It does have sgpd and sbgp though, all good, but another file, Abba - I Still Have Faith In You.mp4, has 310720 ms duration which is BS, because it is actually (after discarding priming) 8 ms longer than the audio. WTF. mdhd.duration is also different after -c copy. Oogh.
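The doubling mentioned above can be written out explicitly. A minimal sketch, assuming (as the comment itself only guesses) that the edit-list media time is counted at the core AAC rate while SBR doubles the decoder's output rate; priming_in_output_samples is a hypothetical helper name.

```python
def priming_in_output_samples(media_time, sbr_rate_factor=2):
    """Convert an edit-list media time, counted at the core AAC rate,
    into decoder output samples when SBR multiplies the sample rate."""
    return media_time * sbr_rate_factor


print(priming_in_output_samples(5058))  # 10116
```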

Last edited 2 years ago by Balling

comment:20 by Balling, 3 years ago

comment:21 by Bartek Zdanowski, 3 years ago

I ran into this problem a few days ago.

From my perspective it's a problem with aac at FFmpeg's side. Not from the side of the codec.

I generate 1 frame (1024 samples) of silence or sine samples, but ffmpeg adds about two frames of silence before my frame.
Same happens when I push packets of existing audio from some source. FFmpeg adds two frames of silence in the beginning.

No such thing happens for other codecs. Checked with ac3 and eac3. Audio starts correctly from first sample. No silence added in the beginning.

I'm absolutely sure about my data I feed into ffmpeg so please don't argue.

There's some kind of problem with the ffmpeg-aac code. I've tested it with

  • aac builtin
  • libfdk-aac

BOTH have a problem with adding about two frames of silence in front of passed audio.

comment:22 by Bartek Zdanowski, 3 years ago

OK. I know all :)

Some codecs need ramp-up samples. Some of them need plenty, for example 2048 or more, which can be more than one frame. It depends on the codec.
Those samples are added in front of the data that comes from the codec.
One can check how many samples are added via
AVCodecContext::initial_padding

Those samples need to be discarded!

This can be easily done if you're using ffmpeg as a library. But from command line this (to my best knowledge) cannot be discarded. If somebody knows a way - please add a comment.

I came up with a solution for when initial_padding is not a multiple of the frame size. You have to add the missing samples to fill up to the next full frame and discard those frames. Now you can push real audio data and enjoy output without latency and padding.

This can be achieved only using ffmpeg as a library.
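The arithmetic behind that workaround might look like the following. This is one reading of the approach described above, not code from the commenter; the helper names are hypothetical, and the example padding of 2112 is borrowed from comment 16 rather than read from a real AVCodecContext.

```python
import math


def frames_to_discard(initial_padding, frame_size):
    """Whole frames needed to cover the padding (rounded up)."""
    return math.ceil(initial_padding / frame_size)


def filler_samples(initial_padding, frame_size):
    """Extra samples to add so that the padding ends on a frame boundary."""
    return frames_to_discard(initial_padding, frame_size) * frame_size - initial_padding


# Example values (assumed): 2112 samples of padding, 1024-sample frames.
print(frames_to_discard(2112, 1024))  # 3 frames to drop
print(filler_samples(2112, 1024))     # 960 samples of filler
```

When the padding is already a multiple of the frame size, no filler is needed and the padded frames can be dropped as they are.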

comment:23 by Balling, 2 years ago

mdhd.duration is also different after -c copy. Oogh.

It is actually the same, while post-editlist durations (mvhd and tkhd) become wrong, 238915 vs 238800.

Anyway, after #9671 it looks like decoding HE-AAC to WAV is more correct, i.e. it does not cut off too much good sound (after priming) at the beginning. By "looks like" I mean I compared against Don't Shut Me Down.flac, which is 3m 58s 800ms long (to get the 800 ms figure I used MediaInfo in "advanced mode"). After #9671, decoding HE-AAC to WAV gives 3m 58s 819ms, but for the original AAC file MediaInfo says 19 ms should be subtracted (Duration_LastFrame: -19 ms) due to the edit list duration. That way both are the same. We really need to do https://bugs.chromium.org/p/chromium/issues/detail?id=668999
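The duration comparison can be checked in milliseconds. A small sketch using only the figures quoted in this comment; to_ms is a hypothetical helper.

```python
def to_ms(minutes, seconds, millis):
    """Convert a duration given as minutes, seconds and milliseconds to ms."""
    return (minutes * 60 + seconds) * 1000 + millis


flac_ms = to_ms(3, 58, 800)    # reference FLAC duration
wav_ms = to_ms(3, 58, 819)     # HE-AAC decoded to WAV after #9671
last_frame_adjust_ms = -19     # Duration_LastFrame reported by MediaInfo

print(flac_ms)                          # 238800
print(wav_ms + last_frame_adjust_ms)    # 238800
```

The result also agrees with the 238800 quoted above for the post-edit-list duration.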

Last edited 6 months ago by Balling

in reply to:  20 comment:24 by MasterQuestionable, 6 months ago

Cc: MasterQuestionable added
Component: undetermined → avcodec
Keywords: mov removed
Summary: MP4 AAC Audio is delayed by 2ms when converted to PCM → AAC audio delayed ~ 20 ms after conversion to PCM

͏    I don't think the "mp3dec" patch could be directly related.
͏    Though the gapless metadata thing could somewhat relate.

    However, I question the need for implementing such metadata: it is largely useless.
    A player able to make use of it could just as well pre-filter the audio directly, without needing such metadata.
    (It eventually adds nothing meaningful, since what is needed to support gapless playback can be obtained programmatically by silence detection.)


    What causes the ~ 20 ms offset is primarily:
͏    https://developer.apple.com/documentation/quicktime-file-format/background_aac_encoding
͏    https://en.wikipedia.org/wiki/Gapless_playback#Compression_artifacts
͏    .
͏    Unsure about the appropriate handling really:
͏    Technically, these are part of the samples... whether sensible or not.

comment:25 by Balling, 6 months ago

Same code will have to be written for mp4, yes. Not directly related, but file for this is in FATE.

comment:26 by MasterQuestionable, 6 months ago

͏    I doubt such adding could address the real problems.
͏    And only MP4? (ADTS AAC, MKV etc..?)

comment:27 by Balling, 6 months ago

And only MP4?

In fact, only Apple-style mp4. This will not fix the new ISO-style edit list; there, only the remainder needs to be fixed. A patch that applies the duration field of the edit list, and thus removes the remainder, is here: https://patchwork.ffmpeg.org/project/ffmpeg/patch/20190429225027.81295-1-fumoboy007@me.com/

ADTS

Cannot be solved. It only has the author of the audio stream, not the priming info. Same for mkv.

comment:28 by MasterQuestionable, 6 months ago

    Hard to tell if it's really a fix... Or breaking things further.
͏    Comparable to previous PNG havoc. [ #11002 ]

comment:29 by Balling, 6 months ago

Well, actually it is very simple just as with png. Apple decoder decodes as I said, aac mp4 and mp3. Chrome uses remainder fix for ffmpeg, here is a test for that, https://jakearchibald.github.io/aac-decode-bug/

As for the mp3dec patch, it is used by the author of the patch here and file in FATE also is his. https://github.com/losnoco/Cog/tree/main/ThirdParty/ffmpeg/patches

Last edited 6 months ago by Balling

comment:30 by MasterQuestionable, 6 months ago

    The primary issue is that this is broken inconsistently across implementations.
͏    (as in [ https://trac.ffmpeg.org/ticket/11002#comment:11 ])

͏    And logically there's no strong reason to justify such unusual display by default:
͏    These are part of the samples: due to compression artifacts, still.

͏    Taking it into an analogy, that would be:
͏    The decoder deeming part of the frames ugly and refuses to display...

Last edited 6 months ago by MasterQuestionable

comment:31 by Balling, 6 months ago

These are part of the samples: due to compression artifacts, still.

You are correct! The remainder, which is after 5 seconds is not always just silence. It can be some strange decreasing amplitude noise, like in the case of perfect audio sinusoid. But also see my report #9471. Paul that helped with Dolby EAC3 left the ffmpeg project too. So WE have mp3, mp4, eac3. Apple actually handles EAC3 correctly too.

comment:32 by MasterQuestionable, 6 months ago

    Any non-broken codec should not cause mere silence to be significantly translated into anything more sophisticated (less compressible).
    Files that have strange leading/trailing patterns typically contain unhandled recording noise, which is preferably dealt with at the source.

Last edited 6 months ago by MasterQuestionable

in reply to:  32 comment:33 by Balling, 6 months ago

Replying to MasterQuestionable:

Any non-broken codec should not cause mere silence be significantly

Think about it, all codecs have frames or something like that. And thus after the end of the audio there will be artificial sound. E.g. truehd uses shorten_by to remove trailing silence. And flac also has some metadata.

comment:34 by MasterQuestionable, 6 months ago

͏    "significantly" refers to no more than 1 audio frame: for some codecs' overlapping transformation.
͏    Otherwise, definitely broken.

comment:35 by Balling, 6 months ago

Haha, yes, multiple frames by Apple aac is funny and different amount used in HE aac and again different in HEv2 is even funnier. Not to mention EAC3 having different priming in different encoders.

You would love to learn that aac can have not 1024 but 960 samples and that is used in DAB+ digital radio. #1407

In both cases standards suggest minimal amount: 1024 and 256 to be used. LOL.
