Opened 5 years ago

Last modified 10 months ago

#7828 open defect

gapless playback doesn't work with AAC (remainder and Apple style)

Reported by: Christoph Anton Mitterer
Owned by: Elon Musk
Priority: normal
Component: undetermined
Version: git-master
Keywords: aac gapless
Cc:
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

With current git master ffmpeg 1125277:

The following uses a series of test files based on https://commons.wikimedia.org/w/index.php?title=File%3ATelemann_-_2violin_Sonata_1-1.ogg.

Each filename starts with a number, where the same number indicates the files belong together.

00.test.wav

PCM WAV shortened version of the above Wikimedia Commons demo file

01.split-track01.wav
01.split-track02.wav

00.test.wav split into two halves (these are the actual base test files used with encoders)

02.split-track01.mp3
02.split-track02.mp3

LAME encoded versions of 01.split-track01.wav and 01.split-track02.wav

03.split-track01.opus
03.split-track02.opus

opusenc encoded versions of 01.split-track01.wav and 01.split-track02.wav

and so on.

1) How the base test files (01.split-track01.wav and 01.split-track02.wav) were created
The Wikimedia Commons demo file was first decoded to PCM WAV with opusdec and split into two halves with

$ shnsplit 00.test.wav
enter split points:
0:05.317
shnsplit: warning: rounding 0:05.317 (offset: 937919) to nearest sector boundary (offset: 938448)
shnsplit: warning: file 2 will not be cut on a sector boundary
Splitting [test.wav] (0:15.66) --> [01.split-track01.wav] (0:05.24) : 100% OK
Splitting [test.wav] (0:15.66) --> [01.split-track02.wav] (0:10.42) : 100% OK
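
The rounding reported by shnsplit can be reproduced by hand. The following is only a minimal sketch, assuming CD-quality audio (44100 Hz, stereo, 16 bit, so 4 bytes per sample frame) and a CD sector of 2352 bytes; the function name is made up for illustration and is not part of shntool:

```python
BYTES_PER_FRAME = 4      # assumption: 16-bit stereo PCM
SECTOR_BYTES = 2352      # assumption: one CD sector = 588 frames * 4 bytes
SAMPLE_RATE = 44100      # assumption: CD sample rate


def nearest_sector_boundary(byte_offset: int) -> int:
    """Round a byte offset to the nearest multiple of the sector size."""
    sectors = (byte_offset + SECTOR_BYTES // 2) // SECTOR_BYTES
    return sectors * SECTOR_BYTES


requested = 937919                          # offset shnsplit printed for 0:05.317
rounded = nearest_sector_boundary(requested)
frames = rounded // BYTES_PER_FRAME         # sample frames in the first track
print(rounded, frames, frames / SAMPLE_RATE)
```

Under these assumptions the rounded offset is the 938448 from the warning above, which corresponds to 234612 sample frames in the first track.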

For cross checking, the resulting files were joined again:
$ sox 01.split-track01.wav 01.split-track02.wav joined.wav

The concatenation is binary identical to the original file:
$ diff 00.test.wav joined.wav
$
which can also be seen (visually) in e.g. audacity or sonic-visualizer (i.e. there are no gaps or other distortions between 01.split-track01.wav and 01.split-track02.wav).
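
The same cross check can be done on the PCM data alone, which also works when the WAV headers differ (as they will for files written by different tools). This is a sketch using only Python's standard wave module; the helper names are hypothetical:

```python
import wave


def pcm_frames(path):
    """Return ((channels, sample width, rate), raw PCM bytes) of a WAV file."""
    with wave.open(path, "rb") as w:
        fmt = (w.getnchannels(), w.getsampwidth(), w.getframerate())
        return fmt, w.readframes(w.getnframes())


def is_gapless_split(original, parts):
    """True if the parts, concatenated in order, reproduce the original PCM data."""
    fmt, data = pcm_frames(original)
    joined = b""
    for part in parts:
        part_fmt, part_data = pcm_frames(part)
        if part_fmt != fmt:
            return False          # different channel count, width or rate
        joined += part_data
    return joined == data
```

For the files above this would be called as is_gapless_split("00.test.wav", ["01.split-track01.wav", "01.split-track02.wav"]).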

2) What is tested?
The split files are now encoded with some reference encoders and then played back or decoded (to PCM WAV) again, checking for the following:

  • Does "gapless" playback even work for the plain PCM WAV?
  • At playback, can any gap, crack, pop, etc. be heard between the two files (i.e. does "gapless playback" work)?
  • At decoding to PCM WAV, is there any shift at the start of the 1st file or at the end of the 2nd file?
  • At decoding to PCM WAV, is there any gap/shift/other distortion at the end of the 1st file and start of the 2nd file when these two are concatenated, in other words at the joining position?

Hearing tests were repeated multiple times, so the files were already in the OS cache and one should expect basically no delay from a slow storage medium (which was in any case one of the fastest SSDs).
Unless otherwise noted, all programs and libraries were from Debian unstable.

Encoders with these options were used:

  • lame --verbose -q 0 -v -V 4 --noreplaygain --id3v2-utf16 --add-id3v2 --id3v1-only (LAME 64bits version 3.100)
  • opusenc --vbr --bitrate 96 split-track01.wav (opus-tools 0.1.10)
  • fdkaac -p 29 -m 4 <gapless modes, part of the filename> (0.6.3); gapless modes: 0 = iTunSMPB, 1 = ISO standard (edts and sgpd), 2 = both
  • aac-enc -t 29 -v 4 (0.1.6) => may not even set any gapless information, so it can possibly be ignored completely
  • faac -q 100 -w (1.29.9.2) => may not even set any gapless information, so it can possibly be ignored completely

And for playback or decoding:
ffmpeg version N-93527-g1125277 Copyright (c) 2000-2019 the FFmpeg developers
built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.11) 20160609
configuration:
libavutil 56. 26.100 / 56. 26.100
libavcodec 58. 48.100 / 58. 48.100
libavformat 58. 26.101 / 58. 26.101
libavdevice 58. 7.100 / 58. 7.100
libavfilter 7. 48.100 / 7. 48.100
libswscale 5. 4.100 / 5. 4.100
libswresample 3. 4.100 / 3. 4.100

3) Results
a) Hearing Tests

I couldn't use ffplay, because I had to run everything from git-master ffmpeg on another node.
What I did instead was use mpv to play back the files decoded with git-master ffmpeg (see (b) below).
mpv 01.split-track0*.wav => OK, no gap/pop/click/etc. (original files, no ffmpeg here)
mpv ffmpeg1125277.02.split-track0*.mp3.wav => OK, no gap/pop/click/etc.
mpv ffmpeg1125277.03.split-track0*.opus.wav => OK, no gap/pop/click/etc.

but all with AAC:
mpv ffmpeg1125277.04.split-track0*.fdkaac.gapless-mode-0.m4a.wav
...
mpv ffmpeg1125277.08.split-track0*.faac.m4a.wav => BAD, clearly audible gap

b) Visual tests

For these, all the encoded files were again decoded with:

ffmpeg -vn -i input.file output.wav
(this and only this was done with the ffmpeg from git master on some *buntu machine).

to PCM WAV files like:

ffmpeg1125277.02.split-track01.mp3.wav
ffmpeg1125277.02.split-track02.mp3.wav

These are the decoded files used in the visual tests (with audacity/sonic-visualizer); e.g. ffmpeg1125277.02.split-track01.mp3.wav was decoded with ffmpeg1125277 from 02.split-track01.mp3.

For each such pair an image is attached, e.g.:

ffmpeg1125277.02.mp3.wav.png

Comparing the intersection point:

top: the original 00.test.wav

middle: the joined ffmpeg1125277.02.split-track01.mp3.wav and ffmpeg1125277.02.split-track02.mp3.wav (named ffmpeg1125277.02.joined.mp3.wav)

(joins were made with sox 1.wav 2.wav 1-joined-with-2.wav)

bottom: ffmpeg1125277.02.split-track01.mp3.wav alone, serving just as reference as to where the intersection is

Opus seems to always sample at 48kHz, so I enabled the option in sonic visualizer that does automatic resampling on opening.

ffmpeg1125277.02.mp3.wav.png => OK, mostly, there might be a small distortion (red circle), but I guess nothing that anyone will be able to hear
ffmpeg1125277.03.opus.wav.png => OK, seems perfect

all the AAC ones:
ffmpeg1125277.*.*.m4a.wav.png => BAD, not only huge gaps, but it also seems as if the end and start of the joined files were "faded out/in" (no idea whether this is an encoder or decoder error)

In detail:
fdkaac+iTunSMPB: gap + fade in AND out

fdkaac+ISO: gap + fade out

fdkaac+Both: gap + fade out

(mpv seems to always have gap + fade in AND out)

aac-enc: gap + fade out AND in

faac: gap + fade out

(same for mpv)

So while we can probably toss aac-enc and faac, one sees that something is already different with fdkaac depending on the gapless mode (though both still have big gaps).

Long story short:
I would guess that *somewhere* there's a bug with respect to gapless encoding and/or decoding of AAC.
Since fdkaac claims it would support gapless playback, one might assume the error is on ffmpeg's side.

Problem is, I have no encoder/decoder pair for which I know that it works... maybe one could try it with itunes?

I'd be happy to evaluate further if any developer has an idea how to move on (i.e. how/where to get AAC files which are definitively considered to be correctly encoded for gapless playback and which one can test with ffmpeg); until then I'd assume that the fdkaac-created files are correctly created for gapless playback.

Thanks, Chris.

The test files and images can be found at:
https://drive.google.com/drive/folders/1SIt1z3FtlYa-zMEDzF-m8jsMCe2PFsyz?usp=sharing

FYI: I did the same tests with mpv (however with 4.1.1 ffmpeg):
https://github.com/mpv-player/mpv/issues/2284

Attachments (4)

nonfragmented-aac-stts.mp4 (1.8 KB ) - added by John Regan 10 months ago.
Non-fragmented MP4 file with AAC audio. Contains an edit list box signaling priming samples to discard, and an stts box signaling the last packet has 768 samples. Should decode to exactly 96000 samples.
fragmented-aac-tfhd.mp4 (1.9 KB ) - added by John Regan 10 months ago.
Fragmented MP4 file with AAC audio. Contains an edit list box signaling priming samples to discard, the final fragment uses the tfhd box to signal the last packet has 768 samples. Should decode to exactly 96000 samples.
fragmented-aac-trun.mp4 (2.5 KB ) - added by John Regan 10 months ago.
Fragmented MP4 file with AAC audio. Contains an edit list box signaling priming samples to discard, the final fragment uses the trun box to signal the last packet has 768 samples. Should decode to exactly 96000 samples.
nonfragmented-aac-itunes.mp4 (33.3 KB ) - added by John Regan 10 months ago.
Non-fragmented MP4 file with AAC audio. Uses a custom metadata (iTunSMPB) to signal the priming and padding samples to discard. Should decode to exactly 96000 samples.
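
The "exactly 96000 samples" expectation follows from simple arithmetic: sum the per-packet durations signaled by the container and subtract the priming samples. The sketch below only illustrates that arithmetic. The packet count (95) and the priming value (1024) are assumptions chosen so the numbers are easy to follow; they were not read from the attached files. Only the 768-sample last packet and the 96000 total come from the descriptions above.

```python
def decoded_length(packet_durations, priming):
    """Samples left after discarding `priming` samples from the start.

    packet_durations: per-packet sample counts as signaled by the container
    (a shortened last entry is how trailing padding is expressed).
    """
    total = sum(packet_durations)
    if priming > total:
        raise ValueError("priming exceeds total decoded samples")
    return total - priming


# Hypothetical layout: 94 full AAC frames of 1024 samples, a final packet
# signaled as 768 samples, and 1024 priming samples removed via the edit list.
durations = [1024] * 94 + [768]
print(decoded_length(durations, priming=1024))
```

A decoder that ignores the shortened last packet would instead output a full 1024 samples for it, i.e. 256 samples too many in this hypothetical layout.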


Change History (16)

comment:1 by Carl Eugen Hoyos, 5 years ago

Please provide one ffmpeg command line together with the complete, uncut console output to make this a valid ticket.

comment:2 by Christoph Anton Mitterer, 5 years ago

The used ffmpeg command line was already described in the text above, but again for reference:

./ffmpeg -vn -i 02.split-track01.mp3 ffmpeg1125277.02.split-track01.mp3.wav
./ffmpeg -vn -i 02.split-track02.mp3 ffmpeg1125277.02.split-track02.mp3.wav
./ffmpeg -vn -i 03.split-track01.opus ffmpeg1125277.03.split-track01.opus.wav
./ffmpeg -vn -i 03.split-track02.opus ffmpeg1125277.03.split-track02.opus.wav
./ffmpeg -vn -i 04.split-track01.fdkaac.gapless-mode-0.m4a ffmpeg1125277.04.split-track01.fdkaac.gapless-mode-0.m4a.wav
./ffmpeg -vn -i 04.split-track02.fdkaac.gapless-mode-0.m4a ffmpeg1125277.04.split-track02.fdkaac.gapless-mode-0.m4a.wav
./ffmpeg -vn -i 05.split-track01.fdkaac.gapless-mode-1.m4a ffmpeg1125277.05.split-track01.fdkaac.gapless-mode-1.m4a.wav
./ffmpeg -vn -i 05.split-track02.fdkaac.gapless-mode-1.m4a ffmpeg1125277.05.split-track02.fdkaac.gapless-mode-1.m4a.wav
./ffmpeg -vn -i 06.split-track01.fdkaac.gapless-mode-2.m4a ffmpeg1125277.06.split-track01.fdkaac.gapless-mode-2.m4a.wav
./ffmpeg -vn -i 06.split-track02.fdkaac.gapless-mode-2.m4a ffmpeg1125277.06.split-track02.fdkaac.gapless-mode-2.m4a.wav
./ffmpeg -vn -i 07.split-track01.aac-enc.m4a ffmpeg1125277.07.split-track01.aac-enc.m4a.wav
./ffmpeg -vn -i 07.split-track02.aac-enc.m4a ffmpeg1125277.07.split-track02.aac-enc.m4a.wav
./ffmpeg -vn -i 08.split-track01.faac.m4a ffmpeg1125277.08.split-track01.faac.m4a.wav
./ffmpeg -vn -i 08.split-track02.faac.m4a ffmpeg1125277.08.split-track02.faac.m4a.wav

The console output of these is already in the list of files at https://drive.google.com/drive/folders/1SIt1z3FtlYa-zMEDzF-m8jsMCe2PFsyz

but I can copy&paste it here as well:

ffmpeg version N-93527-g1125277 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.11) 20160609
  configuration: 
  libavutil      56. 26.100 / 56. 26.100
  libavcodec     58. 48.100 / 58. 48.100
  libavformat    58. 26.101 / 58. 26.101
  libavdevice    58.  7.100 / 58.  7.100
  libavfilter     7. 48.100 /  7. 48.100
  libswscale      5.  4.100 /  5.  4.100
  libswresample   3.  4.100 /  3.  4.100
Input #0, mp3, from '02.split-track01.mp3':
  Duration: 00:00:05.36, start: 0.025057, bitrate: 156 kb/s
    Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 156 kb/s
    Metadata:
      encoder         : LAME3.100
Stream mapping:
  Stream #0:0 -> #0:0 (mp3 (mp3float) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'ffmpeg1125277.02.split-track01.mp3.wav':
  Metadata:
    ISFT            : Lavf58.26.101
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s
    Metadata:
      encoder         : Lavc58.48.100 pcm_s16le
size=     917kB time=00:00:05.32 bitrate=1411.3kbits/s speed= 316x    
video:0kB audio:916kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.008312%
ffmpeg version N-93527-g1125277 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.11) 20160609
  configuration: 
  libavutil      56. 26.100 / 56. 26.100
  libavcodec     58. 48.100 / 58. 48.100
  libavformat    58. 26.101 / 58. 26.101
  libavdevice    58.  7.100 / 58.  7.100
  libavfilter     7. 48.100 /  7. 48.100
  libswscale      5.  4.100 /  5.  4.100
  libswresample   3.  4.100 /  3.  4.100
Input #0, mp3, from '02.split-track02.mp3':
  Duration: 00:00:10.61, start: 0.025057, bitrate: 143 kb/s
    Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 143 kb/s
    Metadata:
      encoder         : LAME3.100
Stream mapping:
  Stream #0:0 -> #0:0 (mp3 (mp3float) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'ffmpeg1125277.02.split-track02.mp3.wav':
  Metadata:
    ISFT            : Lavf58.26.101
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s
    Metadata:
      encoder         : Lavc58.48.100 pcm_s16le
size=    1819kB time=00:00:10.55 bitrate=1411.3kbits/s speed= 350x    
video:0kB audio:1819kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.004188%
ffmpeg version N-93527-g1125277 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.11) 20160609
  configuration: 
  libavutil      56. 26.100 / 56. 26.100
  libavcodec     58. 48.100 / 58. 48.100
  libavformat    58. 26.101 / 58. 26.101
  libavdevice    58.  7.100 / 58.  7.100
  libavfilter     7. 48.100 /  7. 48.100
  libswscale      5.  4.100 /  5.  4.100
  libswresample   3.  4.100 /  3.  4.100
[ogg @ 0x246b400] 657 bytes of comment header remain
Input #0, ogg, from '03.split-track01.opus':
  Duration: 00:00:05.33, start: 0.000000, bitrate: 126 kb/s
    Stream #0:0: Audio: opus, 48000 Hz, stereo, fltp
    Metadata:
      ENCODER         : opusenc from opus-tools 0.1.10
      ENCODER_OPTIONS : --vbr --bitrate 96
Stream mapping:
  Stream #0:0 -> #0:0 (opus (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'ffmpeg1125277.03.split-track01.opus.wav':
  Metadata:
    ISFT            : Lavf58.26.101
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, stereo, s16, 1536 kb/s
    Metadata:
      ENCODER_OPTIONS : --vbr --bitrate 96
      encoder         : Lavc58.48.100 pcm_s16le
size=     998kB time=00:00:05.32 bitrate=1536.1kbits/s speed= 163x    
video:0kB audio:998kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.007636%
ffmpeg version N-93527-g1125277 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.11) 20160609
  configuration: 
  libavutil      56. 26.100 / 56. 26.100
  libavcodec     58. 48.100 / 58. 48.100
  libavformat    58. 26.101 / 58. 26.101
  libavdevice    58.  7.100 / 58.  7.100
  libavfilter     7. 48.100 /  7. 48.100
  libswscale      5.  4.100 /  5.  4.100
  libswresample   3.  4.100 /  3.  4.100
[ogg @ 0x2bb9400] 657 bytes of comment header remain
Input #0, ogg, from '03.split-track02.opus':
  Duration: 00:00:10.57, start: 0.000000, bitrate: 122 kb/s
    Stream #0:0: Audio: opus, 48000 Hz, stereo, fltp
    Metadata:
      ENCODER         : opusenc from opus-tools 0.1.10
      ENCODER_OPTIONS : --vbr --bitrate 96
Stream mapping:
  Stream #0:0 -> #0:0 (opus (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'ffmpeg1125277.03.split-track02.opus.wav':
  Metadata:
    ISFT            : Lavf58.26.101
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, stereo, s16, 1536 kb/s
    Metadata:
      ENCODER_OPTIONS : --vbr --bitrate 96
      encoder         : Lavc58.48.100 pcm_s16le
size=    1980kB time=00:00:10.55 bitrate=1536.1kbits/s speed= 170x    
video:0kB audio:1980kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.003847%
ffmpeg version N-93527-g1125277 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.11) 20160609
  configuration: 
  libavutil      56. 26.100 / 56. 26.100
  libavcodec     58. 48.100 / 58. 48.100
  libavformat    58. 26.101 / 58. 26.101
  libavdevice    58.  7.100 / 58.  7.100
  libavfilter     7. 48.100 /  7. 48.100
  libswscale      5.  4.100 /  5.  4.100
  libswresample   3.  4.100 /  3.  4.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '04.split-track01.fdkaac.gapless-mode-0.m4a':
  Metadata:
    major_brand     : M4A 
    minor_version   : 0
    compatible_brands: M4A mp42isom
    creation_time   : 2019-04-03T16:17:22.000000Z
    encoder         : fdkaac 0.6.3, libfdk-aac 3.4.22, VBR mode 4
    iTunSMPB        :  00000000 00000C00 000005C6 000000000001CA3A 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  Duration: 00:00:05.53, start: 0.069660, bitrate: 46 kb/s
    Stream #0:0(und): Audio: aac (HE-AACv2) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 44 kb/s (default)
    Metadata:
      creation_time   : 2019-04-03T16:17:22.000000Z
Stream mapping:
  Stream #0:0 -> #0:0 (aac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'ffmpeg1125277.04.split-track01.fdkaac.gapless-mode-0.m4a.wav':
  Metadata:
    major_brand     : M4A 
    minor_version   : 0
    compatible_brands: M4A mp42isom
    iTunSMPB        :  00000000 00000C00 000005C6 000000000001CA3A 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    ISFT            : Lavf58.26.101
    Stream #0:0(und): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s (default)
    Metadata:
      creation_time   : 2019-04-03T16:17:22.000000Z
      encoder         : Lavc58.48.100 pcm_s16le
size=     928kB time=00:00:05.45 bitrate=1393.3kbits/s speed= 279x    
video:0kB audio:928kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.008208%
ffmpeg version N-93527-g1125277 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.11) 20160609
  configuration: 
  libavutil      56. 26.100 / 56. 26.100
  libavcodec     58. 48.100 / 58. 48.100
  libavformat    58. 26.101 / 58. 26.101
  libavdevice    58.  7.100 / 58.  7.100
  libavfilter     7. 48.100 /  7. 48.100
  libswscale      5.  4.100 /  5.  4.100
  libswresample   3.  4.100 /  3.  4.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '04.split-track02.fdkaac.gapless-mode-0.m4a':
  Metadata:
    major_brand     : M4A 
    minor_version   : 0
    compatible_brands: M4A mp42isom
    creation_time   : 2019-04-03T16:17:33.000000Z
    encoder         : fdkaac 0.6.3, libfdk-aac 3.4.22, VBR mode 4
    iTunSMPB        :  00000000 00000C00 00000281 0000000000038D7F 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  Duration: 00:00:10.73, start: 0.069660, bitrate: 42 kb/s
    Stream #0:0(und): Audio: aac (HE-AACv2) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 40 kb/s (default)
    Metadata:
      creation_time   : 2019-04-03T16:17:33.000000Z
Stream mapping:
  Stream #0:0 -> #0:0 (aac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'ffmpeg1125277.04.split-track02.fdkaac.gapless-mode-0.m4a.wav':
  Metadata:
    major_brand     : M4A 
    minor_version   : 0
    compatible_brands: M4A mp42isom
    iTunSMPB        :  00000000 00000C00 00000281 0000000000038D7F 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    ISFT            : Lavf58.26.101
    Stream #0:0(und): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s (default)
    Metadata:
      creation_time   : 2019-04-03T16:17:33.000000Z
      encoder         : Lavc58.48.100 pcm_s16le
size=    1824kB time=00:00:10.65 bitrate=1402.0kbits/s speed= 217x    
video:0kB audio:1824kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.004176%
ffmpeg version N-93527-g1125277 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.11) 20160609
  configuration: 
  libavutil      56. 26.100 / 56. 26.100
  libavcodec     58. 48.100 / 58. 48.100
  libavformat    58. 26.101 / 58. 26.101
  libavdevice    58.  7.100 / 58.  7.100
  libavfilter     7. 48.100 /  7. 48.100
  libswscale      5.  4.100 /  5.  4.100
  libswresample   3.  4.100 /  3.  4.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '05.split-track01.fdkaac.gapless-mode-1.m4a':
  Metadata:
    major_brand     : M4A 
    minor_version   : 0
    compatible_brands: M4A mp42isom
    creation_time   : 2019-04-03T16:18:03.000000Z
    encoder         : fdkaac 0.6.3, libfdk-aac 3.4.22, VBR mode 4
  Duration: 00:00:05.53, start: 0.000000, bitrate: 46 kb/s
    Stream #0:0(und): Audio: aac (HE-AACv2) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 44 kb/s (default)
    Metadata:
      creation_time   : 2019-04-03T16:18:03.000000Z
Stream mapping:
  Stream #0:0 -> #0:0 (aac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'ffmpeg1125277.05.split-track01.fdkaac.gapless-mode-1.m4a.wav':
  Metadata:
    major_brand     : M4A 
    minor_version   : 0
    compatible_brands: M4A mp42isom
    ISFT            : Lavf58.26.101
    Stream #0:0(und): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s (default)
    Metadata:
      creation_time   : 2019-04-03T16:18:03.000000Z
      encoder         : Lavc58.48.100 pcm_s16le
size=     920kB time=00:00:05.34 bitrate=1411.3kbits/s speed= 265x    
video:0kB audio:920kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.008280%
ffmpeg version N-93527-g1125277 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.11) 20160609
  configuration: 
  libavutil      56. 26.100 / 56. 26.100
  libavcodec     58. 48.100 / 58. 48.100
  libavformat    58. 26.101 / 58. 26.101
  libavdevice    58.  7.100 / 58.  7.100
  libavfilter     7. 48.100 /  7. 48.100
  libswscale      5.  4.100 /  5.  4.100
  libswresample   3.  4.100 /  3.  4.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '05.split-track02.fdkaac.gapless-mode-1.m4a':
  Metadata:
    major_brand     : M4A 
    minor_version   : 0
    compatible_brands: M4A mp42isom
    creation_time   : 2019-04-03T16:18:10.000000Z
    encoder         : fdkaac 0.6.3, libfdk-aac 3.4.22, VBR mode 4
  Duration: 00:00:10.73, start: 0.000000, bitrate: 42 kb/s
    Stream #0:0(und): Audio: aac (HE-AACv2) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 40 kb/s (default)
    Metadata:
      creation_time   : 2019-04-03T16:18:10.000000Z
Stream mapping:
  Stream #0:0 -> #0:0 (aac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'ffmpeg1125277.05.split-track02.fdkaac.gapless-mode-1.m4a.wav':
  Metadata:
    major_brand     : M4A 
    minor_version   : 0
    compatible_brands: M4A mp42isom
    ISFT            : Lavf58.26.101
    Stream #0:0(und): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s (default)
    Metadata:
      creation_time   : 2019-04-03T16:18:10.000000Z
      encoder         : Lavc58.48.100 pcm_s16le
size=    1824kB time=00:00:10.58 bitrate=1411.3kbits/s speed= 212x    
video:0kB audio:1824kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.004176%
ffmpeg version N-93527-g1125277 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.11) 20160609
  configuration: 
  libavutil      56. 26.100 / 56. 26.100
  libavcodec     58. 48.100 / 58. 48.100
  libavformat    58. 26.101 / 58. 26.101
  libavdevice    58.  7.100 / 58.  7.100
  libavfilter     7. 48.100 /  7. 48.100
  libswscale      5.  4.100 /  5.  4.100
  libswresample   3.  4.100 /  3.  4.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '06.split-track01.fdkaac.gapless-mode-2.m4a':
  Metadata:
    major_brand     : M4A 
    minor_version   : 0
    compatible_brands: M4A mp42isom
    creation_time   : 2019-04-03T16:18:28.000000Z
    encoder         : fdkaac 0.6.3, libfdk-aac 3.4.22, VBR mode 4
    iTunSMPB        :  00000000 00000C00 000005C6 000000000001CA3A 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  Duration: 00:00:05.53, start: 0.000000, bitrate: 46 kb/s
    Stream #0:0(und): Audio: aac (HE-AACv2) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 44 kb/s (default)
    Metadata:
      creation_time   : 2019-04-03T16:18:28.000000Z
Stream mapping:
  Stream #0:0 -> #0:0 (aac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'ffmpeg1125277.06.split-track01.fdkaac.gapless-mode-2.m4a.wav':
  Metadata:
    major_brand     : M4A 
    minor_version   : 0
    compatible_brands: M4A mp42isom
    iTunSMPB        :  00000000 00000C00 000005C6 000000000001CA3A 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    ISFT            : Lavf58.26.101
    Stream #0:0(und): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s (default)
    Metadata:
      creation_time   : 2019-04-03T16:18:28.000000Z
      encoder         : Lavc58.48.100 pcm_s16le
size=     920kB time=00:00:05.34 bitrate=1411.3kbits/s speed= 268x    
video:0kB audio:920kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.008280%
ffmpeg version N-93527-g1125277 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.11) 20160609
  configuration: 
  libavutil      56. 26.100 / 56. 26.100
  libavcodec     58. 48.100 / 58. 48.100
  libavformat    58. 26.101 / 58. 26.101
  libavdevice    58.  7.100 / 58.  7.100
  libavfilter     7. 48.100 /  7. 48.100
  libswscale      5.  4.100 /  5.  4.100
  libswresample   3.  4.100 /  3.  4.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '06.split-track02.fdkaac.gapless-mode-2.m4a':
  Metadata:
    major_brand     : M4A 
    minor_version   : 0
    compatible_brands: M4A mp42isom
    creation_time   : 2019-04-03T16:18:29.000000Z
    encoder         : fdkaac 0.6.3, libfdk-aac 3.4.22, VBR mode 4
    iTunSMPB        :  00000000 00000C00 00000281 0000000000038D7F 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  Duration: 00:00:10.73, start: 0.000000, bitrate: 42 kb/s
    Stream #0:0(und): Audio: aac (HE-AACv2) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 40 kb/s (default)
    Metadata:
      creation_time   : 2019-04-03T16:18:29.000000Z
Stream mapping:
  Stream #0:0 -> #0:0 (aac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'ffmpeg1125277.06.split-track02.fdkaac.gapless-mode-2.m4a.wav':
  Metadata:
    major_brand     : M4A 
    minor_version   : 0
    compatible_brands: M4A mp42isom
    iTunSMPB        :  00000000 00000C00 00000281 0000000000038D7F 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    ISFT            : Lavf58.26.101
    Stream #0:0(und): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s (default)
    Metadata:
      creation_time   : 2019-04-03T16:18:29.000000Z
      encoder         : Lavc58.48.100 pcm_s16le
size=    1824kB time=00:00:10.58 bitrate=1411.3kbits/s speed= 214x    
video:0kB audio:1824kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.004176%
ffmpeg version N-93527-g1125277 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.11) 20160609
  configuration: 
  libavutil      56. 26.100 / 56. 26.100
  libavcodec     58. 48.100 / 58. 48.100
  libavformat    58. 26.101 / 58. 26.101
  libavdevice    58.  7.100 / 58.  7.100
  libavfilter     7. 48.100 /  7. 48.100
  libswscale      5.  4.100 /  5.  4.100
  libswresample   3.  4.100 /  3.  4.100
[aac @ 0x28bb400] Estimating duration from bitrate, this may be inaccurate
Input #0, aac, from '07.split-track01.aac-enc.m4a':
  Duration: 00:00:05.88, bitrate: 43 kb/s
    Stream #0:0: Audio: aac (HE-AACv2), 44100 Hz, stereo, fltp, 43 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (aac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'ffmpeg1125277.07.split-track01.aac-enc.m4a.wav':
  Metadata:
    ISFT            : Lavf58.26.101
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s
    Metadata:
      encoder         : Lavc58.48.100 pcm_s16le
size=     952kB time=00:00:05.52 bitrate=1411.3kbits/s speed= 278x    
video:0kB audio:952kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.008001%
ffmpeg version N-93527-g1125277 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.11) 20160609
  configuration: 
  libavutil      56. 26.100 / 56. 26.100
  libavcodec     58. 48.100 / 58. 48.100
  libavformat    58. 26.101 / 58. 26.101
  libavdevice    58.  7.100 / 58.  7.100
  libavfilter     7. 48.100 /  7. 48.100
  libswscale      5.  4.100 /  5.  4.100
  libswresample   3.  4.100 /  3.  4.100
[aac @ 0x2a3b400] Estimating duration from bitrate, this may be inaccurate
Input #0, aac, from '07.split-track02.aac-enc.m4a':
  Duration: 00:00:09.31, bitrate: 48 kb/s
    Stream #0:0: Audio: aac (HE-AACv2), 44100 Hz, stereo, fltp, 48 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (aac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'ffmpeg1125277.07.split-track02.aac-enc.m4a.wav':
  Metadata:
    ISFT            : Lavf58.26.101
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s
    Metadata:
      encoder         : Lavc58.48.100 pcm_s16le
size=    1848kB time=00:00:10.72 bitrate=1411.3kbits/s speed= 212x    
video:0kB audio:1848kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.004122%
ffmpeg version N-93527-g1125277 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.11) 20160609
  configuration: 
  libavutil      56. 26.100 / 56. 26.100
  libavcodec     58. 48.100 / 58. 48.100
  libavformat    58. 26.101 / 58. 26.101
  libavdevice    58.  7.100 / 58.  7.100
  libavfilter     7. 48.100 /  7. 48.100
  libswscale      5.  4.100 /  5.  4.100
  libswresample   3.  4.100 /  3.  4.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '08.split-track01.faac.m4a':
  Metadata:
    major_brand     : M4A 
    minor_version   : 0
    compatible_brands: M4A mp42isom
    creation_time   : 2019-04-03T16:25:17.000000Z
    encoder         : FAAC 1.29.9.2
  Duration: 00:00:05.32, start: 0.000000, bitrate: 100 kb/s
    Stream #0:0(eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 98 kb/s (default)
    Metadata:
      creation_time   : 2019-04-03T16:25:17.000000Z
Stream mapping:
  Stream #0:0 -> #0:0 (aac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'ffmpeg1125277.08.split-track01.faac.m4a.wav':
  Metadata:
    major_brand     : M4A 
    minor_version   : 0
    compatible_brands: M4A mp42isom
    ISFT            : Lavf58.26.101
    Stream #0:0(eng): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s (default)
    Metadata:
      creation_time   : 2019-04-03T16:25:17.000000Z
      encoder         : Lavc58.48.100 pcm_s16le
size=     920kB time=00:00:05.36 bitrate=1405.2kbits/s speed= 434x    
video:0kB audio:920kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.008280%
ffmpeg version N-93527-g1125277 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.11) 20160609
  configuration: 
  libavutil      56. 26.100 / 56. 26.100
  libavcodec     58. 48.100 / 58. 48.100
  libavformat    58. 26.101 / 58. 26.101
  libavdevice    58.  7.100 / 58.  7.100
  libavfilter     7. 48.100 /  7. 48.100
  libswscale      5.  4.100 /  5.  4.100
  libswresample   3.  4.100 /  3.  4.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '08.split-track02.faac.m4a':
  Metadata:
    major_brand     : M4A 
    minor_version   : 0
    compatible_brands: M4A mp42isom
    creation_time   : 2019-04-03T16:25:30.000000Z
    encoder         : FAAC 1.29.9.2
  Duration: 00:00:10.56, start: 0.000000, bitrate: 97 kb/s
    Stream #0:0(eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 95 kb/s (default)
    Metadata:
      creation_time   : 2019-04-03T16:25:30.000000Z
Stream mapping:
  Stream #0:0 -> #0:0 (aac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'ffmpeg1125277.08.split-track02.faac.m4a.wav':
  Metadata:
    major_brand     : M4A 
    minor_version   : 0
    compatible_brands: M4A mp42isom
    ISFT            : Lavf58.26.101
    Stream #0:0(eng): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s (default)
    Metadata:
      creation_time   : 2019-04-03T16:25:30.000000Z
      encoder         : Lavc58.48.100 pcm_s16le
size=    1820kB time=00:00:10.58 bitrate=1408.2kbits/s speed= 492x    
video:0kB audio:1820kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.004185%

comment:3 by Christoph Anton Mitterer, 5 years ago

I found a set of AAC test vector files that ought to play back gapless (using metadata according to MPEG standards, not the iTunes stuff which - of course (Apple) - does it differently):
https://www2.iis.fraunhofer.de/AAC/gapless.html

(turn down your volume as these files have a pretty loud and annoying sound, IMO)

When decoding these files with FFmpeg to e.g. WAV, a "gap" is added at the end (just as described on the website above). Interestingly, AFAIU the website, decoders "not doing it right" would also add a "gap" at the beginning, but it seems this works properly with ffmpeg (but maybe I'm doing something wrong here).

comment:4 by Christoph Anton Mitterer, 5 years ago

Also note that the Fraunhofer website contains a link to Apple's documentation:
https://developer.apple.com/library/archive/technotes/tn2258/_index.html#//apple_ref/doc/uid/DTS40009396

Maybe that helps with adding support for the iTunes gapless metadata.

comment:5 by Balling, 2 years ago

Keywords: aac gapless added
Summary: gapless playback (probably) doesn't work with AAC → gapless playback doesn't work with AAC (remainder and Apple style)

comment:6 by Balling, 2 years ago

Some other samples: test1_nero.m4a

FS#12185 : Fix gapless playback for Nero AAC
https://www.rockbox.org/tracker/task/12185

Last edited 2 years ago by Balling

comment:7 by Balling, 21 months ago

Status: new → open

AFAIU the website, decoders "not doing it right" would also add a "gap" at the beginning, but it seems this works properly with ffmpeg (but maybe I'm doing something wrong here).

We do it right, but only at the beginning (now that HE-AAC (and v2) is fixed; it was eating too much at the beginning).

Chromium fixed remainder here (first commit): https://bugs.chromium.org/p/chromium/issues/detail?id=668999

Of course it cannot be ported, since it uses ffmpeg's wrapper: https://chromium-review.googlesource.com/c/chromium/src/+/1114094/

Last edited 21 months ago by Balling

comment:9 by Balling, 14 months ago

Owner: set to Elon Musk

Can you fix that? Chrome has this correct, so this code path has insane visibility!! See the patch in the previous comment, which will also fix the subtitle code for mpv (again, mpv has this code path correct, since there is no such bug with subtitles).

Why is this so slow, and why are so many bugs present...

Last edited 14 months ago by Balling

comment:10 by Christoph Anton Mitterer, 14 months ago

Please note it in the bug once this is merged (ideally with the version it lands in)... once that reaches Debian I could test it again.

Thanks,
Chris.

comment:11 by Christoph Anton Mitterer, 14 months ago

Oh btw: for which gapless playback "notation" does your patch fix it? As far as I understood there are different formats for AAC (one following MPEG standards... and some proprietary Apple system - see comments above).

in reply to:  11 comment:12 by John Regan, 10 months ago

Replying to Christoph Anton Mitterer:

Oh btw: for which gapless playback "notation" does your patch fix it? As far as I understood there are different formats for AAC (one following MPEG standards... and some proprietary Apple system - see comments above).

That listed patch is for the MPEG standard method - it relies on the stts box to find the last signaled sample duration. The Apple method relies on an MP4 tag that it parses.

I'm not sure if that patch would handle fragmented MP4 files - they usually have an empty stts box, and instead rely on the Track Fragment Header Box (tfhd) or the Track Fragment Run Box (trun).

As far as I know, here are the available methods to signal gapless playback:

  1. Use an Edit List Box to list the duration and priming samples, and optional stts box. [1]
  2. Use an Edit List Box to only list the priming samples, and rely on either:
    • The trun box to list packet durations of the fragment.
    • The tfhd box to list a default packet duration. [2]
    • The stts box to list packet durations (maybe? see footnote). [3]
  3. Use an MP4 custom metadata item (the Apple method).
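
The first two methods can be sketched as follows. This is only an illustration, not how ffmpeg implements it: the function names are hypothetical, the edit list's media_time is taken as the number of priming samples, and the movie and media timescales are assumed to be equal so that no unit conversion is needed. The example values (94 packets of 1024 samples plus a final packet of 768, with 1024 priming samples) are assumptions chosen to match the 96000-sample test case in this ticket.

```python
def stts_total(entries):
    """Sum the durations of stts-style (sample_count, sample_delta) runs."""
    return sum(count * delta for count, delta in entries)


def playable_range(media_time, stts_entries, segment_duration=0):
    """Return (start, end) of the samples that should be played.

    media_time       -- priming samples to skip (from the edit list)
    stts_entries     -- packet durations as (sample_count, sample_delta) runs
    segment_duration -- edit list duration; 0 means "not signaled", in which
                        case the end comes from the packet durations alone
    Assumes movie timescale == media timescale (an assumption of this sketch).
    """
    start = media_time
    total = stts_total(stts_entries)
    if segment_duration > 0:
        end = min(start + segment_duration, total)
    else:
        end = total
    return start, end


# Method 2: only priming in the edit list, short last packet in stts.
start, end = playable_range(1024, [(94, 1024), (1, 768)])
print(end - start)

# Method 1: the edit list also carries the duration.
start, end = playable_range(1024, [(95, 1024)], segment_duration=96000)
print(end - start)
```

In both cases the playable length comes out as 96000 samples; without either signal, the second call would yield 96256.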

Bug #10458 was closed as a duplicate of this one. The process outlined above to demonstrate the issue is complex (lots of downloading and concatenating files, etc.). I thought it might be helpful to share my example for detecting an error with gapless decoding: just encode a file with a known number of samples, decode it back, and check whether the number of decoded samples differs.

% ffmpeg -f lavfi -i anullsrc=r=48000:d=2 source.wav

# verify the created audio file has exactly 96000 samples
% soxi -s source.wav
96000

# encode to aac
% ffmpeg -i source.wav -c:a aac encoded.m4a

# (verify that encoded.m4a has an Edit List Box and STTS box using a tool like boxdumper)

# decode back to wav
% ffmpeg -i encoded.m4a destination.wav

# observe the sample count != 96000
% soxi -s destination.wav
96256

You can replace the aac codec with any codec that can go in MP4 and relies on the MP4 file to signal the duration, so aac, mp3, and opus all have the issue.
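
For the AAC case, the 96256 figure is consistent with the priming samples being trimmed while the final packet is emitted in full. A minimal sketch of that arithmetic, assuming 1024-sample AAC frames and 1024 priming samples (both are assumptions of this sketch; the ticket does not state the encoder delay, but these values reproduce the numbers reported here):

```python
import math

FRAME_SIZE = 1024  # assumed AAC-LC frame length in samples
PRIMING = 1024     # assumed encoder delay (priming samples)


def decoded_lengths(source_samples, frame_size=FRAME_SIZE, priming=PRIMING):
    """Return (packets, valid samples in last packet, length if end untrimmed)."""
    total = source_samples + priming
    packets = math.ceil(total / frame_size)
    last_valid = total - (packets - 1) * frame_size
    untrimmed_end = packets * frame_size - priming
    return packets, last_valid, untrimmed_end


packets, last_valid, untrimmed_end = decoded_lengths(96000)
print(packets, last_valid, untrimmed_end)
```

Under these assumptions the final packet carries 768 valid samples, which agrees with the attachment descriptions in this ticket, and the untrimmed length is 256 samples longer than the source.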

Attaching sample files:

  • one that can rely on either the elst or the stts box (nonfragmented-aac-stts.mp4)
  • one that relies on the tfhd box (fragmented-aac-tfhd.mp4)
  • one that relies on the trun box (fragmented-aac-trun.mp4)
  • one that relies on the iTunes-style comment (nonfragmented-aac-itunes.mp4)

When properly decoded they should all be exactly 2 seconds @ 48kHz (96000 samples).
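
If soxi is not at hand, the sample count of a decoded WAV file can also be checked with Python's standard wave module. This is a small sketch: the helper names are hypothetical, and write_silence exists only so the check can be tried without ffmpeg.

```python
import wave


def wav_sample_count(path):
    """Number of sample frames (samples per channel) in a PCM WAV file."""
    with wave.open(path, "rb") as w:
        return w.getnframes()


def check_decoded(path, expected=96000):
    """Return (matches, surplus) where surplus = actual - expected."""
    actual = wav_sample_count(path)
    return actual == expected, actual - expected


def write_silence(path, frames, rate=48000, channels=2):
    """Write a 16-bit PCM WAV of silence (stand-in for a decoded file)."""
    with wave.open(path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00" * (frames * channels * 2))
```

Calling check_decoded("destination.wav") on the output of the commands above should report a mismatch for as long as this bug is present.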


1: Even if the edit list has a duration, you may need the stts box as well, say if you concatenated multiple streams into a single MP4 file - smooth playback would need multiple Edit List Boxes, and you may have packets mid-stream that need to be truncated.

2: Usually seen when the final fragment contains a single packet.

3: I'm not sure if having an Edit List Box with a duration of zero is valid if you can produce an stts box. If you know all the packet lengths needed to create an stts box, you can fill in the total duration (minus padding) in the Edit List Box.

Last edited 10 months ago by John Regan

by John Regan, 10 months ago

Attachment: nonfragmented-aac-stts.mp4 added

Non-fragmented MP4 file with AAC audio. Contains an edit list box signaling priming samples to discard, and an stts box signaling the last packet has 768 samples. Should decode to exactly 96000 samples.

by John Regan, 10 months ago

Attachment: fragmented-aac-tfhd.mp4 added

Fragmented MP4 file with AAC audio. Contains an edit list box signaling priming samples to discard; the final fragment uses the tfhd box to signal the last packet has 768 samples. Should decode to exactly 96000 samples.

by John Regan, 10 months ago

Attachment: fragmented-aac-trun.mp4 added

Fragmented MP4 file with AAC audio. Contains an edit list box signaling priming samples to discard; the final fragment uses the trun box to signal the last packet has 768 samples. Should decode to exactly 96000 samples.

by John Regan, 10 months ago

Attachment: nonfragmented-aac-itunes.mp4 added

Non-fragmented MP4 file with AAC audio. Uses a custom metadata item (iTunSMPB) to signal the priming and padding samples to discard. Should decode to exactly 96000 samples.
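
For illustration, an iTunSMPB value can be parsed roughly like this. The sketch assumes the commonly described layout of space-separated hexadecimal fields (a zero field, then priming samples, padding samples, and the original sample count); the function name is hypothetical, and the example string was constructed for this sketch rather than read from the attachment.

```python
from typing import NamedTuple


class GaplessInfo(NamedTuple):
    priming: int        # samples to discard at the start
    padding: int        # samples to discard at the end
    valid_samples: int  # original (playable) sample count


def parse_itunsmpb(value):
    """Parse the leading fields of an iTunSMPB string (assumed layout)."""
    fields = value.split()
    if len(fields) < 4:
        raise ValueError("iTunSMPB needs at least four hex fields")
    priming, padding, valid = (int(f, 16) for f in fields[1:4])
    return GaplessInfo(priming, padding, valid)


# Constructed example: 1024 priming, 256 padding, 96000 valid samples.
example = " 00000000 00000400 00000100 0000000000017700 00000000 00000000"
info = parse_itunsmpb(example)
print(info)
```

A decoder honoring this tag would drop info.priming samples from the start and info.padding samples from the end, leaving info.valid_samples.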
