Opened 14 months ago

Last modified 12 months ago

#10477 new defect

WAV to AAC-HE conversion writes wrong "priming" and "remainder" info fields

Reported by: Maximilian Mumme
Owned by:
Priority: normal
Component: undetermined
Version: git-master
Keywords: AAC libfdk-aac apple
Cc: Maximilian Mumme
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

This only reproduces on Apple platforms (macOS, iOS).

When AAC-HE files encoded with FFmpeg are played in an audio player that uses CoreAudio as its backend (e.g. QuickTime Player, QuickLook, AULab), the first few frames are cut off and inaudible.
Assuming this was a bug in CoreAudio, we reported the issue to Apple Developer Technical Support. However, they traced it to a bug in FFmpeg.

Here are the steps to reproduce our findings:

First, install FFmpeg with AAC support from Homebrew:

% brew tap homebrew-ffmpeg/ffmpeg
% brew install homebrew-ffmpeg/ffmpeg/ffmpeg --with-fdk-aac

In our case this installed

ffmpeg version 6.0 Copyright (c) 2000-2023 the FFmpeg developers
  built with Apple clang version 14.0.0 (clang-1400.0.29.202)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/6.0-with-options_1 --enable-shared --cc=clang --host-cflags= --host-ldflags= --enable-gpl --enable-libaom --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-libsnappy --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-demuxer=dash --enable-opencl --enable-audiotoolbox --enable-videotoolbox --disable-htmlpages --enable-libfdk-aac --enable-nonfree

The attached file click_240bpm.wav can be used as a sample file to reproduce our findings. It contains a "high-low-low-low" click pattern where the very first "high" click starts on the 0th frame of the file.

We converted this file to AAC-HE using FFmpeg with the following command (see [0] for the output):

ffmpeg -i click_240bpm.wav -vcodec copy -acodec libfdk_aac -profile:a aac_he click_240bpm-ffmpeg.m4a

As a comparison, we can also convert the file to AAC-HE with Apple's afconvert tool, which uses CoreAudio as its backend:

afconvert -d aach click_240bpm.wav click_240bpm-afconvert.m4a

Comparing these two files in a listening test with QuickTime Player, we noticed that the afconvert file plays back fine, while in the ffmpeg file the first "high" click is cut off, so the click pattern starts with "low-low-low".

This can also be visualized by decoding the file to WAV again with afconvert and then visualizing the waveform in e.g. ocenaudio (screenshots attached):

afconvert -d LEI16 click_240bpm-ffmpeg.m4a click_240bpm-ffmpeg-dec.wav
afconvert -d LEI16 click_240bpm-afconvert.m4a click_240bpm-afconvert-dec.wav
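As an alternative to inspecting the waveforms by eye, the onset of the first click in each decoded file can be located programmatically. The following is a minimal sketch, assuming the decoded files are plain 16-bit PCM WAV files that Python's standard `wave` module can read; the function name `first_loud_frame` and the `threshold` value are illustrative choices, not part of any tool mentioned in this ticket.

```python
import array
import sys
import wave


def first_loud_frame(path, threshold=1000):
    """Return the index of the first frame containing a sample whose
    absolute value exceeds `threshold`, or None if there is none.

    Assumes 16-bit little-endian PCM (what `afconvert -d LEI16` requests).
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        channels = wav.getnchannels()
        samples = array.array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        # WAV data is little-endian; swap on big-endian hosts.
        samples.byteswap()
    for index, value in enumerate(samples):
        if abs(value) > threshold:
            # Samples are interleaved, so divide by the channel count.
            return index // channels
    return None
```

Calling `first_loud_frame("click_240bpm-ffmpeg-dec.wav")` and `first_loud_frame("click_240bpm-afconvert-dec.wav")` and comparing the two results gives a rough numeric view of how far the first audible click has shifted between the two decodes.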

The Apple engineers then pointed us to the following reason for this behaviour (quote):

After looking into the M4A further, we figured out the root cause of the problem. According to afinfo tool, the M4A file has 2529 samples leading zeros and 3 samples trailing zeros.

% afinfo click_240bpm-ffmpeg.m4a
[...]
audio 1014300 valid frames + 2529 priming + 3 remainder = 1016832
[...]

Since these numbers are based on 22.05 kHz sample rate of the AAC base layer codec, the actual decoder output should have 5058(=2529*2) samples leading zeros @ 44.1kHz sample rate. AudioCodecs has codec delay which is a roundtrip delay from the encoder to the decoder. The leading zero is corresponding to the codec delay. The decoder should skip this amount of leading zeros samples to align with the encoder input.

When we tried to decode the M4A file with ffmpeg tool, we realized that ffmpeg tool skips just only 4096 samples ignoring “2529 priming” information in the M4A file, and its output is aligned with the original WAV file. ffmpeg tool should have put “2048 priming / 484 remainder” to the M4A file. CoreAudio skipped 5058 samples according to the priming information in the M4A, instead of 4096 samples, and it missed the first note as you described. We think this is a bug of ffmpeg tool.

If you force the priming information to be 2048 leading zeros and 484 trailing zeros with the following command, you would see the expected output.

% afconvert -d LEI16 click_240bpm-ffmpeg.m4a click_240bpm-ffmpeg-dec.wav --prime-override 2048 484

We believe that this is also the root cause for the issues #2325 and #5910.
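
The numbers in the quoted analysis can be cross-checked with a few lines of arithmetic. This is only a sketch: every constant is copied from the afinfo output and the Apple engineers' explanation above, the factor of 2 is the ratio between the 44.1 kHz output rate and the 22.05 kHz base layer rate they mention, and the function name `summarize` is made up for illustration.

```python
# Values reported by afinfo for click_240bpm-ffmpeg.m4a (22.05 kHz base layer).
VALID_FRAMES = 1014300
PRIMING = 2529
REMAINDER = 3

RATE_FACTOR = 2          # 44.1 kHz output / 22.05 kHz base layer
OUTPUT_RATE_HZ = 44100
FFMPEG_SKIP = 4096       # samples ffmpeg skips on decode, per the quoted analysis

# Priming / remainder the quoted analysis says ffmpeg should have written.
SUGGESTED_PRIMING = 2048
SUGGESTED_REMAINDER = 484


def summarize():
    """Collect the derived quantities discussed in the quoted analysis."""
    coreaudio_skip = PRIMING * RATE_FACTOR
    excess = coreaudio_skip - FFMPEG_SKIP
    return {
        "total_frames": VALID_FRAMES + PRIMING + REMAINDER,
        "suggested_total": VALID_FRAMES + SUGGESTED_PRIMING + SUGGESTED_REMAINDER,
        "coreaudio_skip": coreaudio_skip,
        "suggested_skip": SUGGESTED_PRIMING * RATE_FACTOR,
        "excess_samples": excess,
        "excess_ms": 1000.0 * excess / OUTPUT_RATE_HZ,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(name, value)
```

Two things stand out from the result: the suggested 2048 / 484 split leaves the total frame count unchanged at 1016832, and 2048 priming frames correspond to exactly the 4096 output samples that ffmpeg skips. With the written value of 2529, CoreAudio skips 962 additional output samples, roughly 21.8 ms at 44.1 kHz.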

---
[0] FFmpeg command output:

% ffmpeg -i click_240bpm.wav -vcodec copy -acodec libfdk_aac -profile:a aac_he click_240bpm-ffmpeg.m4a
ffmpeg version 6.0 Copyright (c) 2000-2023 the FFmpeg developers
  built with Apple clang version 14.0.0 (clang-1400.0.29.202)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/6.0-with-options_1 --enable-shared --cc=clang --host-cflags= --host-ldflags= --enable-gpl --enable-libaom --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-libsnappy --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-demuxer=dash --enable-opencl --enable-audiotoolbox --enable-videotoolbox --disable-htmlpages --enable-libfdk-aac --enable-nonfree
  libavutil      58.  2.100 / 58.  2.100
  libavcodec     60.  3.100 / 60.  3.100
  libavformat    60.  3.100 / 60.  3.100
  libavdevice    60.  1.100 / 60.  1.100
  libavfilter     9.  3.100 /  9.  3.100
  libswscale      7.  1.100 /  7.  1.100
  libswresample   4. 10.100 /  4. 10.100
  libpostproc    57.  1.100 / 57.  1.100
Guessed Channel Layout for Input Stream #0.0 : stereo
Input #0, wav, from 'click_240bpm.wav':
  Metadata:
    encoded_by      : Logic Pro X
    date            : 2023-06-2
    creation_time   : 11:40:2
    time_reference  : 158848200
    umid            : 0x000000000000000000000000000000000000000000000000000000000000000000000000A819996B010000000000000000000000000000000000000000000000
    coding_history  :
  Duration: 00:00:46.00, bitrate: 2119 kb/s
  Chapters:
    Chapter #0:0: start 0.000000, end 46.000000
      Metadata:
        title           : Tempo: 240.0
  Stream #0:0: Audio: pcm_s24le ([1][0][0][0] / 0x0001), 44100 Hz, 2 channels, s32 (24 bit), 2116 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (pcm_s24le (native) -> aac (libfdk_aac))
Press [q] to stop, [?] for help
Output #0, ipod, to 'click_240bpm-ffmpeg.m4a':
  Metadata:
    encoded_by      : Logic Pro X
    date            : 2023-06-2
    coding_history  :
    time_reference  : 158848200
    umid            : 0x000000000000000000000000000000000000000000000000000000000000000000000000A819996B010000000000000000000000000000000000000000000000
    encoder         : Lavf60.3.100
  Chapters:
    Chapter #0:0: start 0.000000, end 46.000000
      Metadata:
        title           : Tempo: 240.0
  Stream #0:0: Audio: aac (HE-AAC) (mp4a / 0x6134706D), 44100 Hz, stereo, s16, 64 kb/s
    Metadata:
      encoder         : Lavc60.3.100 libfdk_aac
size=     366kB time=00:00:45.95 bitrate=  65.3kbits/s speed= 115x
video:0kB audio:361kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.463942%

Attachments (5)

click_240bpm-ffmpeg.m4a (366.4 KB ) - added by Maximilian Mumme 14 months ago.
click_240bpm-afconvert.m4a (121.5 KB ) - added by Maximilian Mumme 14 months ago.
click_240bpm.wav (1.2 MB ) - added by Maximilian Mumme 14 months ago.
The original WAV file (had to shorten this file to fit the file size limit).
Screenshot of click_240bpm-afconvert-dec.wav in ocenaudio.jpg (1.1 MB ) - added by Maximilian Mumme 14 months ago.
Screenshot of click_240bpm-ffmpeg-dec.wav in ocenaudio.jpg (1.0 MB ) - added by Maximilian Mumme 14 months ago.

Change History (8)

by Maximilian Mumme, 14 months ago

Attachment: click_240bpm-ffmpeg.m4a added

by Maximilian Mumme, 14 months ago

Attachment: click_240bpm-afconvert.m4a added

by Maximilian Mumme, 14 months ago

Attachment: click_240bpm.wav added

The original WAV file (had to shorten this file to fit the file size limit).

comment:1 by Balling, 12 months ago

https://patchwork.ffmpeg.org/project/ffmpeg/patch/Ne70gnX--3-9@lynne.ee/

But again, we have iTunSMPB here, which says how many samples there are: 00000000000F7A1C means 1 014 300 samples (x2 gives 2028600). There are also 00000840 samples of priming (2112 in decimal) and 000005A4 (1444) of remainder. So the file has 1 017 856 samples altogether, but the priming and remainder must be removed.

Extract to ADTS first to see it.
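
The hexadecimal fields above can be decoded with a short script. This is a hedged sketch: it assumes the commonly seen iTunSMPB layout of whitespace-separated hex fields in the order zero, priming, remainder, sample count, and the example string is reconstructed from the three values quoted in this comment rather than read from the attached file. The function name `parse_itunsmpb` is illustrative.

```python
def parse_itunsmpb(value):
    """Decode priming, remainder and valid sample count from an
    iTunSMPB-style string.

    Assumed field order (not verified against the attached file):
    field 0 is zero, field 1 priming, field 2 remainder, field 3 the
    valid sample count. Any further fields are ignored.
    """
    fields = value.split()
    if len(fields) < 4:
        raise ValueError("expected at least four hex fields")
    priming = int(fields[1], 16)
    remainder = int(fields[2], 16)
    valid = int(fields[3], 16)
    return {
        "priming": priming,
        "remainder": remainder,
        "valid": valid,
        "total": priming + valid + remainder,
    }


if __name__ == "__main__":
    # String rebuilt from the values quoted in the comment above.
    example = " 00000000 00000840 000005A4 00000000000F7A1C"
    print(parse_itunsmpb(example))
```

For the reconstructed string this yields 2112 priming, 1444 remainder and 1014300 valid samples, for a total of 1017856, matching the figures in the comment.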

Last edited 12 months ago by Balling

comment:2 by Balling, 12 months ago

Can you provide the original click_240bpm.wav? How was it generated?

comment:3 by Balling, 12 months ago

You are wrong. The priming of click_240bpm-afconvert.m4a is 5186 == (2112 + 481)*2 samples; you will see this if you first remux to ADTS. Also, qaac decodes it correctly: qaac64.exe -D click_240bpm-afconvert.m4a

Altogether 2028600 samples.

ffmpeg version 6.0 Copyright

We do not support ffmpeg version 6.0, only git-master.

Last edited 12 months ago by Balling