Opened 14 months ago
Last modified 12 months ago
#10477 new defect
WAV to AAC-HE conversion writes wrong "priming" and "remainder" info fields
Reported by: | Maximilian Mumme | Owned by: | |
---|---|---|---|
Priority: | normal | Component: | undetermined |
Version: | git-master | Keywords: | AAC libfdk-aac apple |
Cc: | Maximilian Mumme | Blocked By: | |
Blocking: | | Reproduced by developer: | no |
Analyzed by developer: | no | | |
Description
This only reproduces on Apple platforms (macOS, iOS).
When playing AAC-HE files encoded with FFmpeg in an audio player that uses CoreAudio as its backend (e.g. QuickTime Player, QuickLook, AULab), we noticed that the first few frames are cut off and not audible.
Assuming this was a bug in CoreAudio, we reported an issue to Apple Developer Technical Support. However, they were able to track it down to a bug in FFmpeg.
Here are the steps to reproduce our findings:
First, install FFmpeg with libfdk-aac support from Homebrew:
% brew tap homebrew-ffmpeg/ffmpeg
% brew install homebrew-ffmpeg/ffmpeg/ffmpeg --with-fdk-aac
In our case, this installed:
ffmpeg version 6.0 Copyright (c) 2000-2023 the FFmpeg developers
built with Apple clang version 14.0.0 (clang-1400.0.29.202)
configuration: --prefix=/usr/local/Cellar/ffmpeg/6.0-with-options_1 --enable-shared --cc=clang --host-cflags= --host-ldflags= --enable-gpl --enable-libaom --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-libsnappy --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-demuxer=dash --enable-opencl --enable-audiotoolbox --enable-videotoolbox --disable-htmlpages --enable-libfdk-aac --enable-nonfree
The attached file click_240bpm.wav can be used as a sample to reproduce our findings. It contains a "high-low-low-low" click pattern in which the very first "high" click starts at frame 0 of the file.
We converted this file to AAC-HE using FFmpeg with the following command (see [0] for the output):
ffmpeg -i click_240bpm.wav -vcodec copy -acodec libfdk_aac -profile:a aac_he click_240bpm-ffmpeg.m4a
For comparison, we also converted the file to AAC-HE with Apple's afconvert tool, which uses CoreAudio as its backend:
afconvert -d aach click_240bpm.wav click_240bpm-afconvert.m4a
Comparing these two files in a listening test with QuickTime Player, we noticed that the afconvert file plays back fine, while in the ffmpeg file the first "high" click is cut off, so the click pattern starts with "low-low-low".
This can also be visualized by decoding both files back to WAV with afconvert and viewing the waveforms in, e.g., ocenaudio (screenshots attached):
afconvert -d LEI16 click_240bpm-ffmpeg.m4a click_240bpm-ffmpeg-dec.wav
afconvert -d LEI16 click_240bpm-afconvert.m4a click_240bpm-afconvert-dec.wav
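To put a number on the cut instead of reading it off a screenshot, a small script can report the frame at which the first click starts in each decoded file. This is a minimal sketch using only Python's standard library; it assumes the decoded files are plain 16-bit PCM WAV files that the wave module can open, and the function name first_onset and the threshold of 1000 are hypothetical choices, not part of any tool mentioned in this ticket.

```python
import array
import sys
import wave


def first_onset(path, threshold=1000):
    """Return the index of the first sample frame in which any channel's
    absolute amplitude reaches `threshold`, or None if no frame does.

    Assumes 16-bit signed PCM, e.g. the output of `afconvert -d LEI16`.
    The default threshold is an arbitrary, hypothetical value.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError(f"{path}: expected 16-bit PCM")
        channels = wav.getnchannels()
        samples = array.array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    for i, value in enumerate(samples):
        if abs(value) >= threshold:
            return i // channels  # interleaved samples -> frame index
    return None


if __name__ == "__main__":
    # e.g. python3 first_onset.py click_240bpm-ffmpeg-dec.wav click_240bpm-afconvert-dec.wav
    for name in sys.argv[1:]:
        print(name, first_onset(name))
```

Since the first "high" click starts at frame 0 of the source, a correctly trimmed decode should report an onset at or very near 0; a missing first click shows up as an onset roughly one beat later.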
The Apple engineers then pointed us to the following reason for this behaviour (quote):
After looking into the M4A further, we figured out the root cause of the problem. According to the afinfo tool, the M4A file has 2529 samples of leading zeros and 3 samples of trailing zeros.
% afinfo click_240bpm-ffmpeg.m4a
[...]
audio 1014300 valid frames + 2529 priming + 3 remainder = 1016832
[...]
Since these numbers are based on the 22.05 kHz sample rate of the AAC base-layer codec, the actual decoder output should have 5058 (= 2529*2) samples of leading zeros at the 44.1 kHz sample rate. Audio codecs have a codec delay, which is the round-trip delay from the encoder to the decoder, and the leading zeros correspond to this codec delay. The decoder should skip this number of leading zero samples to align its output with the encoder input.
When we tried to decode the M4A file with the ffmpeg tool, we realized that it skips only 4096 samples, ignoring the “2529 priming” information in the M4A file, and its output is aligned with the original WAV file. The ffmpeg tool should have written “2048 priming / 484 remainder” into the M4A file. CoreAudio skipped 5058 samples according to the priming information in the M4A, instead of 4096 samples, and so it missed the first note, as you described. We think this is a bug in the ffmpeg tool.
If you force the priming information to 2048 leading zeros and 484 trailing zeros with the following command, you will see the expected output.
% afconvert -d LEI16 click_240bpm-ffmpeg.m4a click_240bpm-ffmpeg-dec.wav --prime-override 2048 484
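The frame accounting in the quoted explanation can be checked with a few lines of Python. This is a sketch assuming that afinfo's counts are in base-layer (22.05 kHz) samples and that each AAC frame carries 1024 of them; the afinfo total of 1016832 is exactly 993 such frames. The helper names are hypothetical and not part of afinfo, afconvert or FFmpeg.

```python
AAC_FRAME_LEN = 1024  # samples per AAC frame at the base-layer rate (assumed)
SBR_FACTOR = 2        # HE-AAC output rate (44.1 kHz) / base-layer rate (22.05 kHz)
OUTPUT_RATE = 44100

VALID = 1014300  # valid base-layer samples reported by afinfo


def check_accounting(valid, priming, remainder):
    """Return the number of AAC frames; raise if the counts don't fill whole frames."""
    total = valid + priming + remainder
    if total % AAC_FRAME_LEN:
        raise ValueError(f"{total} is not a whole number of {AAC_FRAME_LEN}-sample frames")
    return total // AAC_FRAME_LEN


def remainder_for(valid, priming):
    """Padding (base-layer samples) needed after the valid audio to complete the last frame."""
    return -(valid + priming) % AAC_FRAME_LEN


for label, priming in (("written by ffmpeg", 2529), ("suggested by Apple", 2048)):
    remainder = remainder_for(VALID, priming)
    frames = check_accounting(VALID, priming, remainder)
    print(f"{label}: priming={priming} remainder={remainder} "
          f"frames={frames} skip@44.1kHz={priming * SBR_FACTOR}")

extra = (2529 - 2048) * SBR_FACTOR
print(f"extra samples skipped by CoreAudio: {extra} ({extra / OUTPUT_RATE * 1000:.1f} ms)")
```

Both variants describe the same 993 frames and differ only in where the valid audio is said to start: 2529/3 makes a decoder skip 5058 samples at 44.1 kHz, while 2048/484 gives the 4096 samples that ffmpeg itself skips, a difference of 962 samples (about 21.8 ms). In both cases the valid length is 1014300 × 2 = 2028600 samples, exactly the 46.00 s duration of the input at 44.1 kHz.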
We believe that this is also the root cause of issues #2325 and #5910.
---
[0] FFmpeg command output:
% ffmpeg -i click_240bpm.wav -vcodec copy -acodec libfdk_aac -profile:a aac_he click_240bpm-ffmpeg.m4a
ffmpeg version 6.0 Copyright (c) 2000-2023 the FFmpeg developers
  built with Apple clang version 14.0.0 (clang-1400.0.29.202)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/6.0-with-options_1 --enable-shared --cc=clang --host-cflags= --host-ldflags= --enable-gpl --enable-libaom --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-libsnappy --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-demuxer=dash --enable-opencl --enable-audiotoolbox --enable-videotoolbox --disable-htmlpages --enable-libfdk-aac --enable-nonfree
  libavutil 58. 2.100 / 58. 2.100
  libavcodec 60. 3.100 / 60. 3.100
  libavformat 60. 3.100 / 60. 3.100
  libavdevice 60. 1.100 / 60. 1.100
  libavfilter 9. 3.100 / 9. 3.100
  libswscale 7. 1.100 / 7. 1.100
  libswresample 4. 10.100 / 4. 10.100
  libpostproc 57. 1.100 / 57. 1.100
Guessed Channel Layout for Input Stream #0.0 : stereo
Input #0, wav, from 'click_240bpm.wav':
  Metadata:
    encoded_by : Logic Pro X
    date : 2023-06-2
    creation_time : 11:40:2
    time_reference : 158848200
    umid : 0x000000000000000000000000000000000000000000000000000000000000000000000000A819996B010000000000000000000000000000000000000000000000
    coding_history :
  Duration: 00:00:46.00, bitrate: 2119 kb/s
  Chapters:
    Chapter #0:0: start 0.000000, end 46.000000
      Metadata:
        title : Tempo: 240.0
  Stream #0:0: Audio: pcm_s24le ([1][0][0][0] / 0x0001), 44100 Hz, 2 channels, s32 (24 bit), 2116 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (pcm_s24le (native) -> aac (libfdk_aac))
Press [q] to stop, [?] for help
Output #0, ipod, to 'click_240bpm-ffmpeg.m4a':
  Metadata:
    encoded_by : Logic Pro X
    date : 2023-06-2
    coding_history :
    time_reference : 158848200
    umid : 0x000000000000000000000000000000000000000000000000000000000000000000000000A819996B010000000000000000000000000000000000000000000000
    encoder : Lavf60.3.100
  Chapters:
    Chapter #0:0: start 0.000000, end 46.000000
      Metadata:
        title : Tempo: 240.0
  Stream #0:0: Audio: aac (HE-AAC) (mp4a / 0x6134706D), 44100 Hz, stereo, s16, 64 kb/s
    Metadata:
      encoder : Lavc60.3.100 libfdk_aac
size= 366kB time=00:00:45.95 bitrate= 65.3kbits/s speed= 115x
video:0kB audio:361kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.463942%
Attachments (5)
Change History (8)
by , 14 months ago
Attachment: click_240bpm-ffmpeg.m4a added
by , 14 months ago
Attachment: click_240bpm-afconvert.m4a added
by , 14 months ago
Attachment: click_240bpm.wav added
by , 14 months ago
Attachment: Screenshot of click_240bpm-afconvert-dec.wav in ocenaudio.jpg added
by , 14 months ago
Attachment: Screenshot of click_240bpm-ffmpeg-dec.wav in ocenaudio.jpg added
comment:1 by , 12 months ago
https://patchwork.ffmpeg.org/project/ffmpeg/patch/Ne70gnX--3-9@lynne.ee/
But again, we have an iTunSMPB tag here that says how many samples there are: 00000000000F7A1C, which means 1 014 300 samples (x2 gives 2028600). It also gives 00000840 samples of priming (2112 in decimal) and 000005A4 (1444) samples of remainder. So the file has 1 017 856 samples altogether, but the priming and remainder must be removed.
Extract to ADTS first to see it.
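The hex-to-decimal reading in comment:1 can be reproduced with a small parser. This sketch assumes the commonly seen iTunSMPB layout (space-separated hex fields: a zero field, then the encoder delay or priming, the end padding or remainder, and a 64-bit count of valid samples); parse_itunsmpb is a hypothetical helper, and the sample string below is rebuilt only from the three values quoted above, since the full tag is not shown in this ticket.

```python
def parse_itunsmpb(tag):
    """Return (priming, remainder, valid) from an iTunSMPB string.

    Assumed layout: field 0 is zero, field 1 is priming (encoder delay),
    field 2 is remainder (end padding), field 3 is the valid-sample count.
    Any further fields are ignored.
    """
    fields = tag.split()
    if len(fields) < 4:
        raise ValueError("iTunSMPB needs at least four hex fields")
    priming, remainder, valid = (int(f, 16) for f in fields[1:4])
    return priming, remainder, valid


# Illustrative string containing only the values quoted in comment:1.
tag = " 00000000 00000840 000005A4 00000000000F7A1C"
priming, remainder, valid = parse_itunsmpb(tag)
total = priming + remainder + valid
print(priming, remainder, valid, total, total // 1024)  # 2112 1444 1014300 1017856 994
```

The 1 017 856-sample total is again a whole number of 1024-sample frames (994), consistent with the afinfo-style accounting "valid + priming + remainder = total".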
comment:3 by , 12 months ago
You are wrong. The priming in click_240bpm-afconvert.m4a is 5186 == (2112 + 481)*2 samples; you will see this if you first remux to ADTS. qaac also decodes it correctly: qaac64.exe -D click_240bpm-afconvert.m4a
Altogether 2028600 samples.
ffmpeg version 6.0 Copyright
We do not support ffmpeg version 6.0 Copyright, only git-master
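How a decoder applies the numbers from comment:3 can be sketched as a slice over the decoded output: drop the priming at the output rate, then keep exactly the valid samples. gapless_trim is a hypothetical helper, the factor of 2 assumes HE-AAC output at twice the 22.05 kHz base-layer rate, and the 481-sample term is copied from comment:3 without assuming what it represents.

```python
SBR_FACTOR = 2  # HE-AAC output rate (44.1 kHz) / base-layer rate (22.05 kHz)


def gapless_trim(pcm, skip, valid):
    """Drop `skip` leading frames of decoder output and keep the next `valid` frames."""
    if skip + valid > len(pcm):
        raise ValueError("decoded stream is shorter than skip + valid")
    return pcm[skip:skip + valid]


# Values quoted in comment:3, converted from base-layer to output-rate samples.
skip = (2112 + 481) * SBR_FACTOR  # priming to drop at 44.1 kHz
valid = 1014300 * SBR_FACTOR      # samples to keep at 44.1 kHz
print(skip, valid)  # 5186 2028600

# Toy usage on a short list standing in for decoded frames.
print(gapless_trim(list(range(10)), 3, 5))  # [3, 4, 5, 6, 7]
```

Slicing (rather than, say, detecting silence) is what makes the result deterministic: the decoder trusts the container's priming and valid-length fields, which is why a wrong priming value silently removes real audio such as the first click.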
The original WAV file (it had to be shortened to fit the file size limit).