AAC to PCM conversion inserts extra silence in the beginning
Reported by: jwilhelmsson
Owned by:
Blocking:
Reproduced by developer: no
Analyzed by developer: no
Summary of the bug:
When converting AAC audio files or streams to PCM, extra silence is inserted at the beginning of the output file.
This may very well be the same issue as ticket #2325, but since I believe I have more information I elected to create a new one.
The long version:
My company dubs cartoons, so we receive many kinds of video formats from our various clients. Recently one of them complained that our final delivery was out of sync compared to the original material, and that's how this issue was discovered. The reference files from the client were MP4 files with AAC audio, and when I converted that audio into WAV files (for use in our recording software, Steinberg Nuendo), extra silence was inserted at the beginning, causing us to record everything out of sync.
After lots of testing I concluded that the AAC to PCM conversion is the culprit (i.e. the video container format is mostly irrelevant), and also that the length of the inserted silence varies between files. I haven't been able to pinpoint exactly what causes the difference.
Attached are five aac files, plus wav files converted by ffmpeg (3.1.4) and QuickTime Pro (7.7.9) clearly showing the difference. Since the files come from commercial productions I've only included 7 to 10 seconds from each, but it's enough to see the error.
Two of the files get approximately 44 milliseconds (about 2100 samples at 48 kHz) of inserted silence, two get 108 milliseconds (about 5200 samples), and one, oddly enough, gets only 32 milliseconds of silence even though the audio is shifted by 44 ms (this is easy to see since it starts with a test tone).
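The sample counts above follow from the offsets in milliseconds and the 48 kHz output rate. A minimal sketch of that arithmetic, where `ms_to_samples` is a hypothetical helper name and the millisecond values are the approximate ones measured in Audacity:

```python
def ms_to_samples(milliseconds, sample_rate=48000):
    """Convert a duration in milliseconds to a whole number of samples."""
    return round(milliseconds * sample_rate / 1000)


# Approximate offsets reported in this ticket, at the 48 kHz output rate.
for offset_ms in (32, 44, 108):
    print(offset_ms, "ms =", ms_to_samples(offset_ms), "samples")
# prints:
# 32 ms = 1536 samples
# 44 ms = 2112 samples
# 108 ms = 5184 samples
```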
How to reproduce:
The aac files were converted by ffmpeg with the command (I'll attach outputs in separate messages below):
ffmpeg -i input -c:a pcm_s24le -ar 48k output
They were also converted with QuickTime Pro with the same settings (24 bits, 48 kHz). I then compared the waveforms in both Nuendo and Audacity. The offset values were measured by manually marking an area in Audacity, so they are very approximate.
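Since the manual measurement is approximate, a script could count the leading silent samples in each WAV file directly. The following is only a sketch using Python's standard `wave` module; the function name `leading_silence_samples` and the `threshold` parameter are made up for illustration, and it has not been run against the attached files. One assumption to note: ffmpeg may write 24-bit WAV files with a WAVE_FORMAT_EXTENSIBLE header, which the `wave` module only accepts from Python 3.12 onward.

```python
import wave


def leading_silence_samples(path, threshold=0):
    """Return (frames, sample_rate), where frames is the number of leading
    frames in which every channel has an absolute value <= threshold."""
    with wave.open(path, "rb") as wav:
        width = wav.getsampwidth()      # bytes per sample, 3 for pcm_s24le
        channels = wav.getnchannels()
        rate = wav.getframerate()
        frame_size = width * channels
        count = 0
        while True:
            chunk = wav.readframes(4096)
            if not chunk:
                # Reached the end: the whole file is at or below the threshold.
                return count, rate
            for start in range(0, len(chunk), frame_size):
                frame = chunk[start:start + frame_size]
                for ch in range(channels):
                    raw = frame[ch * width:(ch + 1) * width]
                    # WAV PCM is little-endian; 8-bit is unsigned, wider is signed.
                    if width == 1:
                        value = raw[0] - 128
                    else:
                        value = int.from_bytes(raw, "little", signed=True)
                    if abs(value) > threshold:
                        return count, rate
                count += 1
```

Calling `leading_silence_samples("output.wav")` on both the ffmpeg and the QuickTime conversion of the same source and subtracting the two frame counts would give the offset in samples. A small nonzero `threshold` may be needed if the leading section contains dither or low-level noise rather than digital silence.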
The attached files come from one movie and two tv series (two episodes each). The movie files are called "g", and the series "tj" and "td". The aac files were extracted from the original mp4 files by stream copying:
ffmpeg -i input -c:a copy -t 10 output
The movie file starts with a test tone. It is also the file where the offset is 32 ms at the beginning of the test tone but 44 ms at the end of it.
The error is the same when converting directly from the mp4 file and when converting from an extracted aac.
Extracting PCM wav from a mov container produces no errors.
If I convert the aac stream to a new aac file there's still an error, but only half as long. I've only tested this on one file, but it produced a 22 ms gap instead of 44 ms.
Repeated conversion does not compound the error, i.e. converting from AAC to AAC and then converting that output file to AAC again does not increase the offset.
Converting to a different bitrate/sample rate does not affect the result.
I've done a lot of testing, but it's very possible that I've forgotten some vital information in this report, so please ask if you need more details.