Opened 18 months ago
Closed 18 months ago
#10458 closed defect (duplicate)
MPEG4 demuxing: last audio sample's duration ignored
Reported by: | John Regan | Owned by: | |
---|---|---|---|
Priority: | normal | Component: | undetermined |
Version: | unspecified | Keywords: | |
Cc: | John Regan | Blocked By: | |
Blocking: | | Reproduced by developer: | no |
Analyzed by developer: | no | | |
Description
It seems like ffmpeg properly removes the front padding (the encoder delay) from audio in mp4 files, but doesn't account for the end padding that is added to round the final audio frame up to the full frame length.
This end padding is signaled by the mp4 file listing a different duration for the final sample, either via the Decoding Time to Sample Box or, for fragmented mp4s, via the sample duration in the Track Fragment Run Box.
How to reproduce:
    % ffmpeg -f lavfi -i anullsrc=r=48000:d=2 source.wav
    # verify the created audio file has exactly 96000 samples
    % soxi -s source.wav
    96000
    # encode to aac
    % ffmpeg -i source.wav -c:a aac encoded.m4a
    # decode back to wav
    % ffmpeg -i encoded.m4a destination.wav
    # observe the sample count != 96000
    % soxi -s destination.wav
    96256
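If `soxi` is not available, the sample counts above can be cross-checked with Python's standard `wave` module. This is only a minimal sketch: it assumes the WAV files are plain PCM, and the `write_silence` helper (with its 16-bit stereo defaults) is a hypothetical stand-in for the `anullsrc` step so the example runs without ffmpeg.

```python
import os
import tempfile
import wave


def count_frames(path):
    """Return the number of sample frames (per channel) in a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes()


def write_silence(path, frames, rate=48000, channels=2):
    """Hypothetical helper: write `frames` frames of 16-bit PCM silence."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM (assumption)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * (frames * channels * 2))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "source.wav")
        write_silence(path, 96000)  # 2 seconds at 48 kHz
        print(count_frames(path))   # 96000
```

Running `count_frames` on `source.wav` and `destination.wav` from the steps above should give the same two numbers that `soxi -s` reports.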
Using boxdumper from l-smash, I can verify that ffmpeg correctly added an edit list box that lists total media duration, as well as the samples to trim from the beginning of the audio (the encoder delay):
    [edts: Edit Box]
        position = 845
        size = 36
        [elst: Edit List Box]
            position = 853
            size = 28
            version = 0
            flags = 0x000000
            entry_count = 1
            entry[0]
                segment_duration = 2000
                media_time = 1024
                media_rate = 1.000000
The Decoding Time to Sample Box specifies that the final sample lasts 768 frames. Doing the math: (94 samples * 1024 frames) + 768 frames = 97024 frames. Subtract the 1024 frames listed as media_time in the Edit List Box above and you should have 96000 frames, matching the source.
    [stts: Decoding Time to Sample Box]
        position = 1140
        size = 32
        version = 0
        flags = 0x000000
        entry_count = 2
        entry[0]
            sample_count = 94
            sample_delta = 1024
        entry[1]
            sample_count = 1
            sample_delta = 768
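The arithmetic above can be written out as a short Python sketch. The box values are copied from the boxdumper output in this ticket; the movie timescale of 1000 and the media timescale of 48000 are assumptions, since neither the mvhd nor the mdhd box is shown here. The function names are illustrative only.

```python
# Values copied from the boxdumper output in this ticket.
AAC_STTS = [(94, 1024), (1, 768)]  # (sample_count, sample_delta) pairs
AAC_MEDIA_TIME = 1024              # elst media_time: frames skipped at the start
AAC_SEGMENT_DURATION = 2000        # elst segment_duration, in movie timescale units

# Assumptions, not shown in the dumps above.
MOVIE_TIMESCALE = 1000  # assumed mvhd timescale
SAMPLE_RATE = 48000     # assumed mdhd timescale, equal to the audio sample rate


def media_frames(stts):
    """Total media duration in frames, summed over all stts entries."""
    return sum(count * delta for count, delta in stts)


def presented_frames(stts, media_time):
    """Frames left after trimming the encoder delay from the start."""
    return media_frames(stts) - media_time


def segment_frames(segment_duration, movie_timescale, sample_rate):
    """Convert the edit's segment_duration to frames at the sample rate."""
    return segment_duration * sample_rate // movie_timescale


if __name__ == "__main__":
    print(media_frames(AAC_STTS))                      # 97024
    print(presented_frames(AAC_STTS, AAC_MEDIA_TIME))  # 96000
    print(segment_frames(AAC_SEGMENT_DURATION, MOVIE_TIMESCALE, SAMPLE_RATE))  # 96000
```

Under those assumptions, both the sample table and the edit list describe exactly 96000 frames of audio, which is what a decoder honoring them would be expected to output.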
I think the issue may be the MP4 demuxer not signaling the final decoded packet's duration. This occurs if I use other codecs as well, for example mp3:
    # using the same source.wav as above that's 96000 samples:
    % ffmpeg -i source.wav -c:a libmp3lame encoded-in-mp3.mp4
    % ffmpeg -i encoded-in-mp3.mp4 decoded-from-mp3.wav
    % soxi -s decoded-from-mp3.wav
    96815
Here's the edts box and stts from encoded-in-mp3.mp4:
    [edts: Edit Box]
        position = 32900
        size = 36
        [elst: Edit List Box]
            position = 32908
            size = 28
            version = 0
            flags = 0x000000
            entry_count = 1
            entry[0]
                segment_duration = 2000
                media_time = 1105
                media_rate = 1.000000
    [stts: Decoding Time to Sample Box]
        position = 33205
        size = 32
        version = 0
        flags = 0x000000
        entry_count = 2
        entry[0]
            sample_count = 84
            sample_delta = 1152
        entry[1]
            sample_count = 1
            sample_delta = 337
So again doing some math: (84 samples * 1152 frames) + 337 frames = 97105 frames. Subtract the 1105 frames from the edit list and you get 96000 frames.
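One way to test the "final duration is ignored" hypothesis is to compute what the output length would be if every sample, including the last, were treated as a full frame. The sketch below does that with the stts and elst values from this ticket; the function name is illustrative, and the model (last sample counted at the full delta, only the start trimmed) is an assumption about the behavior, not a description of the demuxer's code.

```python
def frames_if_last_duration_ignored(stts, media_time):
    """Output length if the last sample is counted at the full frame size.

    stts is a list of (sample_count, sample_delta) pairs; the first entry's
    delta is taken as the codec's full frame size.
    """
    full_delta = stts[0][1]
    total_samples = sum(count for count, _ in stts)
    return total_samples * full_delta - media_time


# Values copied from the boxdumper output in this ticket.
AAC = {"stts": [(94, 1024), (1, 768)], "media_time": 1024, "observed": 96256}
MP3 = {"stts": [(84, 1152), (1, 337)], "media_time": 1105, "observed": 96815}

if __name__ == "__main__":
    for name, case in (("aac", AAC), ("mp3", MP3)):
        predicted = frames_if_last_duration_ignored(case["stts"], case["media_time"])
        print(name, predicted, predicted == case["observed"])
    # aac 96256 True
    # mp3 96815 True
```

The predicted lengths match the `soxi -s` results reported for both files, which is consistent with the start trim being applied while the shortened final sample duration is not.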
Another example with opus:
    % ffmpeg -i source.wav -c:a libopus encoded-in-opus.mp4
    % ffmpeg -i encoded-in-opus.mp4 decoded-from-opus.wav
    % soxi -s decoded-from-opus.wav
    96648
The same issue occurs with a fragmented mp4, which doesn't use the Decoding Time to Sample Box for its fragments and instead relies on either the Track Fragment Header Box or the Track Fragment Run Box to signal sample durations.
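For the fragmented case, the duration of each sample is resolved in a fixed order in ISO BMFF: a per-sample duration in the Track Fragment Run Box (trun) wins, then the default sample duration in the Track Fragment Header Box (tfhd), then the default from the Track Extends Box (trex). The sketch below models only that lookup; the function and parameter names are hypothetical, and the example values are modeled on the two cases described in this ticket rather than parsed from the attached files.

```python
def resolve_sample_durations(sample_count, trun_durations=None,
                             tfhd_default=None, trex_default=0):
    """Return the duration of each sample in one track fragment run.

    trun_durations: per-sample durations from trun, or None if absent.
    tfhd_default:   default_sample_duration from tfhd, or None if absent.
    trex_default:   default_sample_duration from trex (always present).
    """
    if trun_durations is not None:
        if len(trun_durations) != sample_count:
            raise ValueError("trun must carry one duration per sample")
        return list(trun_durations)
    default = tfhd_default if tfhd_default is not None else trex_default
    return [default] * sample_count


if __name__ == "__main__":
    # Final sample alone in its fragment, duration given as the tfhd default.
    print(resolve_sample_durations(1, tfhd_default=768, trex_default=1024))
    # [768]

    # Hypothetical run of three samples with per-sample durations in trun.
    print(resolve_sample_durations(3, trun_durations=[1024, 1024, 768],
                                   trex_default=1024))
    # [1024, 1024, 768]
```

In both layouts the final sample resolves to 768 frames, so a demuxer that reports this value as the packet duration gives downstream code what it needs to trim the end padding.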
This does not seem to apply to codecs that carry their own duration signaling, like FLAC in mp4.
ffmpeg version info:
    ffmpeg version n6.0 Copyright (c) 2000-2023 the FFmpeg developers
    built with gcc 13.1.1 (GCC) 20230429
    configuration: --prefix=/usr --disable-debug --disable-static --disable-stripping --enable-amf --enable-avisynth --enable-cuda-llvm --enable-lto --enable-fontconfig --enable-gmp --enable-gnutls --enable-gpl --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libdav1d --enable-libdrm --enable-libfreetype --enable-libfribidi --enable-libgsm --enable-libiec61883 --enable-libjack --enable-libjxl --enable-libmfx --enable-libmodplug --enable-libmp3lame --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librav1e --enable-librsvg --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtheora --enable-libv4l2 --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxcb --enable-libxml2 --enable-libxvid --enable-libzimg --enable-nvdec --enable-nvenc --enable-opencl --enable-opengl --enable-shared --enable-version3 --enable-vulkan
    libavutil 58. 2.100 / 58. 2.100
    libavcodec 60. 3.100 / 60. 3.100
    libavformat 60. 3.100 / 60. 3.100
    libavdevice 60. 1.100 / 60. 1.100
    libavfilter 9. 3.100 / 9. 3.100
    libswscale 7. 1.100 / 7. 1.100
    libswresample 4. 10.100 / 4. 10.100
    libpostproc 57. 1.100 / 57. 1.100
Attachments (2)
Change History (6)
comment:1 by , 18 months ago
Cc: | added |
---|
comment:2 by , 18 months ago
Description: | modified (diff) |
---|---|
Summary: | MPEG4 AAC decoding: end padding not trimmed → MPEG4 demuxing: last sample's duration ignored |
by , 18 months ago
Attachment: | fragmented-aac.mp4 added |
---|
Example fragmented mp4 file with a final sample duration of 768 frames. The final sample is in its own fragment, and its duration is signaled in the Track Fragment Header Box as the default sample duration.
by , 18 months ago
Attachment: | fragmented-aac-trun.mp4 added |
---|
Example fragmented mp4 file with a final sample duration of 768 frames. The final sample duration is signaled in the Track Fragment Run Box.
comment:3 by , 18 months ago
Summary: | MPEG4 demuxing: last sample's duration ignored → MPEG4 demuxing: last audio sample's duration ignored |
---|
comment:4 by , 18 months ago
Resolution: | → duplicate |
---|---|
Status: | new → closed |
Known bug #7828, fixed in Chromium.
Discovered this isn't limited to just AAC - I think it may apply to any codec that relies on the mp4 file to signal the last sample's duration (tested with the native aac encoder, libmp3lame, and libopus). I've updated the bug title and description accordingly.