#10458 closed defect (duplicate)

MPEG4 demuxing: last audio sample's duration ignored

Reported by: John Regan
Owned by:
Priority: normal
Component: undetermined
Version: unspecified
Keywords:
Cc: John Regan
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description (last modified by John Regan)

It seems like ffmpeg properly removes the front padding (the encoder delay) from audio in mp4 files, but doesn't account for the end padding that is added to round the audio up to a whole number of frames.

This is signaled by the mp4 file listing a different duration for the final sample, either via the Decoding Time to Sample Box or, for fragmented mp4s, the sample duration in the Track Fragment Run Box.

How to reproduce:

% ffmpeg -f lavfi -i anullsrc=r=48000:d=2 source.wav

# verify the created audio file has exactly 96000 samples
% soxi -s source.wav
96000

# encode to aac
% ffmpeg -i source.wav -c:a aac encoded.m4a

# decode back to wav
% ffmpeg -i encoded.m4a destination.wav

# observe the sample count != 96000
% soxi -s destination.wav
96256

Using boxdumper from l-smash, I can verify that ffmpeg correctly added an edit list box that lists total media duration, as well as the samples to trim from the beginning of the audio (the encoder delay):

[edts: Edit Box]
    position = 845
    size = 36   
    [elst: Edit List Box]
        position = 853
        size = 28   
        version = 0 
        flags = 0x000000
        entry_count = 1
        entry[0]    
            segment_duration = 2000
            media_time = 1024
            media_rate = 1.000000

The Decoding Time to Sample Box specifies that the final sample is 768 frames long. Doing the math: (94 samples * 1024 frames) + 768 frames = 97024 frames. Subtract the 1024 frames given as media_time in the Edit List Box above and you should have 96000 frames, matching the source.

[stts: Decoding Time to Sample Box]
    position = 1140
    size = 32
    version = 0
    flags = 0x000000
    entry_count = 2
    entry[0]
        sample_count = 94
        sample_delta = 1024     
    entry[1]
        sample_count = 1
        sample_delta = 768
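
The arithmetic above can be checked with a short Python sketch. The function names are illustrative and not part of ffmpeg or l-smash; the sketch assumes a single edit list entry whose media_time is expressed in the same timescale as the stts deltas (48000 here).

```python
def total_media_frames(stts_entries):
    """Sum the stts table: each entry is (sample_count, sample_delta)."""
    return sum(count * delta for count, delta in stts_entries)


def expected_pcm_frames(stts_entries, media_time):
    """Frames left after skipping the edit list's media_time at the start."""
    return total_media_frames(stts_entries) - media_time


# Values copied from the boxdumper output for encoded.m4a.
aac_stts = [(94, 1024), (1, 768)]
aac_media_time = 1024

print(total_media_frames(aac_stts))                   # 97024
print(expected_pcm_frames(aac_stts, aac_media_time))  # 96000
```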

I think the issue may be the MP4 demuxer not signaling the final decoded packet's duration. This occurs if I use other codecs as well, for example mp3:

# using the same source.wav as above that's 96000 samples:
% ffmpeg -i source.wav -c:a libmp3lame encoded-in-mp3.mp4
% ffmpeg -i encoded-in-mp3.mp4 decoded-from-mp3.wav
% soxi -s decoded-from-mp3.wav 
96815

Here's the edts box and stts from encoded-in-mp3.mp4:

[edts: Edit Box]
    position = 32900
    size = 36
    [elst: Edit List Box]
        position = 32908
        size = 28
        version = 0
        flags = 0x000000
        entry_count = 1
        entry[0]
            segment_duration = 2000
            media_time = 1105
            media_rate = 1.000000

[stts: Decoding Time to Sample Box]
    position = 33205
    size = 32
    version = 0
    flags = 0x000000
    entry_count = 2
    entry[0]
        sample_count = 84
        sample_delta = 1152
    entry[1]
        sample_count = 1
        sample_delta = 337

So again doing some math: (84 samples * 1152 frames) + 337 frames = 97105 frames. Subtract the 1105 frames from the edit list and you get 96000 frames.
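
One way to check the hypothesis that the final sample's shortened duration is being dropped: if every packet were decoded at the full frame size, the output would be (number of packets * frame size) minus the edit list's media_time. The sketch below uses a hypothetical helper name and the values from the two stts/elst dumps in this ticket; it assumes the first stts entry carries the codec's nominal frame size.

```python
def frames_if_last_delta_ignored(stts_entries, media_time):
    """PCM frames produced if every sample is treated as a full-length frame.

    Assumes the first stts entry carries the codec's nominal frame size.
    """
    frame_size = stts_entries[0][1]
    packet_count = sum(count for count, _ in stts_entries)
    return packet_count * frame_size - media_time


aac = frames_if_last_delta_ignored([(94, 1024), (1, 768)], media_time=1024)
mp3 = frames_if_last_delta_ignored([(84, 1152), (1, 337)], media_time=1105)
print(aac, mp3)  # 96256 96815
```

Both values match the soxi counts reported for destination.wav and decoded-from-mp3.wav. The excess over 96000 is 256 and 815 frames, which equals the end padding in each case (1024 - 768 and 1152 - 337).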

Another example with opus:

% ffmpeg -i source.wav -c:a libopus encoded-in-opus.mp4
% ffmpeg -i encoded-in-opus.mp4 decoded-from-opus.wav
% soxi -s decoded-from-opus.wav
96648

The same issue occurs with a fragmented mp4, which doesn't list its fragments' sample durations in the Decoding Time to Sample Box and instead relies on either the Track Fragment Header Box or the Track Fragment Run Box for sample duration signaling.
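
For reference, a minimal sketch of how a sample's duration is resolved in a fragmented file per ISO/IEC 14496-12: a per-sample duration in the Track Fragment Run Box takes precedence, then the Track Fragment Header Box's default_sample_duration, then the default from the Track Extends Box. The function and parameter names are illustrative, this is not ffmpeg's demuxer code, and the 1024 defaults in the usage lines are assumed for illustration since the attachments' actual defaults are not listed in this ticket.

```python
def resolve_sample_duration(trun_duration=None, tfhd_default=None, trex_default=0):
    """Pick the duration for one sample in a track fragment.

    An argument is None when the corresponding field is absent
    (its presence flag is not set in the box).
    """
    if trun_duration is not None:
        return trun_duration   # per-sample value in trun wins
    if tfhd_default is not None:
        return tfhd_default    # per-fragment default in tfhd
    return trex_default        # track-wide default from trex


# Final sample alone in its fragment, signaled via tfhd (the fragmented-aac.mp4 case).
print(resolve_sample_duration(tfhd_default=768, trex_default=1024))   # 768
# Final sample signaled via trun (the fragmented-aac-trun.mp4 case).
print(resolve_sample_duration(trun_duration=768, tfhd_default=1024))  # 768
```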

This does not seem to apply to codecs that carry their own duration signaling, like FLAC in mp4.

ffmpeg version info:

ffmpeg version n6.0 Copyright (c) 2000-2023 the FFmpeg developers
  built with gcc 13.1.1 (GCC) 20230429
  configuration: --prefix=/usr --disable-debug --disable-static --disable-stripping --enable-amf --enable-avisynth --enable-cuda-llvm --enable-lto --enable-fontconfig --enable-gmp --enable-gnutls --enable-gpl --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libdav1d --enable-libdrm --enable-libfreetype --enable-libfribidi --enable-libgsm --enable-libiec61883 --enable-libjack --enable-libjxl --enable-libmfx --enable-libmodplug --enable-libmp3lame --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librav1e --enable-librsvg --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtheora --enable-libv4l2 --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxcb --enable-libxml2 --enable-libxvid --enable-libzimg --enable-nvdec --enable-nvenc --enable-opencl --enable-opengl --enable-shared --enable-version3 --enable-vulkan
  libavutil      58.  2.100 / 58.  2.100
  libavcodec     60.  3.100 / 60.  3.100
  libavformat    60.  3.100 / 60.  3.100
  libavdevice    60.  1.100 / 60.  1.100
  libavfilter     9.  3.100 /  9.  3.100
  libswscale      7.  1.100 /  7.  1.100
  libswresample   4. 10.100 /  4. 10.100
  libpostproc    57.  1.100 / 57.  1.100

Attachments (2)

fragmented-aac.mp4 (1.9 KB) - added by John Regan 10 months ago.
Example fragmented mp4 file with a final sample duration of 768 frames. The final sample is in its own fragment; its duration is signaled in the Track Fragment Header Box as the default sample duration.
fragmented-aac-trun.mp4 (2.5 KB) - added by John Regan 10 months ago.
Example fragmented mp4 file with a final sample duration of 768 frames. The final sample duration is signaled in the Track Fragment Run Box.

Change History (6)

comment:1 by John Regan, 10 months ago

Cc: John Regan added

comment:2 by John Regan, 10 months ago

Description: modified (diff)
Summary: MPEG4 AAC decoding: end padding not trimmed → MPEG4 demuxing: last sample's duration ignored

Discovered this isn't limited to just AAC - I think it may apply to any codec that relies on the mp4 file to signal the last sample's duration (tested with the native aac encoder, libmp3lame, and libopus). I've updated the bug title and description accordingly.

by John Regan, 10 months ago

Attachment: fragmented-aac.mp4 added


by John Regan, 10 months ago

Attachment: fragmented-aac-trun.mp4 added


comment:3 by John Regan, 10 months ago

Summary: MPEG4 demuxing: last sample's duration ignored → MPEG4 demuxing: last audio sample's duration ignored

comment:4 by Balling, 10 months ago

Resolution: duplicate
Status: new → closed

Known bug #7828, fixed in Chromium.
