Opened 6 months ago

Last modified 7 weeks ago

#11018 new defect

Bug: Exporting AAC from a video file with a specific duration doesn't give the correct duration

Reported by: Imprevisible Owned by:
Priority: normal Component: ffmpeg
Version: unspecified Keywords: segment aac seek seeking
Cc: Imprevisible, MasterQuestionable Blocked By:
Blocking: Reproduced by developer: no
Analyzed by developer: no

Description

Summary of the bug:
I want to extract specific seconds of a video file to an AAC file, to create a custom HLS stream. When I export 5s, the output file is never 5s: maybe less, maybe more, but never exactly 5s.
How to reproduce:

% ffmpeg -hide_banner -loglevel error -i "/media/Dockarr/downloads-vpn/media/Séries/The IT Crowd/Season 1/The It Crowd S01e01 - Yesterday's Jam.m4v" -ss 0:00:00 -to 0:00:05 -c:a aac -map 0:a:0 -ac 2 -vn -report test.aac
ffmpeg version 4.4.2-0ubuntu0.22.04.1
built with gcc 11 (Ubuntu 11.2.0-19ubuntu1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared

Attachments (3)

ffmpeg-20240516-201050.log (8.3 KB ) - added by Imprevisible 6 months ago.
ffmpeg log
test.aac (81.3 KB ) - added by Imprevisible 6 months ago.
Output AAC file
in.aac (304.3 KB ) - added by Imprevisible 6 months ago.
First 12.82s of a video, as an aac


Change History (34)

by Imprevisible, 6 months ago

Attachment: ffmpeg-20240516-201050.log added

ffmpeg log

by Imprevisible, 6 months ago

Attachment: test.aac added

Output AAC file

comment:1 by Imprevisible, 6 months ago

For information, I tried both -to and -t to set the end time / the duration. I tried with and without accurate seeking, with -ss as an input and as an output parameter, and it happens for all my video files, whatever the codec. Maybe it's my ffmpeg version?

comment:2 by MasterQuestionable, 6 months ago

    Lossy audio codecs tend to constrain the number of samples,
    which must be a multiple of the samples per frame (SPF).
    For AAC the SPF is a constant 1,024. [ Mostly. https://gitlab.com/mbunkus/mkvtoolnix/-/issues/2031 ]
͏    See also:
͏    https://stackoverflow.com/questions/59173435#59332275
͏    https://xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-260001.3.2
͏    https://ffmpeg.org/ffmpeg-codecs.html#libopus-1
͏    https://opus-codec.org/docs/html_api/group__opusencoder.html#ga88621a963b809ebfc27887f13518c966

    Also, you might be doing pointless re-encoding with your command: "-c copy" is preferable.
    (The "-map" directive also seems confused.)
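
To make the frame constraint above concrete, here is a minimal arithmetic sketch. It assumes a 48,000 Hz stream (the rate shown in the mediainfo output later in this ticket) and the constant 1,024 SPF mentioned above. The function name is made up for illustration, and encoder priming is ignored.

```python
import math

SAMPLES_PER_FRAME = 1024  # AAC frame size quoted in the comment above


def frame_bounds(duration_s, sample_rate, spf=SAMPLES_PER_FRAME):
    """Return the closest shorter and longer durations made of whole frames."""
    frames = duration_s * sample_rate / spf          # 234.375 frames for 5 s at 48 kHz
    shorter = math.floor(frames) * spf / sample_rate
    longer = math.ceil(frames) * spf / sample_rate
    return shorter, longer


shorter, longer = frame_bounds(5.0, 48000)
print(f"{shorter:.6f} s or {longer:.6f} s")  # 4.992000 s or 5.013333 s
```

Under these assumptions a 5 s request cannot be met exactly: the nearest whole-frame durations are 234 frames (4.992 s) and 235 frames (about 5.0133 s).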


comment:3 by MasterQuestionable, 6 months ago

Cc: MasterQuestionable added

comment:4 by Balling, 6 months ago

It is impossible to get exactly 5 seconds unless you use the MP4 container with an edit list, which records the size of the initial encoder delay and the size of the remainder at the end of the audio. Even then, ffmpeg does not support applying the remainder, so typically you cannot get 5 seconds with AAC anyway. See #10477 and many others.

comment:5 by Imprevisible, 6 months ago

Isn't there some alternative way I could use? I can extract precise video segments, so maybe I can cut the video and extract the audio from that pre-segmented file? Also, I use AAC because it's for HLS streaming; maybe another audio codec is compatible with AAC and is precise enough?

comment:6 by Imprevisible, 6 months ago

Based on this: https://datatracker.ietf.org/doc/html/draft-pantos-http-live-streaming-20#section-3.4
I can't use a lot of codecs:

Supported Packed Audio formats are AAC with ADTS
   framing [ISO_13818_7]; MP3 [ISO_13818_3]; AC-3 [AC_3]; and Enhanced
   AC-3 [AC_3].

As I can see on the Apple website, FLAC is also possible. But I can't find the SPF of each of them. For information, this ffmpeg command is generated by a Python script, which selects the right audio index and the right parameters.


comment:7 by MasterQuestionable, 6 months ago

͏    For reference:
͏    1,024 * 1/48,000 ≈ 0.021333 s
͏    .
͏    Such difference mostly doesn't matter.

    The limitation is bound to the format and cannot be fixed.


͏    SPF may be variable, as previously indicated.
͏    And probably doesn't matter for FLAC: as it's lossless.

comment:8 by Balling, 6 months ago

1024 is for our ffmpeg encoder. The Apple encoder does not use 1024 samples for priming; it uses 2112. HE-AAC uses even more samples for priming.

Remainder is not related to this once again.

And probably doesn't matter for FLAC: as it's lossless

Alas, I only saw like one anime using it. Original uses Dolby TrueHD, anyway. And BTW, if you can use TrueHD, do so. Our decoder is finally lossless in all 24 bits mandated by TrueHD.


comment:9 by MasterQuestionable, 6 months ago

    1,024 refers to the SPF.

͏    I prefer FLAC and don't quite understand the necessity of things like Dolby TrueHD.
͏    And recently started to consider lossy and lossless codecs essentially the same thing. [ See <denoise>. ]

comment:10 by Balling, 6 months ago

I know what samples per frame are. Apple uses 1024 + 1024 + 64
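
For scale, the priming figures quoted in this exchange can be converted to time. This is only arithmetic on the numbers given above, assuming the 48,000 Hz sample rate of the reporter's files. The helper name is made up for illustration.

```python
APPLE_PRIMING_SAMPLES = 1024 + 1024 + 64  # figure quoted above, 2112 samples in total
FFMPEG_PRIMING_SAMPLES = 1024             # figure quoted above for the ffmpeg encoder


def priming_seconds(priming_samples, sample_rate):
    """Duration of the encoder priming (delay) samples, in seconds."""
    return priming_samples / sample_rate


for name, samples in (("apple", APPLE_PRIMING_SAMPLES), ("ffmpeg", FFMPEG_PRIMING_SAMPLES)):
    print(name, samples, f"{priming_seconds(samples, 48000) * 1000:.3f} ms")
```

At 48 kHz that is 44 ms for 2112 samples and roughly 21.3 ms for 1024 samples, so priming alone is the same order of magnitude as one AAC frame.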


comment:12 by Balling, 6 months ago

I do not think you need to use Google. I described this issue ad nauseam: #7828.

comment:13 by Balling, 6 months ago

MasterQuestionable, can you help review https://patchwork.ffmpeg.org/project/ffmpeg/patch/20220128052107.1678032-1-kode54@gmail.com/

Without this fix, the AAC fix is kinda useless, since previous-gen audio was not fixed either.

comment:14 by MasterQuestionable, 6 months ago

    Much of the info I post there is mainly for my own reference. (I merely publish it as well, believing it to be helpful.)
    It hardly targets anybody specifically.


͏    I shall review later.
͏    Related: https://trac.ffmpeg.org/ticket/2325#comment:20

͏    I also noted some anomaly (quality-wise) with FFmpeg's AAC implementation: regardless I don't much use.
͏    (and probably hardly sensitive enough to really notice...)

comment:15 by Imprevisible, 6 months ago

I did some tests, and I can see that only AAC and MP3 don't seek properly. I'm doing HLS streaming, and I need to generate precise audio files so the audio doesn't get out of sync. I'm going to try playing all these audio files instead, but I'm not sure it will be possible.

AAC:
command:
ffmpeg -y -ss 00:00:00 -to 00:00:05 -i "../Téléchargements/The It Crowd S01e01 - Yesterday's Jam.m4v" -vn test.aac

mediainfo:
General
Complete name                            : ../ffmpeg_test/test.aac
Format                                   : ADTS
Format/Info                              : Audio Data Transport Stream
File size                                : 83.6 KiB
Overall bit rate mode                    : Variable

Audio
Format                                   : AAC LC
Format/Info                              : Advanced Audio Codec Low Complexity
Format version                           : Version 4
Codec ID                                 : 2
Bit rate mode                            : Variable
Channel(s)                               : 2 channels
Channel layout                           : L R
Sampling rate                            : 48.0 kHz
Frame rate                               : 46.875 FPS (1024 SPF)
Compression mode                         : Lossy
Stream size                              : 83.6 KiB (100%)

MP3:
command:
ffmpeg -y -ss 00:00:00 -to 00:00:05 -i "../Téléchargements/The It Crowd S01e01 - Yesterday's Jam.m4v" -vn test.mp3

mediainfo:
General
Complete name                            : ../ffmpeg_test/test.mp3
Format                                   : MPEG Audio
File size                                : 79.1 KiB
Duration                                 : 5 s 40 ms
Overall bit rate mode                    : Constant
Overall bit rate                         : 128 kb/s
Writing library                          : Lavf61.3.103
major_brand                              : mp42
minor_version                            : 512
compatible_brands                        : isomiso2mp41

Audio
Format                                   : MPEG Audio
Format version                           : Version 1
Format profile                           : Layer 3
Format settings                          : Joint stereo
Duration                                 : 5 s 40 ms
Bit rate mode                            : Constant
Bit rate                                 : 128 kb/s
Channel(s)                               : 2 channels
Sampling rate                            : 48.0 kHz
Frame rate                               : 41.667 FPS (1152 SPF)
Compression mode                         : Lossy
Stream size                              : 78.8 KiB (100%)

OGG:
command:
ffmpeg -y -ss 00:00:00 -to 00:00:05 -i "../Téléchargements/The It Crowd S01e01 - Yesterday's Jam.m4v" -vn test.ogg

mediainfo:
General
Complete name                            : ../ffmpeg_test/test.ogg
Format                                   : Ogg
File size                                : 62.3 KiB
Duration                                 : 5 s 0 ms
Overall bit rate mode                    : Variable
Overall bit rate                         : 102 kb/s
Writing application                      : Lavc61.5.104 libvorbis
creation_time                            : 2020-12-08T23:31:48.000000Z
handler_name                             : Stereo
vendor_id                                : [0][0][0][0]
major_brand                              : mp42
minor_version                            : 512
compatible_brands                        : isomiso2mp41

Audio
ID                                       : 3624949711 (0xD81057CF)
Format                                   : Vorbis
Format settings, Floor                   : 1
Duration                                 : 5 s 0 ms
Bit rate mode                            : Variable
Bit rate                                 : 112 kb/s
Channel(s)                               : 2 channels
Sampling rate                            : 48.0 kHz
Compression mode                         : Lossy
Stream size                              : 68.4 KiB
Writing library                          : Lavf61.3.103
Language                                 : English

WAV:
command:
ffmpeg -y -ss 00:00:00 -to 00:00:05 -i "../Téléchargements/The It Crowd S01e01 - Yesterday's Jam.m4v" -vn test.wav

mediainfo:
General
Complete name                            : ../ffmpeg_test/test.wav
Format                                   : Wave
Format settings                          : PcmWaveformat
File size                                : 938 KiB
Duration                                 : 5 s 0 ms
Overall bit rate mode                    : Constant
Overall bit rate                         : 1 536 kb/s
Writing application                      : Lavf61.3.103

Audio
Format                                   : PCM
Format settings                          : Little / Signed
Codec ID                                 : 1
Duration                                 : 5 s 0 ms
Bit rate mode                            : Constant
Bit rate                                 : 1 536 kb/s
Channel(s)                               : 2 channels
Sampling rate                            : 48.0 kHz
Bit depth                                : 16 bits
Stream size                              : 938 KiB (100%)

FLAC:
command:
ffmpeg -y -ss 00:00:00 -to 00:00:05 -i "../Téléchargements/The It Crowd S01e01 - Yesterday's Jam.m4v" -vn test.flac

mediainfo:
General
Complete name                            : ../ffmpeg_test/test.flac
Format                                   : FLAC
Format/Info                              : Free Lossless Audio Codec
File size                                : 916 KiB
Duration                                 : 5 s 0 ms
Overall bit rate mode                    : Variable
Overall bit rate                         : 1 501 kb/s
Writing application                      : Lavf61.3.103
major_brand                              : mp42
minor_version                            : 512
compatible_brands                        : isomiso2mp41

Audio
Format                                   : FLAC
Format/Info                              : Free Lossless Audio Codec
Duration                                 : 5 s 0 ms
Bit rate mode                            : Variable
Bit rate                                 : 1 487 kb/s
Channel(s)                               : 2 channels
Channel layout                           : L R
Sampling rate                            : 48.0 kHz
Bit depth                                : 24 bits
Compression mode                         : Lossless
Stream size                              : 908 KiB (99%)
Writing library                          : Lavf61.3.103
MD5 of the unencoded content             : 44ADFEF5FD32EBC45E5E9B71D8637F2B
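
As a cross-check of the mediainfo figures above, the frame rates and the MP3 duration can be recomputed from the sample rate and the SPF values. This is a small arithmetic sketch only. It does not explain why the encoder emitted that many frames, and the helper names are made up for illustration.

```python
from fractions import Fraction

SAMPLE_RATE = 48000  # "Sampling rate: 48.0 kHz" in every report above


def frame_rate(spf):
    """Frames per second for a codec with a fixed samples-per-frame value."""
    return Fraction(SAMPLE_RATE, spf)


def frames_in(duration_ms, spf):
    """Number of frames (possibly fractional) covering duration_ms."""
    return Fraction(duration_ms, 1000) * SAMPLE_RATE / spf


print(float(frame_rate(1024)))   # AAC, reported as 46.875 FPS
print(float(frame_rate(1152)))   # MP3, reported as 41.667 FPS
print(frames_in(5040, 1152))     # MP3 "5 s 40 ms" is a whole number of frames
print(frames_in(5000, 1152))     # 5 s exactly is not
```

The reported MP3 duration of 5 s 40 ms corresponds to exactly 210 frames of 1152 samples, while 5 s would need 208 1/3 frames. That is consistent with the whole-frame constraint discussed earlier in the ticket.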

comment:16 by MasterQuestionable, 6 months ago

͏    Such segments shall be rejoined client-side.
͏    Try just slice using "-c copy" and let the downstream handle.

comment:17 by Imprevisible, 6 months ago

I tried things, and HLS.js won't stream the audio, if I don't force the format to adts... So I guess I'm fucked

comment:18 by MasterQuestionable, 6 months ago

    JS libraries of this kind typically don't interfere with the browser's media handling.
    The format support is usually decided by the browser.

͏    I believe MP3, Vorbis, FLAC, Opus shall be all supported by mainstream browsers.
͏    Probably you just have to put them into the appropriate container for the context:
͏    WebM: Vorbis, Opus
͏    MP4: MP3, FLAC
͏    .
͏    More info: https://github.com/richtr/NoSleep.js/issues/157#issuecomment-1529149759

͏    Regardless, first transcoding to (if not already in) AAC then slice with "-c copy" should work.


͏    Worth notice:
͏    The various media streams contained don't have to be of equal duration: in particular for HLS alike.
͏    .
͏    Though having the primary streams of unequal duration (after joining) tends to cause unspecified behavior.


in reply to:  18 comment:19 by Imprevisible, 6 months ago

Replying to MasterQuestionable:

͏    Worth notice:
͏    The various media streams contained don't have to be of equal duration: in particular for HLS alike.
͏    .
͏    Though having the primary streams of unequal duration (after joining) tends to cause unspecified behavior.

Sadly I have to: I'm generating all the segments dynamically with Python, so when I generate segment 2, I don't know the duration of segment 1. So if the first segment is 4.88s, the second one will still start at 5s, leaving a 0.12s gap. That's for only two segments, but a segment is 5s and a video file can be up to 3 hours long, so that's hundreds of segments.

comment:20 by MasterQuestionable, 6 months ago

͏    The implication is minor duration variation between different ~ 5 s slices mostly doesn't matter.
͏    Not that you shall deliberately make such uneven streams without careful consideration.

in reply to:  20 comment:21 by Imprevisible, 6 months ago

Replying to MasterQuestionable:

͏    The implication is minor duration variation between different ~ 5 s slices mostly doesn't matter.
͏    Not that you shall deliberately make such uneven streams without careful consideration.

As I said, a gap of 0.12s for only two segments is okay, but for 1440 of them next to each other it's not the same: it adds up to a 172.8s gap over the whole duration, because a .ts file can be exactly 5s long.
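
One way to see how the drift described above could be bounded is to place every segment boundary on a whole AAC frame computed from the ideal timeline, instead of cutting each segment independently. The sketch below is not something proposed by anyone in this ticket. It assumes 48,000 Hz audio, 1,024 samples per frame, a player that accepts segments whose declared durations vary slightly (an HLS playlist gives each segment its own EXTINF duration), and it ignores encoder priming. All names are made up for illustration.

```python
SAMPLE_RATE = 48000   # assumed, matches the mediainfo output in this ticket
SPF = 1024            # AAC samples per frame
TARGET_S = 5          # nominal segment length in seconds


def boundary_frame(index, target_s=TARGET_S, sample_rate=SAMPLE_RATE, spf=SPF):
    """Frame number closest to the ideal start time of segment `index`."""
    samples = index * target_s * sample_rate
    return (2 * samples + spf) // (2 * spf)   # integer round-half-up


def segment_plan(n_segments):
    """Return (index, start_frame, frame_count) tuples that tile without gaps."""
    plan = []
    for i in range(n_segments):
        start = boundary_frame(i)
        plan.append((i, start, boundary_frame(i + 1) - start))
    return plan


plan = segment_plan(1440)
total_s = sum(count for _, _, count in plan) * SPF / SAMPLE_RATE
print(sorted({count for _, _, count in plan}), total_s)
```

Each segment is then 234 or 235 frames long (4.992 s or about 5.013 s), and the 1440 segments together cover exactly 7200 s, because every boundary is derived from the ideal position rather than from the previous segment's actual length.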

comment:22 by MasterQuestionable, 6 months ago

͏    The misalignment of seeking by timestamp tends to vary with different timestamps.
͏    And for consecutive ones: tends to self-recover.

͏    Also, per aforementioned (web streaming) sticking to M2TS isn't necessary.
͏    And the 5 s assertion doesn't have much solid base.

in reply to:  22 comment:23 by Imprevisible, 6 months ago

Replying to MasterQuestionable:

͏    The misalignment of seeking by timestamp tends to vary with different timestamps.
͏    And for consecutive ones: tends to self-recover.

͏    Also, per aforementioned (web streaming) sticking to M2TS isn't necessary.
͏    And the 5 s assertion doesn't have much solid base.

Sadly, what you're saying is wrong, because all of the .aac files I generated are less than 5s. There is also either missing audio, or duplicated audio if the duration is more than 5s (but that's never the case), so no, it's not self-recovering.

comment:24 by MasterQuestionable, 6 months ago

͏    So "-t 5 -ss $( 0, 5, 10, ... )" alike always result in < 5 s duration?
͏    "-c copy" here won't do re-encoding: merely slicing the packets/frames independently representable.

    You might have to "-map" only the audio stream of interest to make things work properly.
    (other streams may interfere with the slicing)
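
Since the reporter generates the commands from Python, the slicing suggested above could be sketched as follows. This only builds the argument lists and does not run ffmpeg. The output file naming and the function name are hypothetical, and "-c copy" into an .aac file assumes the source audio stream is already AAC.

```python
from typing import List


def slice_command(src: str, index: int, seg_s: int = 5, audio_stream: int = 0) -> List[str]:
    """Build an ffmpeg argument list that stream-copies one audio slice."""
    start = index * seg_s                      # 0, 5, 10, ...
    out = f"seg_{index:05d}.aac"               # hypothetical naming scheme
    return [
        "ffmpeg", "-y",
        "-ss", str(start), "-t", str(seg_s),   # given as input options here
        "-i", src,
        "-map", f"0:a:{audio_stream}",         # keep only the chosen audio stream
        "-c", "copy",                          # no re-encoding
        out,
    ]


for i in range(3):
    print(" ".join(slice_command("in.aac", i)))
```

The lists can be passed to subprocess.run as they are. Moving "-ss" and "-t" after "-i" turns them into output options, which is the other variant discussed in this ticket.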


in reply to:  24 comment:25 by Imprevisible, 6 months ago

Replying to MasterQuestionable:

͏    So "-t 5 -ss $( 0, 5, 10, ... )" alike always result in < 5 s duration?
͏    "-c copy" here won't do re-encoding: merely slicing the packets/frames independently representable.

    You might have to "-map" only the audio stream of interest to make things work properly.
    (other streams may interfere with the slicing)

I map the right audio in other commands; it's not shown here for readability, but I map the audio, and I even set the number of audio channels. I tried -c copy: same issue. And yes, if I set -t to 5, it's always less than 5s for AAC, always more for MP3, and exactly 5s for FLAC, OGG and WAV.

comment:26 by MasterQuestionable, 6 months ago

͏    Last resort:
͏    "-c copy" the entire audio stream into 1 independent file.
͏    Try the "-ss" "-t" etc. as either input/output options. [ See also: <Seeking> ]


in reply to:  26 comment:27 by Imprevisible, 6 months ago

Replying to MasterQuestionable:

͏    Last resort:
͏    "-c copy" the entire audio stream into 1 independent file.
͏    Try the "-ss" "-t" etc. as either input/output options.

As I said, same issue: the segment is less than 5s...

comment:28 by MasterQuestionable, 6 months ago

͏    Likely there are issues with the seeking mechanism then.

͏    I may do further testing later.
͏    Would you upload a minimal sample AAC alleged?

in reply to:  28 comment:29 by Imprevisible, 6 months ago

Replying to MasterQuestionable:

͏    Likely there are issues with the seeking mechanism then.

͏    I may do further testing later.
͏    Would you upload a minimal sample AAC alleged?

Do you want the first minute of the audio as AAC?

comment:30 by MasterQuestionable, 6 months ago

    Preferably just enough to reproduce the problem. (10 or 15 s?)
    Upload it as an attachment to the ticket. (as "in.aac")

by Imprevisible, 6 months ago

Attachment: in.aac added

First 12.82s of a video, as an aac

comment:31 by MasterQuestionable, 6 months ago

͏    https://trac.ffmpeg.org/raw-attachment/ticket/11018/in.aac
͏    (~ 304.3 KiB; AAC: 15.018 s, 48,000 Hz)
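
A raw .aac (ADTS) file carries no duration field, so one way to cross-check figures like the ones above is to walk the ADTS frame headers and count frames. The sketch below is a minimal parser, not ffmpeg's or mediainfo's method. It assumes the data starts on a frame boundary with no ID3 tag or other leading bytes, that each raw data block holds 1,024 samples, and it counts priming frames like any other frame.

```python
ADTS_SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000,
                     24000, 22050, 16000, 12000, 11025, 8000, 7350]


def adts_duration(data: bytes, spf: int = 1024):
    """Walk ADTS headers and return (frame_count, sample_rate, seconds)."""
    pos = 0
    frames = 0
    sample_rate = None
    while pos + 7 <= len(data):
        # Every ADTS frame starts with the 12-bit syncword 0xFFF.
        if data[pos] != 0xFF or (data[pos + 1] & 0xF0) != 0xF0:
            raise ValueError(f"lost ADTS sync at byte {pos}")
        sf_index = (data[pos + 2] >> 2) & 0x0F
        # 13-bit frame length (header included), spread over bytes 3 to 5.
        frame_len = ((data[pos + 3] & 0x03) << 11) | (data[pos + 4] << 3) | (data[pos + 5] >> 5)
        if sf_index >= len(ADTS_SAMPLE_RATES) or frame_len < 7:
            raise ValueError(f"invalid ADTS header at byte {pos}")
        sample_rate = ADTS_SAMPLE_RATES[sf_index]
        # The header stores the number of raw data blocks minus one.
        frames += (data[pos + 6] & 0x03) + 1
        pos += frame_len
    if sample_rate is None:
        raise ValueError("no ADTS frames found")
    return frames, sample_rate, frames * spf / sample_rate
```

Usage would be something like `adts_duration(open("in.aac", "rb").read())`. The returned duration is always a whole number of frames, which makes it easy to compare against the requested "-t" value.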
