#6343 closed defect (needs_more_info)
Encoding MP4/AAC audio from pcm: issues with packets, duration, and pts/dts (especially when using -movflags empty_moov)
Reported by: | ea167 | Owned by: | |
---|---|---|---|
Priority: | normal | Component: | undetermined |
Version: | git-master | Keywords: | aac |
Cc: | Blocked By: | ||
Blocking: | Reproduced by developer: | no | |
Analyzed by developer: | no |
Description
Hello,
I've encountered several issues trying to encode audio PCM into MP4/AAC.
I've recompiled the latest nightly to make sure it was not already solved.
This seems related to #2325, though it is not exactly the same.
Here are the FFmpeg command lines I ran to encode an 8192-byte raw s16le PCM file (4096 samples) into MP4/AAC:
ffmpeg -nostdin -hide_banner -loglevel debug \
    -f s16le -channel_layout mono -vn -ac 1 -i test-8192.raw \
    -f mp4 -acodec aac -movflags empty_moov -ac 1 -ar 44100 -b:a 128000 \
    result.mp4
(same without empty_moov)
ffmpeg -nostdin -hide_banner -loglevel debug \
    -f s16le -channel_layout mono -vn -ac 1 -i test-8192.raw \
    -f mp4 -acodec aac -ac 1 -ar 44100 -b:a 128000 \
    result.mp4
1/ Why is there an empty packet added to the MP4?
When I run ffmpeg, I get the following logs:
video:0kB audio:2kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 50.496140%
Input file #0 (test-8192.raw):
  Input stream #0:0 (audio): 4 packets read (8192 bytes); 4 frames decoded (4096 samples);
  Total: 4 packets (8192 bytes) demuxed
Output file #0 (result.mp4):
  Output stream #0:0 (audio): 4 frames encoded (4096 samples); 5 packets muxed (1814 bytes);
  Total: 5 packets (1814 bytes) muxed
4 frames successfully decoded, 0 decoding errors
[AVIOContext @ 0x25eb300] Statistics: 0 seeks, 4 writeouts
--> There are 5 packets (and 5 frames) instead of the 4 frames from the input file.
When decoded, this additional packet is a series of 2048 bytes of pure zeros (1024 samples of 0).
However, it does use 536 bytes in the mp4 file. Why such a waste??
Moreover, with the empty_moov flag, players report a LONGER DURATION for the mp4 file,
and playback starts with 23 ms of silence.
2/ PTS/DTS bug with EMPTY_MOOV on this first packet
Running ffprobe on result.mp4, the pts/dts values seem wrong when using -movflags empty_moov.
# ffprobe -hide_banner -pretty -show_packets result.mp4
WITHOUT empty_moov, the first packet (the empty one with pure zeros) has pts/dts
with negative values, so that the next packet with actual sound starts at 0:00
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'result.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2mp41
    encoder         : Lavf57.72.100
  Duration: 00:00:00.12, start: 0.000000, bitrate: 176 kb/s
    Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, mono, fltp, 124 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
[PACKET]
codec_type=audio
stream_index=0
pts=-1024
pts_time=0:00:-0.023220
dts=-1024
dts_time=0:00:-0.023220
duration=1024
duration_time=0:00:00.023220
convergence_duration=N/A
convergence_duration_time=N/A
size=536 byte
pos=44
flags=KD
[SIDE_DATA]
side_data_type=Skip Samples
skip_samples=1024
discard_padding=0
skip_reason=0
discard_reason=0
[/SIDE_DATA]
[/PACKET]
[PACKET]
codec_type=audio
stream_index=0
pts=0
pts_time=0:00:00.000000
dts=0
dts_time=0:00:00.000000
duration=1024
duration_time=0:00:00.023220
...
But WITH -movflags empty_moov, the first packet starts at pts/dts 0:00, and therefore
mp4 players see a LONGER file, with 23ms of silence at the start:
[PACKET]
codec_type=audio
stream_index=0
pts=0
pts_time=0:00:00.000000
dts=0
dts_time=0:00:00.000000
duration=N/A
duration_time=N/A
convergence_duration=N/A
convergence_duration_time=N/A
size=536 byte
pos=849
flags=K_
[/PACKET]
[PACKET]
codec_type=audio
stream_index=0
pts=1024
pts_time=0:00:00.023220
dts=1024
dts_time=0:00:00.023220
duration=1024
duration_time=0:00:00.023220
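For readers comparing the two ffprobe listings above, here is a minimal Python sketch of the timestamp arithmetic. It assumes the audio stream's time base is 1/44100 (one tick per sample), which matches the pts and pts_time pairs shown but is not printed by ffprobe here. The helper names are hypothetical, and only the first two pts values of each listing are used because the rest is elided in the ticket.

```python
from fractions import Fraction
from typing import Sequence

SAMPLE_RATE = 44100  # "44100 Hz" in the ffprobe stream line
# Assumption: the stream time base is 1/sample_rate (one tick per sample).
TIME_BASE = Fraction(1, SAMPLE_RATE)


def pts_to_seconds(pts: int) -> float:
    """Convert a pts expressed in time-base ticks to seconds."""
    return float(pts * TIME_BASE)


def first_audible_time(packet_pts: Sequence[int], priming_packets: int = 1) -> float:
    """Time at which the first non-priming packet starts.

    priming_packets=1 reflects the single all-zero packet seen in this ticket.
    """
    return pts_to_seconds(packet_pts[priming_packets])


# First two pts values from each listing in the ticket.
without_empty_moov = [-1024, 0]
with_empty_moov = [0, 1024]

print(f"{pts_to_seconds(-1024):.6f}")
print(f"{first_audible_time(without_empty_moov):.6f}")
print(f"{first_audible_time(with_empty_moov):.6f}")
```

Under that assumption, the 1024-sample extra packet accounts for exactly the 23 ms offset: real audio begins at 0 s without empty_moov and at about 0.023220 s with it.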
Here is the detail about my FFmpeg version:
ffmpeg version N-85272-gc901ae9 Copyright (c) 2000-2017 the FFmpeg developers
  built with gcc 4.8.5 (GCC) 20150623 (Red Hat 4.8.5-11)
  configuration: --prefix=/opt/ffmpeg_build --extra-cflags=-I/opt/ffmpeg_build/include --extra-ldflags='-L/opt/ffmpeg_build/lib -ldl' --bindir=/usr/local/bin --pkg-config-flags=--static --enable-gpl --enable-libfreetype
  libavutil      55. 59.100 / 55. 59.100
  libavcodec     57. 91.100 / 57. 91.100
  libavformat    57. 72.100 / 57. 72.100
  libavdevice    57.  7.100 / 57.  7.100
  libavfilter     6. 84.100 /  6. 84.100
  libswscale      4.  7.100 /  4.  7.100
  libswresample   2.  8.100 /  2.  8.100
  libpostproc    54.  6.100 / 54.  6.100
Any help understanding why there is an additional first packet filled with zeros,
and why the timing goes wrong with empty_moov, would be much appreciated!
Thank you!
Change History (3)
comment:2 by , 7 years ago
Keywords: | mp4 pcm removed |
---|---|
Resolution: | → needs_more_info |
Status: | new → closed |
Please understand that -hide_banner makes tickets generally invalid. Please reopen if you can provide the command line including complete, uncut console output for one issue, and answer Hendrik's question.
comment:3 by , 7 years ago
Hello @heleppkes,
Thank you so much for your answer. I had completely missed the notion of decoder priming. Now it makes total sense, thanks to your explanation.
The one thing that fooled me is that the Web Audio API decodeAudioData() function does not properly trim this priming data.
Some pointers for others who would end up in the same case:
- Audio priming details on Apple Dev website https://developer.apple.com/library/content/documentation/QuickTime/QTFF/QTFFAppenG/QTFFAppenG.html
- Open issue of Web Audio API https://github.com/WebAudio/web-audio-api/issues/1091
Big thanks again!
Replying to heleppkes:
AAC uses a concept of decoder priming, i.e. before the actual data can be decoded properly, it decodes a bit of nothingness to "start" the decoder. This data is typically 1024 samples long, and that's where you get your extra frame from.
This extra frame is generally properly marked as such and the delay set appropriately, so after decoding you should once again have the same number of samples as you put in.
AFAIK empty_moov is only really useful for fragmented MP4 streams; what exactly are you trying to achieve by using it alone?
PS:
As a general rule, it's greatly advised to report one particular issue in one ticket, and not bunch all your collected issues into one.
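The decoder priming described in the reply above can be checked against the numbers in this ticket. The following is a minimal Python sketch, not FFmpeg's implementation: it assumes 1024 samples per AAC frame and a single priming frame, as the log and the skip_samples=1024 side data in this ticket suggest. Other encoders may use a different delay. The function names are hypothetical.

```python
import math

FRAME_SIZE = 1024  # samples per AAC frame (duration=1024 in the ffprobe output)


def expected_packets(input_samples: int, priming_frames: int = 1) -> int:
    """Packets written for the input: encoded frames plus priming frames.

    priming_frames=1 is an assumption taken from this ticket's log.
    """
    return math.ceil(input_samples / FRAME_SIZE) + priming_frames


def trim_priming(decoded: list, skip_samples: int) -> list:
    """Drop the priming samples a decoder emits before the real audio."""
    return decoded[skip_samples:]


packets = expected_packets(4096)

# Stand-in for decoder output: 1024 zero samples of priming, then 4096 real samples.
decoded = [0] * FRAME_SIZE + list(range(1, 4097))
trimmed = trim_priming(decoded, skip_samples=1024)

print(packets)
print(len(decoded), len(trimmed))
```

A player that ignores the skip information, as reported above for decodeAudioData(), would effectively play `decoded` rather than `trimmed`, which is where the 1024 samples of leading silence come from.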