Opened 2 years ago

Closed 23 months ago

Last modified 23 months ago

#6343 closed defect (needs_more_info)

Encoding MP4/AAC audio from pcm: issues with packets, duration, and pts/dts (especially when using -movflags empty_moov)

Reported by: ea167 Owned by:
Priority: normal Component: undetermined
Version: git-master Keywords: aac
Cc: Blocked By:
Blocking: Reproduced by developer: no
Analyzed by developer: no

Description

Hello,

I've encountered several issues trying to encode PCM audio into MP4/AAC.
I've recompiled the latest nightly to make sure they were not already solved.
This seems related to #2325, though it is not exactly the same.

Here are the FFmpeg command lines I ran to encode an 8192-byte raw s16le PCM file (4096 samples) into MP4/AAC:

ffmpeg -nostdin -hide_banner -loglevel debug \
 -f s16le -channel_layout mono -vn -ac 1 -i test-8192.raw \
 -f mp4 -acodec aac -movflags empty_moov -ac 1 -ar 44100 -b:a 128000 \
 result.mp4 

(same without empty_moov)

ffmpeg -nostdin -hide_banner -loglevel debug \
 -f s16le -channel_layout mono -vn -ac 1 -i test-8192.raw \
 -f mp4 -acodec aac -ac 1 -ar 44100 -b:a 128000 \
 result.mp4 

1/ Why is there an empty packet added to the MP4?

When I run ffmpeg, I get the following logs:

video:0kB audio:2kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 50.496140%
Input file #0 (test-8192.raw):
  Input stream #0:0 (audio): 4 packets read (8192 bytes); 4 frames decoded (4096 samples); 
  Total: 4 packets (8192 bytes) demuxed
Output file #0 (result.mp4):
  Output stream #0:0 (audio): 4 frames encoded (4096 samples); 5 packets muxed (1814 bytes); 
  Total: 5 packets (1814 bytes) muxed
4 frames successfully decoded, 0 decoding errors
[AVIOContext @ 0x25eb300] Statistics: 0 seeks, 4 writeouts

--> There are 5 packets (and 5 frames) instead of the 4 frames from the input file.

When decoded, this additional packet is a series of 2048 bytes of pure zeros (1024 samples of 0).

However, it does use 536 bytes in the mp4 file. Why such a waste??

Moreover, with the empty_moov flag, players report a LONGER DURATION for the mp4 file,
and playback starts with 23 ms of silence.


2/ PTS/DTS bug with EMPTY_MOOV on this first packet

Running ffprobe on result.mp4, the pts/dts values seem wrong when using -movflags empty_moov.

# ffprobe -hide_banner -pretty -show_packets result.mp4

WITHOUT empty_moov, the first packet (the empty one with pure zeros) has pts/dts
with negative values, so that the next packet with actual sound starts at 0:00

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'result.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2mp41
    encoder         : Lavf57.72.100
  Duration: 00:00:00.12, start: 0.000000, bitrate: 176 kb/s
    Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, mono, fltp, 124 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
[PACKET]
codec_type=audio
stream_index=0
pts=-1024
pts_time=0:00:-0.023220
dts=-1024
dts_time=0:00:-0.023220
duration=1024
duration_time=0:00:00.023220
convergence_duration=N/A
convergence_duration_time=N/A
size=536 byte
pos=44
flags=KD
[SIDE_DATA]
side_data_type=Skip Samples
skip_samples=1024
discard_padding=0
skip_reason=0
discard_reason=0
[/SIDE_DATA]
[/PACKET]
[PACKET]
codec_type=audio
stream_index=0
pts=0
pts_time=0:00:00.000000
dts=0
dts_time=0:00:00.000000
duration=1024
duration_time=0:00:00.023220
...

But WITH -movflags empty_moov, the first packet starts at pts/dts 0:00, and therefore
mp4 players see a LONGER file, with 23ms of silence at the start:

[PACKET]
codec_type=audio
stream_index=0
pts=0
pts_time=0:00:00.000000
dts=0
dts_time=0:00:00.000000
duration=N/A
duration_time=N/A
convergence_duration=N/A
convergence_duration_time=N/A
size=536 byte
pos=849
flags=K_
[/PACKET]
[PACKET]
codec_type=audio
stream_index=0
pts=1024
pts_time=0:00:00.023220
dts=1024
dts_time=0:00:00.023220
duration=1024
duration_time=0:00:00.023220
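
As a cross-check on the two ffprobe listings above, the timestamps can be converted to seconds by hand. This is only a sketch: the 1/44100 time base is an assumption inferred from pts=1024 being printed as 0.023220 s, and the names `TIME_BASE` and `pts_to_seconds` are illustrative, not part of FFmpeg.

```python
from fractions import Fraction

# Assumption: the track timescale equals the 44100 Hz sample rate, which is
# consistent with pts=1024 being printed as 0.023220 s in the listings.
TIME_BASE = Fraction(1, 44100)

def pts_to_seconds(pts: int, time_base: Fraction = TIME_BASE) -> float:
    """Convert a timestamp in time-base ticks to seconds."""
    return float(pts * time_base)

# pts of the first two packets in each ffprobe listing:
# index 0 is the all-zero packet, index 1 the first packet with real audio.
without_empty_moov = [-1024, 0]
with_empty_moov = [0, 1024]

for label, pts_list in (("without empty_moov", without_empty_moov),
                        ("with empty_moov", with_empty_moov)):
    start = pts_to_seconds(pts_list[1])
    print(f"{label}: real audio starts at {start:.6f} s")
```

Under that assumption, the real audio starts at 0 s in the first file and roughly 23 ms later in the empty_moov file, which matches the silence heard at the start.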

Here are the details of my FFmpeg version:

ffmpeg version N-85272-gc901ae9 Copyright (c) 2000-2017 the FFmpeg developers
  built with gcc 4.8.5 (GCC) 20150623 (Red Hat 4.8.5-11)
  configuration: --prefix=/opt/ffmpeg_build --extra-cflags=-I/opt/ffmpeg_build/include --extra-ldflags='-L/opt/ffmpeg_build/lib -ldl' --bindir=/usr/local/bin --pkg-config-flags=--static --enable-gpl --enable-libfreetype
  libavutil      55. 59.100 / 55. 59.100
  libavcodec     57. 91.100 / 57. 91.100
  libavformat    57. 72.100 / 57. 72.100
  libavdevice    57.  7.100 / 57.  7.100
  libavfilter     6. 84.100 /  6. 84.100
  libswscale      4.  7.100 /  4.  7.100
  libswresample   2.  8.100 /  2.  8.100
  libpostproc    54.  6.100 / 54.  6.100

Any help understanding why there is an additional first packet filled with zeros,
and why the timing goes wrong with empty_moov, would be much appreciated!!

Thank you!

Change History (3)

comment:1 follow-up: Changed 2 years ago by heleppkes

AAC uses a concept of decoder priming, i.e. before the actual data can be decoded properly, it decodes a bit of nothingness to "start" the decoder. This data is typically 1024 samples long, and that's where you get your extra frame from.

This extra frame is generally properly marked as such and the delay set appropriately, so after decoding you should once again have the same number of samples as you put in.
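
The arithmetic behind the extra packet can be sketched as follows. This is a simplified model, not the encoder's actual logic: it assumes 1024-sample AAC frames and a priming length of 1024 samples (the skip_samples value shown by ffprobe in this ticket), and the function names are hypothetical.

```python
import math

AAC_FRAME_SAMPLES = 1024  # samples per AAC-LC frame
PRIMING_SAMPLES = 1024    # assumed priming length (skip_samples=1024 in this ticket)

def expected_packets(input_samples: int) -> int:
    """Packets needed for the input audio, plus the priming packet(s)."""
    audio = math.ceil(input_samples / AAC_FRAME_SAMPLES)
    priming = math.ceil(PRIMING_SAMPLES / AAC_FRAME_SAMPLES)
    return audio + priming

def priming_seconds(sample_rate: int) -> float:
    """Duration of the priming data at the given sample rate."""
    return PRIMING_SAMPLES / sample_rate

print(expected_packets(4096))              # 4 audio packets + 1 priming packet
print(f"{priming_seconds(44100):.6f} s")   # length of the priming data
```

For the 4096-sample input in this ticket, the model gives 5 packets and a priming duration of about 23.22 ms, in line with the muxer log and the reported initial silence.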

AFAIK empty_moov is only really useful for fragmented MP4 streams; what exactly are you trying to achieve by using it alone?

PS:
As a general rule, it's strongly advised to report one particular issue per ticket, and not to bundle all your collected issues into one.

comment:2 Changed 23 months ago by cehoyos

  • Keywords mp4 pcm removed
  • Resolution set to needs_more_info
  • Status changed from new to closed

Please understand that -hide_banner generally makes tickets invalid. Please reopen if you can provide the command line together with the complete, uncut console output for one issue, and answer Hendrik's question.

comment:3 in reply to: ↑ 1 Changed 23 months ago by ea167

Hello @heleppkes,

Thank you so much for your answer. I had indeed completely missed the notion of decoder priming. Now it makes total sense, thanks to your explanation.

The one thing that fooled me is that the Web Audio API decodeAudioData() function does not properly trim this priming data.
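
When a decoder hands back the priming data untrimmed, it can be dropped manually. Below is a minimal sketch on a plain list of samples. It is not Web Audio API code; `trim_priming` is a hypothetical helper, and the default of 1024 is only the skip_samples value seen in this ticket. The real value should be read from the file's metadata.

```python
def trim_priming(decoded, skip_samples=1024, expected_samples=None):
    """Drop leading priming samples, and optionally any trailing padding.

    decoded          -- sequence of decoded PCM samples
    skip_samples     -- priming samples to discard (assumed 1024 here)
    expected_samples -- original sample count, if known, to cut padding
    """
    trimmed = decoded[skip_samples:]
    if expected_samples is not None:
        trimmed = trimmed[:expected_samples]
    return trimmed

# Illustration: 1024 zero samples of priming followed by 4096 real samples.
decoded = [0] * 1024 + list(range(1, 4097))
audio = trim_priming(decoded, skip_samples=1024, expected_samples=4096)
print(len(audio), audio[0], audio[-1])
```

After trimming, the buffer again holds the same number of samples as the original input.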

Some pointers for others who end up in the same situation:

  1. Audio priming details on Apple Dev website https://developer.apple.com/library/content/documentation/QuickTime/QTFF/QTFFAppenG/QTFFAppenG.html
  2. Open issue of Web Audio API https://github.com/WebAudio/web-audio-api/issues/1091

Big thanks again!
