Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#8631 closed defect (needs_more_info)

Audio gapless playback metadata for MP4/AAC

Reported by: John Kaplan
Owned by:
Priority: normal
Component: undetermined
Version: unspecified
Keywords:
Cc:
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

Background

Gapless playback of audio tracks is supported by many standard audio players on mobile devices. It allows successive audio tracks to play as a seamless whole, without a pause or any perceptible audio flaw between them. Gapless playback is required for listeners to hear many live, classical, and classic rock recordings as intended. External sources describing gapless playback in detail include https://en.wikipedia.org/wiki/Gapless_playback and https://wiki.hydrogenaud.io/index.php?title=Gapless_playback.

The references above describe several possible sources of gaps across electronic audio formats, but the most common source is lossy compression such as MP3 and AAC, whose encoding process adds extra samples before and after the original PCM data of a track. Because the number of extra samples can vary, and these compression standards do not include metadata describing it, the decoder cannot strip the extra samples away on its own.

However, the packaging (container) layer can obtain the relevant values from the audio encoder and store them in the file's metadata for audio players to read; this is where ffmpeg can help. The samples added to the front of the audio are called "delay" and the samples added to the end are called "padding." To strip off the extra samples and recover a gapless track, an audio player needs three lengths, in samples: the delay, the original unpadded PCM audio, and the padding.
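As a worked illustration of how these three lengths relate, here is a small C sketch. It assumes AAC-LC, where each encoded frame decodes to 1024 samples per channel, so the decoder outputs frames × 1024 samples in total and delay + original + padding must add up to exactly that. The names GaplessInfo and gapless_from_frames are hypothetical, and the 2112-sample delay in the example below is only an illustrative value, not what any particular encoder is known to produce.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: AAC-LC, where every encoded frame decodes to 1024 samples
 * per channel. Other variants (e.g. AAC-LD with 480 or 512) differ. */
#define AAC_FRAME_SAMPLES 1024

/* The three lengths a player needs, all counted in samples per channel. */
typedef struct {
    int64_t delay;   /* priming samples the encoder added at the start */
    int64_t valid;   /* original, unpadded PCM samples */
    int64_t padding; /* extra samples the encoder added at the end */
} GaplessInfo;

/* Hypothetical helper: derive the padding from the delay, the original
 * sample count and the number of encoded frames. Returns 0 on success,
 * -1 if the inputs cannot describe a valid stream. */
static int gapless_from_frames(int64_t delay, int64_t valid, int64_t frames,
                               GaplessInfo *out)
{
    if (delay < 0 || valid < 0 || frames < 0)
        return -1;
    int64_t total = frames * AAC_FRAME_SAMPLES; /* samples the decoder outputs */
    if (delay + valid > total)
        return -1; /* too few frames to hold the delay plus the original audio */
    out->delay   = delay;
    out->valid   = valid;
    out->padding = total - delay - valid;
    assert(out->delay + out->valid + out->padding == total);
    return 0;
}
```

For a hypothetical 10-second track at 44.1 kHz (441000 samples) with 2112 delay samples encoded into 433 frames, the decoder emits 443392 samples, so the padding works out to 280 samples. A player that knows the delay and the original length can keep decoded samples [2112, 443112) and drop the rest without ever being told the padding explicitly.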

As far as I know, there is no documented standard specifically for storing gapless playback metadata in an audio file and interpreting it in audio players. (If anyone has inside information to the contrary, please add it as a comment; I and a lot of others would be grateful for the insight.) But many audio players apparently follow de facto standards, reading metadata from the file headers that gives enough information about track length and delay to frame the original unpadded audio track.

Proposed Solution
The first solution this request proposes for ffmpeg applies to AAC audio packaged in an MP4 file. The proposal is to use the moov/edts/elst atoms described in ISO/IEC 14496-12, adding a single elst atom inside a single edts atom per track. Inside this elst, the count of unpadded PCM audio samples would be written to the entry's duration field (segment_duration in the spec, also referred to as "track duration" or "time length"), and the count of delay samples to its start-time field (media_time, also referred to as "start time"). Note that the spec counts segment_duration in the movie timescale, so the sample count has to be rescaled unless that timescale equals the sample rate. Audio players use these values to skip over the delay samples in the track data, isolate the original PCM audio samples, and ignore the padding at the end, so the padding length does not need to be stored explicitly. My team has experimented with audio tracks processed this way using the fdk-aac tool, and they play gaplessly on both the iOS and Android standard audio players.
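A rough C sketch of the proposed box layout follows. It serializes one edts box containing a version-0 elst with a single entry, using the field order from ISO/IEC 14496-12 (segment_duration, media_time, media_rate_integer, media_rate_fraction). It assumes the audio track's media timescale equals its sample rate, so the delay can be written to media_time as a plain sample count, while the duration is rescaled into the movie timescale with rounding. The helper names are hypothetical and are not movenc.c functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Big-endian writers: MP4 box fields are stored in network byte order. */
static uint8_t *put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24); p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);  p[3] = (uint8_t)v;
    return p + 4;
}

static uint8_t *put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8); p[1] = (uint8_t)v;
    return p + 2;
}

static uint8_t *put_tag(uint8_t *p, const char tag[4])
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)tag[i];
    return p + 4;
}

/* segment_duration is counted in the movie (mvhd) timescale, so rescale
 * the sample count from the sample rate, rounding to nearest. */
static uint32_t samples_to_movie_units(uint64_t samples, uint32_t sample_rate,
                                       uint32_t movie_timescale)
{
    assert(sample_rate > 0);
    return (uint32_t)((samples * movie_timescale + sample_rate / 2) / sample_rate);
}

#define EDTS_GAPLESS_SIZE 36 /* 8 (edts header) + 28 (elst v0, one entry) */

/* Hypothetical writer: emits edts{elst} into buf, which must hold
 * EDTS_GAPLESS_SIZE bytes, and returns the number of bytes written. */
static size_t write_gapless_edts(uint8_t *buf, uint64_t valid_samples,
                                 uint32_t delay_samples, uint32_t sample_rate,
                                 uint32_t movie_timescale)
{
    uint8_t *p = buf;
    p = put_be32(p, EDTS_GAPLESS_SIZE); /* edts box size */
    p = put_tag(p, "edts");
    p = put_be32(p, 28);                /* elst box size */
    p = put_tag(p, "elst");
    p = put_be32(p, 0);                 /* version 0, flags 0 */
    p = put_be32(p, 1);                 /* entry_count */
    p = put_be32(p, samples_to_movie_units(valid_samples, sample_rate,
                                           movie_timescale)); /* segment_duration */
    p = put_be32(p, delay_samples);     /* media_time: skip the encoder delay */
    p = put_be16(p, 1);                 /* media_rate_integer: normal speed */
    p = put_be16(p, 0);                 /* media_rate_fraction */
    assert((size_t)(p - buf) == EDTS_GAPLESS_SIZE);
    return (size_t)(p - buf);
}
```

With a movie timescale equal to the sample rate, a 441000-sample duration is stored exactly. With a coarser movie timescale such as 1000, the same duration becomes 10000 units, and durations that are not whole multiples of one unit get rounded, so the sample-accurate length is lost.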

Tech Details
Here are some issues about the design & coding of this request. I'm hoping the community will jump in and comment to help me nail down the details so I can move on to coding a patch.

As several of you have commented before, ffmpeg already contains code that produces an elst atom, controlled by the mov/mp4 muxer option "use_editlist".

I believe that the existing use of an edit list for movie synchronization is a different use case from its use for gapless audio. Of course, if any of you can set me straight on how a single routine could cover both use cases, I'll attempt to satisfy that requirement. Short of that, I propose adding a second command-line switch, "gapless-editlist", that will coexist with the current switch but be mutually exclusive with it, and will control the emission of an elst atom with gapless metadata. Whether or not that turns out to be the best long-term solution, it will at least avoid regressions for users relying on the current implementation.
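If the two switches are kept separate, the mutual exclusion could be enforced once when the muxer is initialized. A minimal sketch, with hypothetical field names: use_editlist mirrors the existing option and is assumed here to use -1 for "auto", while gapless_editlist stands in for the proposed gapless-editlist switch, which does not exist in ffmpeg.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical option pair; not the actual MOVMuxContext layout. */
typedef struct {
    int use_editlist;     /* existing switch: -1 = auto (assumed), 0 = off, 1 = on */
    int gapless_editlist; /* proposed switch: 0 = off, 1 = on */
} EditListOpts;

/* Returns 0 if the combination is allowed, -1 if both switches are
 * explicitly enabled at the same time. */
static int check_editlist_opts(const EditListOpts *o)
{
    assert(o != NULL);
    if (o->gapless_editlist == 1 && o->use_editlist == 1) {
        fprintf(stderr, "use_editlist and gapless-editlist are mutually exclusive\n");
        return -1;
    }
    return 0;
}
```

Leaving use_editlist at "auto" while enabling the gapless switch is accepted in this sketch, on the assumption that the gapless path would then take over edit-list generation for audio tracks.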

One detail I need help with is how to locate the encoder delay and the original PCM sample count (both in samples) for an AAC encoding in the data available to the atom-writing functions in movenc.c, i.e. something reachable from (AVIOContext *pb, MOVMuxContext *mov, MOVTrack *track).

When we discussed previously, Martin Balint suggested I look in the side data AV_PKT_DATA_SKIP_SAMPLES for the delay value, and I found several references to that variable for different encoding formats. Unfortunately it seems to be used differently for each encoding format, and I don't know how to locate the same or equivalent data for AAC encoding. I'm also not sure of the interface between the fdk-aac module and ffmpeg. I will keep digging and hope I eventually find it, but if any maintainers out there have direct knowledge to point me to some code or a data structure definition, I'd be most grateful.
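For reference, libavcodec documents the AV_PKT_DATA_SKIP_SAMPLES payload as a fixed little-endian layout: a u32 count of samples to skip from the start of the packet, a u32 count to skip from the end, then one u8 reason byte for each skip. The standalone sketch below decodes that layout from a raw byte buffer; inside ffmpeg the bytes would come from av_packet_get_side_data(). Separately, AVCodecParameters has an initial_padding field documented as the number of samples the encoder inserted at the start of the audio, though whether a given AAC encoder wrapper fills it in should be checked rather than assumed. SkipSamples and parse_skip_samples are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded form of the AV_PKT_DATA_SKIP_SAMPLES payload. */
typedef struct {
    uint32_t skip_start;   /* samples to drop from the start of the packet */
    uint32_t skip_end;     /* samples to drop from the end of the packet */
    uint8_t  reason_start; /* reason for the start skip */
    uint8_t  reason_end;   /* reason for the end skip (0 = padding silence, 1 = convergence) */
} SkipSamples;

/* Little-endian 32-bit read, independent of host byte order. */
static uint32_t rl32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns 0 on success, -1 if the buffer is too short for the two counts.
 * The reason bytes are treated as optional and default to 0 when absent. */
static int parse_skip_samples(const uint8_t *data, size_t size, SkipSamples *out)
{
    assert(out != NULL);
    if (!data || size < 8)
        return -1;
    out->skip_start   = rl32(data);
    out->skip_end     = rl32(data + 4);
    out->reason_start = size >= 9  ? data[8] : 0;
    out->reason_end   = size >= 10 ? data[9] : 0;
    return 0;
}
```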

As for a more general solution that works with other encoders, I'm game to help with that once the MP4/AAC case is done. I'll keep experimenting and investigating while waiting for responses. I apologize for the long time since I last discussed this, but it's now at the top of my spare-time priority list, so I'm actively working on it.

Thanks,
John

Change History (2)

comment:1 by Carl Eugen Hoyos, 4 years ago

Keywords: gapless audio AAC removed
Resolution: needs_more_info
Status: new → closed

Please reopen this ticket if you can provide the command line you tested together with the complete, uncut console output and the necessary input samples that allow us to reproduce the issue you see.
You don't need to analyze an issue that you report on this bug tracker, but you do have to explain how it can be reproduced.

Last edited 4 years ago by Carl Eugen Hoyos

comment:2 by John Kaplan, 4 years ago

I did more investigation and found that when re-encoding a FLAC file into MP4/AAC, ffmpeg does produce a moov/edts/elst box populated with information that follows the MP4 spec.
We had been using the command-line switch
-f adts
which selects the raw ADTS output format, so no MP4 boxes (including the elst box) were written at all. Once we stopped using it, the elst box appeared in the output. There is a wrinkle in that output that prevents some players from interpreting it correctly, but that's for another thread.
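To check that an MP4 output really contains the edit list, and that its values match the expected duration and delay, one option is to walk the box tree down the path moov/trak/edts/elst. The C sketch below does this on an in-memory copy of the file. It is a hypothetical helper: it only looks at the first trak, only reads version-0 elst entries, and does no file I/O.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Big-endian 32-bit read: MP4 box fields are stored in network byte order. */
static uint32_t rb32(const uint8_t *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8  | (uint32_t)p[3];
}

/* Scans sibling boxes in [buf, buf + len) for the first one of `type`.
 * Returns a pointer to its payload and stores the payload length, or
 * returns NULL if the box is absent or the data is malformed. */
static const uint8_t *find_box(const uint8_t *buf, size_t len,
                               const char type[4], size_t *payload_len)
{
    size_t off = 0;
    while (len - off >= 8) {
        uint64_t size = rb32(buf + off);
        size_t hdr = 8;
        if (size == 1) {        /* 64-bit largesize follows the type */
            if (len - off < 16)
                return NULL;
            size = (uint64_t)rb32(buf + off + 8) << 32 | rb32(buf + off + 12);
            hdr = 16;
        } else if (size == 0) { /* box extends to the end of the data */
            size = len - off;
        }
        if (size < hdr || size > len - off)
            return NULL;
        if (memcmp(buf + off + 4, type, 4) == 0) {
            *payload_len = (size_t)size - hdr;
            return buf + off + hdr;
        }
        off += (size_t)size;
    }
    return NULL;
}

/* Reads the first entry of a version-0 elst at moov/trak/edts/elst,
 * looking only at the first trak. Returns 0 on success, -1 otherwise. */
static int read_first_elst(const uint8_t *file, size_t len,
                           uint32_t *segment_duration, int32_t *media_time)
{
    static const char *const path[] = { "moov", "trak", "edts", "elst" };
    const uint8_t *p = file;
    size_t n = len;
    assert(segment_duration && media_time);
    for (size_t i = 0; i < sizeof path / sizeof path[0]; i++) {
        size_t child_len = 0;
        p = find_box(p, n, path[i], &child_len);
        if (!p)
            return -1;
        n = child_len;
    }
    /* version/flags (4) + entry_count (4) + one 12-byte version-0 entry */
    if (n < 20 || p[0] != 0 || rb32(p + 4) < 1)
        return -1;
    *segment_duration = rb32(p + 8);
    *media_time = (int32_t)rb32(p + 12);
    return 0;
}
```

Comparing media_time against the encoder delay, and segment_duration (converted from the movie timescale) against the original sample count, shows whether a player has what it needs to trim the track.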
