Audio ticks and video/audio out-of-sync when using MXF files with pre-charge
Reported by: Henk Demper
Owned by:
Blocking:
Reproduced by developer: no
Analyzed by developer: no
Summary of the bug:
When we convert an MXF file that contains pre-charge frames (Long GOP codec), there are audio ticks at the beginning and/or end of the resulting output file. As a related problem, the video and audio can be a few frames out of sync.
Workflow where this is used:
With our Primestream Production software, users make subclips from XDCAM MXF files that come from Sony broadcast cameras or were recorded from feeds with our Ingest software. These subclips are then losslessly re-wrapped into MXF files (with our own tool, which currently uses bmx/libMXF as the MXF writer) and kept on storage. Because the source MXF is Long GOP, the MXF subclips will have pre-charge by definition. For parts of our software we then create low-res QuickTime .mov or MP4 proxies using the ffmpeg command line.
Background on pre-charge in MXF:
As a developer of MXF parsers, I believe this is caused by the way MXF files store pre-charge. In MXF files, each frame is stored in a Content Package (CP), which holds both video and audio. If we have pre-charge for Long GOP codecs, each pre-charge frame is thus stored in its own CP, together with some mandatory audio frames. These audio frames are never meant to be heard; they are just added as 'stuffing' to make complete CPs that satisfy SMPTE ST377. Depending on how the MXF was created, these 'pre-charge audio frames' might contain silence, some original audio (belonging to the never-shown pre-charge frames), or even random garbage. The 'Origin' parameter for both the video and audio tracks in the File Source Package is set to the size of the pre-charge. Note that, for instance, in QuickTime .mov/MP4 files you have (and need) no audio samples for pre-charge video frames, as samples are addressed and stored separately per track.
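To illustrate the layout described above, here is a minimal Python sketch of a frame-wrapped clip with pre-charge. It is only a toy model: the `ContentPackage` class, the `build_clip` and `presented_packages` helpers, and the "stuffing"/"real" labels are hypothetical names made up for this example, not part of libMXF, bmx, or ffmpeg.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContentPackage:
    index: int   # position of the CP in the essence container, 0-based
    audio: str   # "stuffing" for pre-charge audio, "real" otherwise (illustrative labels)


def build_clip(origin: int, visible_frames: int) -> List[ContentPackage]:
    """Build a toy clip: `origin` pre-charge CPs followed by the visible CPs."""
    clip = []
    for i in range(origin + visible_frames):
        # Every CP carries audio, including the pre-charge CPs.
        kind = "stuffing" if i < origin else "real"
        clip.append(ContentPackage(index=i, audio=kind))
    return clip


def presented_packages(clip: List[ContentPackage], origin: int) -> List[ContentPackage]:
    """Return the CPs that should actually be shown and heard (Origin skipped)."""
    return clip[origin:]


if __name__ == "__main__":
    clip = build_clip(origin=12, visible_frames=3)
    for cp in presented_packages(clip, origin=12):
        print(cp.index, cp.audio)
```

With an Origin of 12 (as in the attached file), the first CP that should be heard is the one at index 12; the CPs at indices 0 to 11 exist in the file but carry only stuffing audio.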
Likely cause of the bug:
What I think is happening is that the ffmpeg MXF parser is correctly handling the video with the pre-charge frames, but when the calling layer requests audio for t=0, it returns audio from the beginning of the first CP, which is actually the never-to-be-heard pre-charge audio CP. Instead, it should take the Origin parameter of the File Source Package into account for the audio tracks and start reading audio from the CP at that offset. With our MXF files, these pre-charge audio frames are silent, hence you hear a tick when the waveform spikes to the real values after the pre-charge audio frames.
Because the MXF parser starts reading audio at the wrong place ('too early'), video and audio will be out of sync by the length of the pre-charge: somewhere between 0 and GOP-size frames, depending on the GOP layout.
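The suspected behaviour and the resulting offset can be sketched in a few lines of Python. This is not how mxfdec.c is implemented; `audio_cp_for_time` and `desync` are hypothetical helpers for illustration. The 12-frame Origin comes from the attached file, while the 25 fps edit rate and 48 kHz sample rate are assumptions chosen only to make the arithmetic concrete.

```python
from fractions import Fraction


def audio_cp_for_time(t_frames: int, origin: int, honor_origin: bool) -> int:
    """Index of the CP to read audio from for presentation frame `t_frames`."""
    # Ignoring Origin maps t=0 to the first CP in the file, which is pre-charge.
    return t_frames + origin if honor_origin else t_frames


def desync(origin_frames: int, edit_rate: Fraction, sample_rate: int):
    """Audio read too early when Origin is ignored, as (samples, milliseconds)."""
    samples = origin_frames * sample_rate / edit_rate
    millis = Fraction(origin_frames) / edit_rate * 1000
    return int(samples), float(millis)


if __name__ == "__main__":
    ORIGIN = 12                 # pre-charge size of the attached file
    EDIT_RATE = Fraction(25)    # assumed frame rate, not stated in this ticket
    SAMPLE_RATE = 48000         # assumed audio sample rate

    print("suspected:", audio_cp_for_time(0, ORIGIN, honor_origin=False))
    print("expected: ", audio_cp_for_time(0, ORIGIN, honor_origin=True))
    print("offset:   ", desync(ORIGIN, EDIT_RATE, SAMPLE_RATE))
```

Under those assumed rates, 12 frames of pre-charge correspond to 23040 audio samples, or 480 ms of audio that is played although it should have been skipped.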
Quick look at ffmpeg source code:
I see in libavformat\mxfdec.c that MXFSequence::origin is populated, and forwarded to av_dict_set_int() for material/source tracks. I also see some sample_time manipulations inside mxf_read_seek(), but I don't think the origin value is taken into account for audio here... I'm not the author of this code, and not familiar with the design, but happy to further discuss/dive in as needed...
How to reproduce:
% ffmpeg -i input.mxf -vcodec h264 -acodec copy -report output.mov
ffmpeg version N-95635-gcae7f6658c-tessus
built on: SnapShot from November 5th 2019
I have attached:
- Input MXF file with pre-charge
- Excel sheet for this file showing GOP structure with 12 frames pre-charge
- Screenshot of MXF Analyser showing the 12 frames Origin for the 1st audio track
- Resulting QuickTime .mov output file, which has a tick at the start and end