Opened 3 weeks ago

#7182 new defect

Desynchronization when muxing Opus in Matroska

Reported by: mkver
Owned by:
Priority: normal
Component: avformat
Version: git-master
Keywords: mkv, opus
Cc:
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

Muxing Opus in Matroska currently leads to desynchronization because the muxer doesn't account for the fact that Matroska's CodecDelay element already contains an implicit delay.
Before turning to the more detailed explanation, let me say that I used this version of ffmpeg (the latest of Zeranoe's builds, from today; I'm declaring the version to be git-master although git-master is ahead by one completely unrelated commit):

ffmpeg version N-90920-ge07b1913fc Copyright (c) 2000-2018 the FFmpeg developers
  built with gcc 7.3.0 (GCC)
  configuration: --disable-static --enable-shared --enable-gpl --enable-version3 --enable-sdl2 --enable-bzlib --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth
  libavutil      56. 18.100 / 56. 18.100
  libavcodec     58. 19.100 / 58. 19.100
  libavformat    58. 13.100 / 58. 13.100
  libavdevice    58.  4.100 / 58.  4.100
  libavfilter     7. 21.100 /  7. 21.100
  libswscale      5.  2.100 /  5.  2.100
  libswresample   3.  2.100 /  3.  2.100
  libpostproc    55.  2.100 / 55.  2.100

The nullsrc and anullsrc filters create tracks whose timestamps (both pts and dts) start at zero:

ffmpeg.exe -f lavfi -i nullsrc -f lavfi -i anullsrc -t 0.2 -f framehash -hash crc32 -

#format: frame checksums
#version: 2
#hash: CRC32
#software: Lavf58.13.100
#tb 0: 1/25
#media_type 0: video
#codec_id 0: rawvideo
#dimensions 0: 320x240
#sar 0: 1/1
#tb 1: 1/44100
#media_type 1: audio
#codec_id 1: pcm_s16le
#sample_rate 1: 44100
#channel_layout 1: 3
#channel_layout_name 1: stereo
#stream#, dts,        pts, duration,     size, hash
0,          0,          0,        1,   115200, 2a01c517
1,          0,          0,     1024,     4096, c71c0011
1,       1024,       1024,     1024,     4096, c71c0011
0,          1,          1,        1,   115200, 2a01c517
1,       2048,       2048,     1024,     4096, c71c0011
1,       3072,       3072,     1024,     4096, c71c0011
0,          2,          2,        1,   115200, 2a01c517
1,       4096,       4096,     1024,     4096, c71c0011
1,       5120,       5120,     1024,     4096, c71c0011
0,          3,          3,        1,   115200, 2a01c517
1,       6144,       6144,     1024,     4096, c71c0011
0,          4,          4,        1,   115200, 2a01c517
1,       7168,       7168,     1024,     4096, c71c0011
1,       8192,       8192,      628,     2512, 3f99da8d

If one encodes the audio, the pts and dts of the audio are shifted back by the number of samples of encoder delay that the encoding process entails, so that the output audio that actually corresponds to input samples has the same timestamps as those input samples:

ffmpeg.exe -f lavfi -i nullsrc -f lavfi -i anullsrc -c:a libopus -t 0.5 -f framehash -hash crc32 -

#format: frame checksums
#version: 2
#hash: CRC32
#extradata 1,                              19, ea5d642a
#software: Lavf58.13.100
#tb 0: 1/25
#media_type 0: video
#codec_id 0: rawvideo
#dimensions 0: 320x240
#sar 0: 1/1
#tb 1: 1/48000
#media_type 1: audio
#codec_id 1: opus
#sample_rate 1: 48000
#channel_layout 1: 3
#channel_layout_name 1: stereo
#stream#, dts,        pts, duration,     size, hash
1,       -312,       -312,      960,        3, 8abe71cf
0,          0,          0,        1,   115200, 2a01c517
1,        648,        648,      960,        3, 8abe71cf
1,       1608,       1608,      960,        3, 8abe71cf
0,          1,          1,        1,   115200, 2a01c517
1,       2568,       2568,      960,        3, 8abe71cf
1,       3528,       3528,      960,        3, 8abe71cf
0,          2,          2,        1,   115200, 2a01c517
1,       4488,       4488,      960,        3, 8abe71cf
1,       5448,       5448,      960,        3, 8abe71cf
0,          3,          3,        1,   115200, 2a01c517
1,       6408,       6408,      960,        3, 8abe71cf
1,       7368,       7368,      960,        3, 8abe71cf
0,          4,          4,        1,   115200, 2a01c517
1,       8328,       8328,      960,        3, 8abe71cf
1,       9288,       9288,      960,        3, 8abe71cf
0,          5,          5,        1,   115200, 2a01c517
1,      10248,      10248,      960,        3, 8abe71cf
1,      11208,      11208,      960,        3, 8abe71cf
0,          6,          6,        1,   115200, 2a01c517
1,      12168,      12168,      960,        3, 8abe71cf
1,      13128,      13128,      960,        3, 8abe71cf
0,          7,          7,        1,   115200, 2a01c517
1,      14088,      14088,      960,        3, 8abe71cf
1,      15048,      15048,      960,        3, 8abe71cf
0,          8,          8,        1,   115200, 2a01c517
1,      16008,      16008,      960,        3, 8abe71cf
1,      16968,      16968,      960,        3, 8abe71cf
0,          9,          9,        1,   115200, 2a01c517
1,      17928,      17928,      960,        3, 8abe71cf
1,      18888,      18888,      960,        3, 8abe71cf
0,         10,         10,        1,   115200, 2a01c517
1,      19848,      19848,      960,        3, 8abe71cf
1,      20808,      20808,      960,        3, 8abe71cf
0,         11,         11,        1,   115200, 2a01c517
1,      21768,      21768,      960,        3, 8abe71cf
1,      22728,      22728,      960,        3, 8abe71cf
0,         12,         12,        1,   115200, 2a01c517
1,      23688,      23688,      312,        3, 8abe71cf, S=1,       10, 6ba9ada3
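
The audio timestamps in the listing above follow a simple pattern: packet n starts at n*960 - 312 in the 1/48000 time base, where 960 is the frame size (20ms) and 312 is the encoder delay. A minimal Python sketch of that arithmetic follows; the function and constant names are purely illustrative and are not part of any FFmpeg API:

```python
from fractions import Fraction

SAMPLE_RATE = 48000    # audio time base from the listing (1/48000)
FRAME_SIZE = 960       # samples per Opus packet (20 ms)
ENCODER_DELAY = 312    # samples of encoder delay seen in the listing


def opus_pts(n):
    """pts (in samples) of the n-th audio packet, shifted back by the delay."""
    return n * FRAME_SIZE - ENCODER_DELAY


def samples_to_ms(samples):
    """Exact conversion from samples to milliseconds."""
    return Fraction(samples * 1000, SAMPLE_RATE)


first_three = [opus_pts(n) for n in range(3)]
delay_ms = float(samples_to_ms(ENCODER_DELAY))
print(first_three, delay_ms)
```

The first three values reproduce the -312, 648, 1608 sequence above, and 312 samples at 48 kHz correspond to the 6.5ms used in the rest of this report.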

If one now muxes this into Matroska (in order to use a valid codec in Matroska, I encoded the video with libx264 and -tune zerolatency in order not to run into #4536), the encoder delay of 312 samples (6.5ms) from above leads to a shift of all timestamps by the same amount to make all timestamps non-negative; this happens with every audio codec and is not Opus-specific:

ffmpeg.exe -f lavfi -i nullsrc -f lavfi -i anullsrc -c:v libx264 -tune zerolatency -c:a libopus -t 0.5 -f matroska test.mkv
mkvinfo -s test.mkv
Track 1: video, codec ID: V_MPEG4/ISO/AVC (h.264 profile: High @L1.3), mkvmerge/mkvextract track ID: 0, language: und, default duration: 40.000ms (25.000 frames/fields per second for a video track), pixel width: 320, pixel height: 240
Track 2: audio, codec ID: A_OPUS, mkvmerge/mkvextract track ID: 1, language: und, channels: 2, sampling freq: 48000, bits per sample: 16
I frame, track 2, timestamp 00:00:00.000000000, size 3, adler 0x05f302fa
I frame, track 1, timestamp 00:00:00.007000000, size 812, adler 0x080a17e4
I frame, track 2, timestamp 00:00:00.021000000, size 3, adler 0x05f302fa
I frame, track 2, timestamp 00:00:00.041000000, size 3, adler 0x05f302fa
P frame, track 1, timestamp 00:00:00.047000000, size 51, adler 0xa07a11ec
I frame, track 2, timestamp 00:00:00.061000000, size 3, adler 0x05f302fa
I frame, track 2, timestamp 00:00:00.081000000, size 3, adler 0x05f302fa
P frame, track 1, timestamp 00:00:00.087000000, size 61, adler 0x76721649
I frame, track 2, timestamp 00:00:00.101000000, size 3, adler 0x05f302fa
I frame, track 2, timestamp 00:00:00.121000000, size 3, adler 0x05f302fa
P frame, track 1, timestamp 00:00:00.127000000, size 65, adler 0x23a11875
I frame, track 2, timestamp 00:00:00.141000000, size 3, adler 0x05f302fa
I frame, track 2, timestamp 00:00:00.161000000, size 3, adler 0x05f302fa
P frame, track 1, timestamp 00:00:00.167000000, size 65, adler 0x249f181b
I frame, track 2, timestamp 00:00:00.181000000, size 3, adler 0x05f302fa
I frame, track 2, timestamp 00:00:00.201000000, size 3, adler 0x05f302fa
P frame, track 1, timestamp 00:00:00.207000000, size 65, adler 0x334918bd
I frame, track 2, timestamp 00:00:00.221000000, size 3, adler 0x05f302fa
I frame, track 2, timestamp 00:00:00.241000000, size 3, adler 0x05f302fa
P frame, track 1, timestamp 00:00:00.247000000, size 65, adler 0x34021860
I frame, track 2, timestamp 00:00:00.261000000, size 3, adler 0x05f302fa
I frame, track 2, timestamp 00:00:00.281000000, size 3, adler 0x05f302fa
P frame, track 1, timestamp 00:00:00.287000000, size 65, adler 0x42ac1902
I frame, track 2, timestamp 00:00:00.301000000, size 3, adler 0x05f302fa
I frame, track 2, timestamp 00:00:00.321000000, size 3, adler 0x05f302fa
P frame, track 1, timestamp 00:00:00.327000000, size 65, adler 0x085c17a3
I frame, track 2, timestamp 00:00:00.341000000, size 3, adler 0x05f302fa
I frame, track 2, timestamp 00:00:00.361000000, size 3, adler 0x05f302fa
P frame, track 1, timestamp 00:00:00.367000000, size 65, adler 0x17061845
I frame, track 2, timestamp 00:00:00.381000000, size 3, adler 0x05f302fa
I frame, track 2, timestamp 00:00:00.401000000, size 3, adler 0x05f302fa
P frame, track 1, timestamp 00:00:00.407000000, size 65, adler 0x180417eb
I frame, track 2, timestamp 00:00:00.421000000, size 3, adler 0x05f302fa
I frame, track 2, timestamp 00:00:00.441000000, size 3, adler 0x05f302fa
P frame, track 1, timestamp 00:00:00.447000000, size 65, adler 0x2669188a
I frame, track 2, timestamp 00:00:00.461000000, size 3, adler 0x05f302fa
I frame, track 2, timestamp 00:00:00.481000000, size 3, adler 0x05f302fa
P frame, track 1, timestamp 00:00:00.487000000, size 65, adler 0x2722182d
I frame, track 2, timestamp 00:00:00.501000000, size 3, adler 0x05f302fa
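
The mkvinfo timestamps above can be reproduced with a small Python sketch. It assumes that timestamps are first rescaled to Matroska's 1/1000 time base with round-to-nearest (halfway cases away from zero, which is how I understand av_rescale_q to behave) and that all tracks are then shifted by the same offset. The helper below is a hypothetical stand-in, not FFmpeg code:

```python
def rescale_near(value, src_hz, dst_hz):
    """Rescale a timestamp from 1/src_hz to 1/dst_hz units, rounding to the
    nearest integer with halfway cases away from zero (assumed behaviour)."""
    magnitude = (abs(value) * dst_hz * 2 + src_hz) // (2 * src_hz)
    return magnitude if value >= 0 else -magnitude


audio_pts = [-312, 648, 1608]   # 1/48000 time base, from the framehash output
video_pts = [0, 1, 2]           # 1/25 time base

audio_ms = [rescale_near(t, 48000, 1000) for t in audio_pts]
video_ms = [rescale_near(t, 25, 1000) for t in video_pts]

# Shift every track by the same amount so that the smallest timestamp is 0.
offset = -min(audio_ms + video_ms)
muxed_audio = [t + offset for t in audio_ms]
muxed_video = [t + offset for t in video_ms]
print(offset, muxed_audio, muxed_video)
```

Under these assumptions the offset is 7ms (6.5ms rounded), the audio blocks land at 0, 21 and 41ms and the video frames at 7, 47 and 87ms, matching the listing.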

So the encoder delay gets baked into the usual timestamps. But for Opus the encoder delay is also signalled via the CodecDelay element in the Opus track header. The semantics of this element imply that the first 6.5ms of audio should be discarded and that the audio for time t has Matroska time t+6.5ms (i.e. the second Opus block at 20ms actually has a timestamp of 13.5ms). This means that the synchronization between the Opus track and the other tracks is shifted by the encoder delay, as can be seen e.g. in the output of the Matroska demuxer:

ffmpeg.exe -copyts -i test.mkv -c copy -f framehash -hash crc32 -

#format: frame checksums
#version: 2
#hash: CRC32
#extradata 0,                              40, 8237cd92
#extradata 1,                              19, ea5d642a
#software: Lavf58.13.100
#tb 0: 1/1000
#media_type 0: video
#codec_id 0: h264
#dimensions 0: 320x240
#sar 0: 1/1
#tb 1: 1/1000
#media_type 1: audio
#codec_id 1: opus
#sample_rate 1: 48000
#channel_layout 1: 3
#channel_layout_name 1: stereo
#stream#, dts,        pts, duration,     size, hash
1,         -7,         -7,       20,        3, 8abe71cf
0,          7,          7,       40,      812, dbac8e3e
1,         14,         14,       20,        3, 8abe71cf
1,         34,         34,       20,        3, 8abe71cf
0,         47,         47,       40,       51, 4885e758
1,         54,         54,       20,        3, 8abe71cf
1,         74,         74,       20,        3, 8abe71cf
0,         87,         87,       40,       61, 5c29c696
1,         94,         94,       20,        3, 8abe71cf
1,        114,        114,       20,        3, 8abe71cf
0,        127,        127,       40,       65, 2832137b
1,        134,        134,       20,        3, 8abe71cf
1,        154,        154,       20,        3, 8abe71cf
0,        167,        167,       40,       65, 985e3247
1,        174,        174,       20,        3, 8abe71cf
1,        194,        194,       20,        3, 8abe71cf
0,        207,        207,       40,       65, 85567570
1,        214,        214,       20,        3, 8abe71cf
1,        234,        234,       20,        3, 8abe71cf
0,        247,        247,       40,       65, c623be44
1,        254,        254,       20,        3, 8abe71cf
1,        274,        274,       20,        3, 8abe71cf
0,        287,        287,       40,       65, db2bf973
1,        294,        294,       20,        3, 8abe71cf
1,        314,        314,       20,        3, 8abe71cf
0,        327,        327,       40,       65, 49d46f1e
1,        334,        334,       20,        3, 8abe71cf
1,        354,        354,       20,        3, 8abe71cf
0,        367,        367,       40,       65, 54dc2829
1,        374,        374,       20,        3, 8abe71cf
1,        394,        394,       20,        3, 8abe71cf
0,        407,        407,       40,       65, 584b1113
1,        414,        414,       20,        3, 8abe71cf
1,        434,        434,       20,        3, 8abe71cf
0,        447,        447,       40,       65, 0aa1a42a
1,        454,        454,       20,        3, 8abe71cf
1,        474,        474,       20,        3, 8abe71cf
0,        487,        487,       40,       65, f52f7718
1,        494,        494,       20,        3, 8abe71cf, S=1,       10, 6ba9ada3

(Without -copyts the timestamps would be shifted to make them non-negative.)
As one sees, this is essentially a shift by the encoder delay. With every demuxer->muxer round trip, the tracks drift further out of sync.
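
To illustrate how the error accumulates, here is a toy Python model of repeated remuxing. It is only a sketch of the behaviour described in this report, not the actual muxer or demuxer logic: mux() shifts both tracks so that nothing is negative, demux() subtracts the CodecDelay (7ms after rounding to the 1/1000 time base) from the audio timestamps, and all names are made up for the illustration:

```python
CODEC_DELAY_MS = 7   # 6.5 ms CodecDelay as it appears after rounding to ms


def mux(audio_start, video_start):
    """Shift both tracks by the same amount so no timestamp is negative."""
    offset = max(0, -min(audio_start, video_start))
    return audio_start + offset, video_start + offset


def demux(audio_start, video_start):
    """Apply CodecDelay semantics: audio timestamps move back by the delay."""
    return audio_start - CODEC_DELAY_MS, video_start


def gap_after_roundtrips(n):
    """Video start minus audio start after the first mux plus n remuxes."""
    audio, video = -CODEC_DELAY_MS, 0          # encoder output in ms
    audio, video = demux(*mux(audio, video))   # initial mux, then demux
    for _ in range(n):
        audio, video = demux(*mux(audio, video))
    return video - audio


gaps = [gap_after_roundtrips(n) for n in range(4)]
print(gaps)
```

In this model the correct gap between the first audio packet and the first video frame is 7ms. The first mux already doubles it to 14ms (audio at -7, video at 7, as in the demuxer output above), and each further round trip adds another 7ms.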
