Opened 3 years ago

Last modified 3 years ago

#4175 new enhancement

support phase shift for Dolby Pro Logic II / Dolby matrix downmix

Reported by: ranutso
Owned by:
Priority: normal
Component: swresample
Version: git-master
Keywords: dplii
Cc: otonvm
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

Summary of the bug:

Hello guys. I am trying to downmix a 5.1 audio stream to 2.0 (stereo) using the audio resampler filter option "matrix_encoding=dplii". The resulting stereo audio is biased towards the right speaker whenever more than one channel of the surround stream is active. That is, the audio volume on the right speaker seems higher than it should be. This is, of course, with a test signal where all surround channels produce the same sound at the same volume.

How to reproduce:

% ffmpeg -i ChID-BLITS-EBU-Narration441-16b.wav -ac 2 -filter:a "aresample=matrix_encoding=dplii" dplii.wav
ffmpeg version 2.4.4 Copyright (c) 2000-2014 the FFmpeg developers
  built on Dec  4 2014 14:13:11 with Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn)

You may freely obtain the surround test file I used in the command above from Fraunhofer's website.
Direct link to the surround wave file: https://www2.iis.fraunhofer.de/AAC/ChID-BLITS-EBU-Narration441-16b.wav
Fraunhofer's page where that file's URL is located: https://www2.iis.fraunhofer.de/AAC/multichannel.html

If you analyze both the surround input file and the resulting stereo wave file in Audacity, for example, you will notice that by 19 seconds into the audio stream all channels are active with a test signal. The same point in time in the resulting DPL-II downmix shows a "higher volume" on the right speaker. This effect is even more noticeable by 22 seconds, when the test signal changes. It is interesting to note that in the initial seconds of the file, when a male voice is announcing each channel individually, I could not notice this right speaker bias. The channel downmixing in that segment sounds correct. The effect is very noticeable on sound tracks with very active surround channels: they all seem to be stronger on the right speaker.
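
For a numeric check alongside the visual inspection in Audacity, the per-channel level of the downmixed file can be measured directly. The following is a minimal sketch, not part of the original report: it assumes the output is a 16-bit PCM stereo WAV such as the dplii.wav written by the command above, and the function name channel_rms_db is hypothetical.

```python
import array
import math
import sys
import wave


def channel_rms_db(path, start_s, duration_s):
    """Return the RMS level (dBFS) of each channel for a window of a 16-bit PCM WAV."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        wav.setpos(int(start_s * rate))          # seek to the start of the window
        raw = wav.readframes(int(duration_s * rate))

    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()                       # WAV sample data is little-endian

    levels = []
    for ch in range(channels):
        chan = samples[ch::channels]             # de-interleave one channel
        mean_square = sum(s * s for s in chan) / max(len(chan), 1)
        rms = math.sqrt(mean_square) / 32768.0
        levels.append(20 * math.log10(rms) if rms > 0 else float("-inf"))
    return levels
```

For example, channel_rms_db("dplii.wav", 19.0, 2.0) would return the left and right levels for a two-second window starting at 19 seconds; a clearly higher second value would correspond to the right-speaker bias described above.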

Thank you.

Change History (11)

comment:1 Changed 3 years ago by cehoyos

  • Keywords downmix surround stereo removed

Please test current FFmpeg git head and please provide the command line you tested together with the complete, uncut console output to make this a valid ticket.

Are you sure that it isn't just a characteristic of Dolby downmixing that the right speaker gets louder if you play back on stereo equipment?

comment:2 follow-up: Changed 3 years ago by otonvm

Hello!

I was just getting ready to write a similar report before I found this one.

I'm doing this (same surround wave):

  1. Extract all channels to mono (as described in the wiki)
    ffmpeg started on 2014-12-17 at 09:21:14
    Report written to "ffmpeg-20141217-092114.log"
    Command line:
    ffmpeg -report -i ChID-BLITS-EBU-Narration441-16b.wav -filter_complex "channelsplit=channel_layout=5.1[FL][FR][FC][LFE][BL][BR]" -map "[FL]" test2_front_left.wav -map "[FR]" test2_front_right.wav -map "[FC]" test2_front_center.wav -map "[LFE]" test2_lfe.wav -map "[BL]" test2_back_left.wav -map "[BR]" test2_back_right.wav
    ffmpeg version N-68500-g3ba1050 Copyright (c) 2000-2014 the FFmpeg developers
      built on Dec 17 2014 01:55:42 with gcc 4.9.2 (GCC)
      configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libfreetype --enable-libgme --enable-libgsm --enable-libilbc --enable-libmodplug --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-librtmp --enable-libschroedinger --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs --enable-libxvid --enable-decklink --enable-zlib
      libavutil      54. 15.100 / 54. 15.100
      libavcodec     56. 15.100 / 56. 15.100
      libavformat    56. 15.105 / 56. 15.105
      libavdevice    56.  3.100 / 56.  3.100
      libavfilter     5.  4.100 /  5.  4.100
      libswscale      3.  1.101 /  3.  1.101
      libswresample   1.  1.100 /  1.  1.100
      libpostproc    53.  3.100 / 53.  3.100
    Splitting the commandline.
    Reading option '-report' ... matched as option 'report' (generate a report) with argument '1'.
    Reading option '-i' ... matched as input file with argument 'ChID-BLITS-EBU-Narration441-16b.wav'.
    Reading option '-filter_complex' ... matched as option 'filter_complex' (create a complex filtergraph) with argument 'channelsplit=channel_layout=5.1[FL][FR][FC][LFE][BL][BR]'.
    Reading option '-map' ... matched as option 'map' (set input stream mapping) with argument '[FL]'.
    Reading option 'test2_front_left.wav' ... matched as output file.
    Reading option '-map' ... matched as option 'map' (set input stream mapping) with argument '[FR]'.
    Reading option 'test2_front_right.wav' ... matched as output file.
    Reading option '-map' ... matched as option 'map' (set input stream mapping) with argument '[FC]'.
    Reading option 'test2_front_center.wav' ... matched as output file.
    Reading option '-map' ... matched as option 'map' (set input stream mapping) with argument '[LFE]'.
    Reading option 'test2_lfe.wav' ... matched as output file.
    Reading option '-map' ... matched as option 'map' (set input stream mapping) with argument '[BL]'.
    Reading option 'test2_back_left.wav' ... matched as output file.
    Reading option '-map' ... matched as option 'map' (set input stream mapping) with argument '[BR]'.
    Reading option 'test2_back_right.wav' ... matched as output file.
    Finished splitting the commandline.
    Parsing a group of options: global .
    Applying option report (generate a report) with argument 1.
    Applying option filter_complex (create a complex filtergraph) with argument channelsplit=channel_layout=5.1[FL][FR][FC][LFE][BL][BR].
    Successfully parsed a group of options.
    Parsing a group of options: input file ChID-BLITS-EBU-Narration441-16b.wav.
    Successfully parsed a group of options.
    Opening an input file: ChID-BLITS-EBU-Narration441-16b.wav.
    [wav @ 000000000036d0c0] Format wav probed with size=2048 and score=99
    [wav @ 000000000036d0c0] Before avformat_find_stream_info() pos: 68 bytes read:46722 seeks:2
    [wav @ 000000000036d0c0] parser not found for codec pcm_s16le, packets or times may be invalid.
    [wav @ 000000000036d0c0] probing stream 0 pp:14
    [wav @ 000000000036d0c0] probing stream 0 pp:13
    [wav @ 000000000036d0c0] probing stream 0 pp:12
    [wav @ 000000000036d0c0] probing stream 0 pp:11
    [wav @ 000000000036d0c0] probing stream 0 pp:10
    [wav @ 000000000036d0c0] probing stream 0 pp:9
    [wav @ 000000000036d0c0] probing stream 0 pp:8
    [wav @ 000000000036d0c0] probing stream 0 pp:7
    [wav @ 000000000036d0c0] probing stream 0 pp:6
    [wav @ 000000000036d0c0] probing stream 0 pp:5
    [wav @ 000000000036d0c0] probing stream 0 pp:4
    [wav @ 000000000036d0c0] probing stream 0 pp:3
    [wav @ 000000000036d0c0] probing stream 0 pp:2
    [wav @ 000000000036d0c0] probing stream 0 pp:1
    [wav @ 000000000036d0c0] probed stream 0
    [wav @ 000000000036d0c0] parser not found for codec pcm_s16le, packets or times may be invalid.
    [wav @ 000000000036d0c0] All info found
    [wav @ 000000000036d0c0] After avformat_find_stream_info() pos: 204668 bytes read:276098 seeks:2 frames:50
    Input #0, wav, from 'ChID-BLITS-EBU-Narration441-16b.wav':
      Metadata:
        encoder         : Adobe Audition CS6 (Windows)
        date            : 2012-05-15
        creation_time   : 20:53:02
        time_reference  : 0
      Duration: 00:00:46.53, bitrate: 4236 kb/s
        Stream #0:0, 50, 1/44100: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 5.1, s16, 4233 kb/s
    Successfully opened the file.
    Parsing a group of options: output file test2_front_left.wav.
    Applying option map (set input stream mapping) with argument [FL].
    Successfully parsed a group of options.
    Opening an output file: test2_front_left.wav.
    detected 4 logical cores
    [Parsed_channelsplit_0 @ 0000000002bb67a0] Setting 'channel_layout' to value '5.1'
    [graph 0 input from stream 0:0 @ 0000000002c12a40] Setting 'time_base' to value '1/44100'
    [graph 0 input from stream 0:0 @ 0000000002c12a40] Setting 'sample_rate' to value '44100'
    [graph 0 input from stream 0:0 @ 0000000002c12a40] Setting 'sample_fmt' to value 's16'
    [graph 0 input from stream 0:0 @ 0000000002c12a40] Setting 'channel_layout' to value '0x3f'
    [graph 0 input from stream 0:0 @ 0000000002c12a40] tb:1/44100 samplefmt:s16 samplerate:44100 chlayout:0x3f
    [audio format for output stream 0:0 @ 0000000002c05060] Setting 'sample_fmts' to value 's16'
    Successfully opened the file.
    Parsing a group of options: output file test2_front_right.wav.
    Applying option map (set input stream mapping) with argument [FR].
    Successfully parsed a group of options.
    Opening an output file: test2_front_right.wav.
    [audio format for output stream 1:0 @ 0000000002c1b9a0] Setting 'sample_fmts' to value 's16'
    Successfully opened the file.
    Parsing a group of options: output file test2_front_center.wav.
    Applying option map (set input stream mapping) with argument [FC].
    Successfully parsed a group of options.
    Opening an output file: test2_front_center.wav.
    [audio format for output stream 2:0 @ 0000000002c1c1c0] Setting 'sample_fmts' to value 's16'
    Successfully opened the file.
    Parsing a group of options: output file test2_lfe.wav.
    Applying option map (set input stream mapping) with argument [LFE].
    Successfully parsed a group of options.
    Opening an output file: test2_lfe.wav.
    [audio format for output stream 3:0 @ 0000000002c39c80] Setting 'sample_fmts' to value 's16'
    Successfully opened the file.
    Parsing a group of options: output file test2_back_left.wav.
    Applying option map (set input stream mapping) with argument [BL].
    Successfully parsed a group of options.
    Opening an output file: test2_back_left.wav.
    [audio format for output stream 4:0 @ 0000000002c455e0] Setting 'sample_fmts' to value 's16'
    Successfully opened the file.
    Parsing a group of options: output file test2_back_right.wav.
    Applying option map (set input stream mapping) with argument [BR].
    Successfully parsed a group of options.
    Opening an output file: test2_back_right.wav.
    [audio format for output stream 5:0 @ 0000000002c50a60] Setting 'sample_fmts' to value 's16'
    Successfully opened the file.
    [Parsed_channelsplit_0 @ 0000000002bb67a0] auto-inserting filter 'auto-inserted resampler 0' between the filter 'graph 0 input from stream 0:0' and the filter 'Parsed_channelsplit_0'
    [audio format for output stream 0:0 @ 0000000002c05060] auto-inserting filter 'auto-inserted resampler 1' between the filter 'Parsed_channelsplit_0' and the filter 'audio format for output stream 0:0'
    [audio format for output stream 1:0 @ 0000000002c1b9a0] auto-inserting filter 'auto-inserted resampler 2' between the filter 'Parsed_channelsplit_0' and the filter 'audio format for output stream 1:0'
    [audio format for output stream 2:0 @ 0000000002c1c1c0] auto-inserting filter 'auto-inserted resampler 3' between the filter 'Parsed_channelsplit_0' and the filter 'audio format for output stream 2:0'
    [audio format for output stream 3:0 @ 0000000002c39c80] auto-inserting filter 'auto-inserted resampler 4' between the filter 'Parsed_channelsplit_0' and the filter 'audio format for output stream 3:0'
    [audio format for output stream 4:0 @ 0000000002c455e0] auto-inserting filter 'auto-inserted resampler 5' between the filter 'Parsed_channelsplit_0' and the filter 'audio format for output stream 4:0'
    [audio format for output stream 5:0 @ 0000000002c50a60] auto-inserting filter 'auto-inserted resampler 6' between the filter 'Parsed_channelsplit_0' and the filter 'audio format for output stream 5:0'
    [AVFilterGraph @ 0000000002c01440] query_formats: 14 queried, 18 merged, 21 already done, 0 delayed
    [auto-inserted resampler 0 @ 0000000002c50b20] ch:6 chl:5.1 fmt:s16 r:44100Hz -> ch:6 chl:5.1 fmt:s16p r:44100Hz
    [auto-inserted resampler 1 @ 0000000002c50be0] ch:1 chl:1 channels (FL) fmt:s16p r:44100Hz -> ch:1 chl:1 channels (FL) fmt:s16 r:44100Hz
    [auto-inserted resampler 2 @ 0000000002c50ca0] ch:1 chl:1 channels (FR) fmt:s16p r:44100Hz -> ch:1 chl:1 channels (FR) fmt:s16 r:44100Hz
    [auto-inserted resampler 3 @ 0000000002c50e20] ch:1 chl:mono fmt:s16p r:44100Hz -> ch:1 chl:mono fmt:s16 r:44100Hz
    [auto-inserted resampler 4 @ 0000000002c50d60] ch:1 chl:1 channels (LFE) fmt:s16p r:44100Hz -> ch:1 chl:1 channels (LFE) fmt:s16 r:44100Hz
    [auto-inserted resampler 5 @ 0000000002c50ee0] ch:1 chl:1 channels (BL) fmt:s16p r:44100Hz -> ch:1 chl:1 channels (BL) fmt:s16 r:44100Hz
    [auto-inserted resampler 6 @ 0000000002c50fa0] ch:1 chl:1 channels (BR) fmt:s16p r:44100Hz -> ch:1 chl:1 channels (BR) fmt:s16 r:44100Hz
    Output #0, wav, to 'test2_front_left.wav':
      Metadata:
        time_reference  : 0
        ICRD            : 2012-05-15
        ISFT            : Lavf56.15.105
        Stream #0:0, 0, 1/44100: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 1 channels (FL), s16, 705 kb/s
        Metadata:
          encoder         : Lavc56.15.100 pcm_s16le
    Output #1, wav, to 'test2_front_right.wav':
      Metadata:
        time_reference  : 0
        ICRD            : 2012-05-15
        ISFT            : Lavf56.15.105
        Stream #1:0, 0, 1/44100: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 1 channels (FR), s16, 705 kb/s
        Metadata:
          encoder         : Lavc56.15.100 pcm_s16le
    Output #2, wav, to 'test2_front_center.wav':
      Metadata:
        time_reference  : 0
        ICRD            : 2012-05-15
        ISFT            : Lavf56.15.105
        Stream #2:0, 0, 1/44100: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, mono, s16, 705 kb/s
        Metadata:
          encoder         : Lavc56.15.100 pcm_s16le
    Output #3, wav, to 'test2_lfe.wav':
      Metadata:
        time_reference  : 0
        ICRD            : 2012-05-15
        ISFT            : Lavf56.15.105
        Stream #3:0, 0, 1/44100: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 1 channels (LFE), s16, 705 kb/s
        Metadata:
          encoder         : Lavc56.15.100 pcm_s16le
    Output #4, wav, to 'test2_back_left.wav':
      Metadata:
        time_reference  : 0
        ICRD            : 2012-05-15
        ISFT            : Lavf56.15.105
        Stream #4:0, 0, 1/44100: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 1 channels (BL), s16, 705 kb/s
        Metadata:
          encoder         : Lavc56.15.100 pcm_s16le
    Output #5, wav, to 'test2_back_right.wav':
      Metadata:
        time_reference  : 0
        ICRD            : 2012-05-15
        ISFT            : Lavf56.15.105
        Stream #5:0, 0, 1/44100: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 1 channels (BR), s16, 705 kb/s
        Metadata:
          encoder         : Lavc56.15.100 pcm_s16le
    Stream mapping:
      Stream #0:0 (pcm_s16le) -> channelsplit
      channelsplit:FL -> Stream #0:0 (pcm_s16le)
      channelsplit:FR -> Stream #1:0 (pcm_s16le)
      channelsplit:FC -> Stream #2:0 (pcm_s16le)
      channelsplit:LFE -> Stream #3:0 (pcm_s16le)
      channelsplit:BL -> Stream #4:0 (pcm_s16le)
      channelsplit:BR -> Stream #5:0 (pcm_s16le)
    Press [q] to stop, [?] for help
    size=    1338kB time=00:00:15.53 bitrate= 705.7kbits/s    
    size=    2494kB time=00:00:28.95 bitrate= 705.6kbits/s    
    [output stream 0:0 @ 0000000002c132e0] EOF on sink link output stream 0:0:default.
    [output stream 2:0 @ 0000000002c30580] EOF on sink link output stream 2:0:default.
    [output stream 5:0 @ 0000000002c509a0] EOF on sink link output stream 5:0:default.
    [output stream 4:0 @ 0000000002c454a0] EOF on sink link output stream 4:0:default.
    [output stream 1:0 @ 0000000002c1b8e0] EOF on sink link output stream 1:0:default.
    [output stream 3:0 @ 0000000002c39bc0] EOF on sink link output stream 3:0:default.
    No more output streams to write to, finishing.
    size=    4008kB time=00:00:46.52 bitrate= 705.6kbits/s    
    video:0kB audio:24045kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
    Input file #0 (ChID-BLITS-EBU-Narration441-16b.wav):
      Input stream #0:0 (audio): 6018 packets read (24621912 bytes); 6018 frames decoded (2051826 samples); 
      Total: 6018 packets (24621912 bytes) demuxed
    Output file #0 (test2_front_left.wav):
      Output stream #0:0 (audio): 6018 frames encoded (2051826 samples); 6018 packets muxed (4103652 bytes); 
      Total: 6018 packets (4103652 bytes) muxed
    Output file #1 (test2_front_right.wav):
      Output stream #1:0 (audio): 6018 frames encoded (2051826 samples); 6018 packets muxed (4103652 bytes); 
      Total: 6018 packets (4103652 bytes) muxed
    Output file #2 (test2_front_center.wav):
      Output stream #2:0 (audio): 6018 frames encoded (2051826 samples); 6018 packets muxed (4103652 bytes); 
      Total: 6018 packets (4103652 bytes) muxed
    Output file #3 (test2_lfe.wav):
      Output stream #3:0 (audio): 6018 frames encoded (2051826 samples); 6018 packets muxed (4103652 bytes); 
      Total: 6018 packets (4103652 bytes) muxed
    Output file #4 (test2_back_left.wav):
      Output stream #4:0 (audio): 6018 frames encoded (2051826 samples); 6018 packets muxed (4103652 bytes); 
      Total: 6018 packets (4103652 bytes) muxed
    Output file #5 (test2_back_right.wav):
      Output stream #5:0 (audio): 6018 frames encoded (2051826 samples); 6018 packets muxed (4103652 bytes); 
      Total: 6018 packets (4103652 bytes) muxed
    6018 frames successfully decoded, 0 decoding errors
    [AVIOContext @ 0000000002c1a4c0] Statistics: 4 seeks, 6021 writeouts
    [AVIOContext @ 0000000002c2f600] Statistics: 4 seeks, 6021 writeouts
    [AVIOContext @ 0000000002c1c540] Statistics: 4 seeks, 6021 writeouts
    [AVIOContext @ 0000000002c3b000] Statistics: 4 seeks, 6021 writeouts
    [AVIOContext @ 0000000002c45980] Statistics: 4 seeks, 6021 writeouts
    [AVIOContext @ 0000000002c46fc0] Statistics: 4 seeks, 6021 writeouts
    [AVIOContext @ 000000000036d780] Statistics: 24682588 bytes read, 2 seeks
    
  2. Mix mono channels with Surcode (Dolby certified) to use as reference. I've used this encoder before and I'm quite confident that it produces correct results. Surcode downmix (This is on my mega account, should I put it somewhere else?)

Also if muxed together with the video that identifies channels it sounds right. The sound stage is centered both on stereo and on 5.1.
It should be centered on stereo because DPLII is supposed to be for both stereo and 5.1 delivery.

  3. I get an almost identical result (differences of fractions of a dB) when using qaac with this matrix:
    1 0 0.7071067811865476 0 -0.8717797887081347j -0.4898979485566356j
    0 1 0.7071067811865476 0 0.4898979485566356j 0.8717797887081347j
    

These numbers are from Wikipedia.

%ffmpeg% -loglevel quiet -i ChID-BLITS-EBU-Narration441-16b.wav -f wav -y - | %qaac% --tvbr 127 --quality 2 --verbose --native-resampler=bats,127 --matrix-preset=dpl2 --no-matrix-normalize - -o ChID-BLITS-EBU-Narration441-16b_qaac_dpl2.m4a
%ffmpeg% -i ChID-BLITS-EBU-Narration441-16b_qaac_dpl2.m4a -y ChID-BLITS-EBU-Narration441-16b_qaac_dpl2.wav

Result

  4. For reference, this is the result from aresample:
    ffmpeg started on 2014-12-17 at 10:37:23
    Report written to "ffmpeg-20141217-103723.log"
    Command line:
    ffmpeg -report -i ChID-BLITS-EBU-Narration441-16b.wav -af "aresample=out_channel_layout=stereo:matrix_encoding=dplii" -y ChID-BLITS-EBU-Narration441-16b_ff_aresample.wav
    ffmpeg version N-68500-g3ba1050 Copyright (c) 2000-2014 the FFmpeg developers
      built on Dec 17 2014 01:55:42 with gcc 4.9.2 (GCC)
      configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libfreetype --enable-libgme --enable-libgsm --enable-libilbc --enable-libmodplug --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-librtmp --enable-libschroedinger --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs --enable-libxvid --enable-decklink --enable-zlib
      libavutil      54. 15.100 / 54. 15.100
      libavcodec     56. 15.100 / 56. 15.100
      libavformat    56. 15.105 / 56. 15.105
      libavdevice    56.  3.100 / 56.  3.100
      libavfilter     5.  4.100 /  5.  4.100
      libswscale      3.  1.101 /  3.  1.101
      libswresample   1.  1.100 /  1.  1.100
      libpostproc    53.  3.100 / 53.  3.100
    Splitting the commandline.
    Reading option '-report' ... matched as option 'report' (generate a report) with argument '1'.
    Reading option '-i' ... matched as input file with argument 'ChID-BLITS-EBU-Narration441-16b.wav'.
    Reading option '-af' ... matched as option 'af' (set audio filters) with argument 'aresample=out_channel_layout=stereo:matrix_encoding=dplii'.
    Reading option '-y' ... matched as option 'y' (overwrite output files) with argument '1'.
    Reading option 'ChID-BLITS-EBU-Narration441-16b_ff_aresample.wav' ... matched as output file.
    Finished splitting the commandline.
    Parsing a group of options: global .
    Applying option report (generate a report) with argument 1.
    Applying option y (overwrite output files) with argument 1.
    Successfully parsed a group of options.
    Parsing a group of options: input file ChID-BLITS-EBU-Narration441-16b.wav.
    Successfully parsed a group of options.
    Opening an input file: ChID-BLITS-EBU-Narration441-16b.wav.
    [wav @ 000000000032bf60] Format wav probed with size=2048 and score=99
    [wav @ 000000000032bf60] Before avformat_find_stream_info() pos: 68 bytes read:46722 seeks:2
    [wav @ 000000000032bf60] parser not found for codec pcm_s16le, packets or times may be invalid.
    [wav @ 000000000032bf60] probing stream 0 pp:14
    [wav @ 000000000032bf60] probing stream 0 pp:13
    [wav @ 000000000032bf60] probing stream 0 pp:12
    [wav @ 000000000032bf60] probing stream 0 pp:11
    [wav @ 000000000032bf60] probing stream 0 pp:10
    [wav @ 000000000032bf60] probing stream 0 pp:9
    [wav @ 000000000032bf60] probing stream 0 pp:8
    [wav @ 000000000032bf60] probing stream 0 pp:7
    [wav @ 000000000032bf60] probing stream 0 pp:6
    [wav @ 000000000032bf60] probing stream 0 pp:5
    [wav @ 000000000032bf60] probing stream 0 pp:4
    [wav @ 000000000032bf60] probing stream 0 pp:3
    [wav @ 000000000032bf60] probing stream 0 pp:2
    [wav @ 000000000032bf60] probing stream 0 pp:1
    [wav @ 000000000032bf60] probed stream 0
    [wav @ 000000000032bf60] parser not found for codec pcm_s16le, packets or times may be invalid.
    [wav @ 000000000032bf60] All info found
    [wav @ 000000000032bf60] After avformat_find_stream_info() pos: 204668 bytes read:276098 seeks:2 frames:50
    Input #0, wav, from 'ChID-BLITS-EBU-Narration441-16b.wav':
      Metadata:
        encoder         : Adobe Audition CS6 (Windows)
        date            : 2012-05-15
        creation_time   : 20:53:02
        time_reference  : 0
      Duration: 00:00:46.53, bitrate: 4236 kb/s
        Stream #0:0, 50, 1/44100: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 5.1, s16, 4233 kb/s
    Successfully opened the file.
    Parsing a group of options: output file ChID-BLITS-EBU-Narration441-16b_ff_aresample.wav.
    Applying option af (set audio filters) with argument aresample=out_channel_layout=stereo:matrix_encoding=dplii.
    Successfully parsed a group of options.
    Opening an output file: ChID-BLITS-EBU-Narration441-16b_ff_aresample.wav.
    Successfully opened the file.
    detected 4 logical cores
    [Parsed_aresample_0 @ 00000000003296a0] Setting 'out_channel_layout' to value 'stereo'
    [Parsed_aresample_0 @ 00000000003296a0] Setting 'matrix_encoding' to value 'dplii'
    [graph 0 input from stream 0:0 @ 000000000032be20] Setting 'time_base' to value '1/44100'
    [graph 0 input from stream 0:0 @ 000000000032be20] Setting 'sample_rate' to value '44100'
    [graph 0 input from stream 0:0 @ 000000000032be20] Setting 'sample_fmt' to value 's16'
    [graph 0 input from stream 0:0 @ 000000000032be20] Setting 'channel_layout' to value '0x3f'
    [graph 0 input from stream 0:0 @ 000000000032be20] tb:1/44100 samplefmt:s16 samplerate:44100 chlayout:0x3f
    [audio format for output stream 0:0 @ 0000000002bb4080] Setting 'sample_fmts' to value 's16'
    [AVFilterGraph @ 0000000002bbbae0] query_formats: 4 queried, 9 merged, 0 already done, 0 delayed
    0.325401 0.000000 0.230093 0.000000 -0.281805 -0.162700 
    0.000000 0.325401 0.230093 0.000000 0.162700 0.281805 
    [Parsed_aresample_0 @ 00000000003296a0] ch:6 chl:5.1 fmt:s16 r:44100Hz -> ch:2 chl:stereo fmt:s16 r:44100Hz
    Output #0, wav, to 'ChID-BLITS-EBU-Narration441-16b_ff_aresample.wav':
      Metadata:
        time_reference  : 0
        ICRD            : 2012-05-15
        ISFT            : Lavf56.15.105
        Stream #0:0, 0, 1/44100: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s
        Metadata:
          encoder         : Lavc56.15.100 pcm_s16le
    Stream mapping:
      Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
    Press [q] to stop, [?] for help
    [output stream 0:0 @ 0000000002bb3fc0] EOF on sink link output stream 0:0:default.
    No more output streams to write to, finishing.
    size=    8015kB time=00:00:46.52 bitrate=1411.2kbits/s    
    video:0kB audio:8015kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.001194%
    Input file #0 (ChID-BLITS-EBU-Narration441-16b.wav):
      Input stream #0:0 (audio): 6018 packets read (24621912 bytes); 6018 frames decoded (2051826 samples); 
      Total: 6018 packets (24621912 bytes) demuxed
    Output file #0 (ChID-BLITS-EBU-Narration441-16b_ff_aresample.wav):
      Output stream #0:0 (audio): 6018 frames encoded (2051826 samples); 6018 packets muxed (8207304 bytes); 
      Total: 6018 packets (8207304 bytes) muxed
    6018 frames successfully decoded, 0 decoding errors
    [AVIOContext @ 000000000032d6a0] Statistics: 4 seeks, 6021 writeouts
    [AVIOContext @ 000000000032c7e0] Statistics: 24682588 bytes read, 2 seeks
    

Result

Let me know if you need anything else.

comment:3 in reply to: ↑ 2 Changed 3 years ago by cehoyos

  • Cc otonvm added
  • Version changed from 2.4.4 to git-master

Replying to otonvm:

I was just getting ready to write a similar report before I found this one.

Ticket #3455 probably deals with the same issue and contains a link to a patch that you could test.

comment:4 follow-up: Changed 3 years ago by otonvm

Done!
Did a build from git and it works!
Result

The mix is overall quieter but otherwise matches reference.

Since it's not a huge patch how long usually before approval?

comment:5 in reply to: ↑ 4 Changed 3 years ago by cehoyos

Replying to otonvm:

The mix is overall quieter but otherwise matches reference.

Since it's not a huge patch how long usually before approval?

It is not a question of time but of your email to the development mailing list confirming that you tested the patch (and compared it with a reference encoder) and that you believe it is correct.

comment:6 Changed 3 years ago by heleppkes

Just to get some clarity..
You quoted a matrix you used for qaac, and you said this matrix gives you a good result, right?

This quoted matrix has two negative operations for Lt, and two positive operations for Rt, just like the swresample code (BEFORE the patch!).

If anything, the coefficients may be slightly wrong, but the +/- are in the proper places from what I can tell.

The only limitation is that swresample cannot actually apply the phase shift the DPLII spec asks for, so the result will never be perfect.
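
The effect of the missing phase shift can be illustrated with a small model. The sketch below is an idealized illustration, not swresample code: it takes the qaac matrix quoted in comment 2, treats each channel as a single-frequency phasor so that "j" becomes Python's 1j, and feeds the same tone into every full-range channel (LFE silent). The helper names mix, drop_phase_shift and level_diff_db are hypothetical.

```python
import math

# Rows are Lt and Rt; columns follow the 5.1 order FL, FR, FC, LFE, BL, BR.
# Values are the qaac matrix quoted in comment 2; 1j stands for the
# 90-degree phase shift written as "j" there.
DPL2 = [
    [1, 0, 0.7071067811865476, 0, -0.8717797887081347j, -0.4898979485566356j],
    [0, 1, 0.7071067811865476, 0, 0.4898979485566356j, 0.8717797887081347j],
]


def mix(matrix, phasors):
    """Apply a 2x6 downmix matrix to one phasor per input channel."""
    return [sum(c * p for c, p in zip(row, phasors)) for row in matrix]


def drop_phase_shift(matrix):
    """Turn every +/-j*g entry into the plain real gain +/-g (simple matrix mixer)."""
    return [[complex(c).real + complex(c).imag for c in row] for row in matrix]


def level_diff_db(lt, rt):
    """Level of Rt relative to Lt in dB."""
    return 20 * math.log10(abs(rt) / abs(lt))


# Assumption: identical, in-phase tone in FL, FR, FC, BL and BR; LFE silent.
same_tone = [1, 1, 1, 0, 1, 1]

lt, rt = mix(DPL2, same_tone)
print("with phase shift:    Rt - Lt = %.2f dB" % level_diff_db(lt, rt))

lt_real, rt_real = mix(drop_phase_shift(DPL2), same_tone)
print("without phase shift: Rt - Lt = %.2f dB" % level_diff_db(lt_real, rt_real))
```

In this idealized model the matrix with the j terms gives Lt and Rt of equal magnitude, while the same gains applied without the phase shift leave Rt roughly 19 dB above Lt. Real program material is not a single in-phase tone, so a measured imbalance will differ.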


comment:7 Changed 3 years ago by michael

  • Summary changed from Dolby Pro Logic II / Dolby matrix downmixing level balance to support phase shift for Dolby Pro Logic II / Dolby matrix downmix
  • Type changed from defect to enhancement

comment:8 Changed 3 years ago by otonvm

Please check all the samples provided, including the final post-patch result.
Put them into Audacity or similar and even visually the differences are obvious.
Then listen by switching from solo to solo while playing each piece on a loop.

Post-patch, the differences from the reference and/or the qaac matrix are minimal at best.

I actually cannot explain what happens... I tried to fix this before I found this patch and could not, mostly because I did not know how to recreate that phase shift.

comment:9 Changed 3 years ago by heleppkes

It's possible that the missing phase shift causes the difference you are observing; however, blindly adjusting the formula to something that "feels" good in one particular circumstance is not the way to go here.

Any reference and independent sources I can find suggest that the current code is correct, albeit missing the phase shift. It also matches your qaac matrix (sans phase shift).

So we know it's incomplete and therefore results in "wrong" output; there is no reason to actually "break" it more (by diverging the formula from the references), imho. The mixer in swresample (and avresample for that matter) is only a simple matrix mixer; until someone teaches it how to apply the phase shift, any modifications to the DPL/DPLII mixing are rather pointless.

Fact of the matter is, no change to the matrix coefficients will give you actual proper DPLII encoded audio.
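
The claim that the current code matches the qaac matrix apart from the phase shift can be checked against the two matrices already posted in this ticket. This is only a sketch: the swresample values are copied from the aresample log in comment 2, the qaac values are the quoted matrix with the j factors dropped, and rescale and sign are hypothetical helper names.

```python
# Matrix printed by swresample in the aresample log of comment 2
# (rows Lt, Rt; columns FL, FR, FC, LFE, BL, BR).
SWR = [
    [0.325401, 0.000000, 0.230093, 0.000000, -0.281805, -0.162700],
    [0.000000, 0.325401, 0.230093, 0.000000, 0.162700, 0.281805],
]

# qaac matrix from comment 2 with the "j" factors dropped (sans phase shift).
QAAC = [
    [1, 0, 0.7071067811865476, 0, -0.8717797887081347, -0.4898979485566356],
    [0, 1, 0.7071067811865476, 0, 0.4898979485566356, 0.8717797887081347],
]


def rescale(matrix):
    """Scale a matrix so that the front-left gain into Lt becomes 1."""
    ref = matrix[0][0]
    return [[c / ref for c in row] for row in matrix]


def sign(x):
    """Return -1, 0 or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


swr_unit = rescale(SWR)

# Compare the sign pattern and the magnitudes entry by entry.
signs_match = all(
    sign(a) == sign(b)
    for row_a, row_b in zip(swr_unit, QAAC)
    for a, b in zip(row_a, row_b)
)
print("same signs:", signs_match)
for name, row_a, row_b in zip(("Lt", "Rt"), swr_unit, QAAC):
    print(name, ["%.4f vs %.4f" % (a, b) for a, b in zip(row_a, row_b)])
```

Under these assumptions the signs agree everywhere and the center gain matches, while the rescaled swresample surround gains come out near 0.866 and 0.5 against 0.872 and 0.490 in the qaac matrix.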

comment:10 Changed 3 years ago by otonvm

I'm sorry but I must disagree.
And by just listening to those samples you could easily hear the difference yourself.

I believe whatever Surcode produces is a reliable reference; my qaac matrix matches it (and therefore also what's noted on Wikipedia), but most importantly it sounds right.

Any other audio that identifies each channel and mixes between each channel produces the same results.

comment:11 Changed 3 years ago by Peter

I wouldn't recommend using the patch; I wasn't aware of the phase shifting when I wrote it (it will stop the waveform cancellation, but the sound will also get steered to the wrong channel).

To quote some documentation from Dolby:

"The 90-Degree Phase Shift filter provides a means for an encoding engineer to create a multichannel Dolby Digital bitstream that can be downmixed to a Dolby Surround compatible Lt/Rt output. Without this filter, point-source elements panned from Surround to Center in the multichannel mix would seem to pan from Surround to Left and then to Center when downmixed to Lt/Rt and reproduced using a Dolby Surround Pro Logic decoder.

This filter should generally be used whenever encoding a multichannel signal unless it is known that the 5.1-channel source does not contain point-source element pans. For example, if the source was recorded using five discrete microphones placed in the corners of an auditorium, there is no panning between channels and the filter could be safely disabled. If in doubt, use a DP562 to downmix the 5.1-channel program to Lt/Rt, Dolby Surround Pro Logic decode the Lt/Rt signals, and then set the filter to the setting that sounds best."

This is pretty much what happened in #3455. Basically, as heleppkes says, it will not give you properly encoded audio even if the coefficients are corrected (I do think they're off too).
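
For readers unfamiliar with what a 90-degree phase shift involves, a generic way to approximate one is a FIR Hilbert transformer. The sketch below is a textbook-style illustration in pure Python, assuming a Hamming-windowed design with 127 taps; it is not Dolby's filter and not code from swresample, and the names hilbert_taps and phase_shift_90 are hypothetical.

```python
import math


def hilbert_taps(half_len):
    """Hamming-windowed FIR approximation of a 90-degree phase shifter.

    Returns 2 * half_len + 1 taps for n = -half_len .. half_len.
    """
    taps = []
    for n in range(-half_len, half_len + 1):
        # Ideal Hilbert impulse response: 2 / (pi * n) for odd n, 0 for even n.
        ideal = 2.0 / (math.pi * n) if n % 2 else 0.0
        window = 0.54 + 0.46 * math.cos(math.pi * n / half_len)
        taps.append(ideal * window)
    return taps


def phase_shift_90(signal, half_len=63):
    """Filter `signal` with the taps above.

    The filter delay is compensated, so output sample k lines up with input
    sample k. Samples outside the input are treated as zero, which makes the
    first and last `half_len` output samples unreliable.
    """
    taps = hilbert_taps(half_len)
    out = []
    for k in range(len(signal)):
        acc = 0.0
        for i, h in enumerate(taps):
            idx = k - (i - half_len)      # y[k] = sum over n of h[n] * x[k - n]
            if 0 <= idx < len(signal):
                acc += h * signal[idx]
        out.append(acc)
    return out
```

Feeding a cosine well inside the passband, for instance cos(2*pi*0.1*k), yields approximately sin(2*pi*0.1*k) away from the edges, that is, the same tone delayed by a quarter period. Very low and very high frequencies are not shifted accurately by such a short filter.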
