Opened 4 years ago

Closed 3 years ago

#4564 closed defect (wontfix)

Distortion after flt stereo downmix

Reported by: cehoyos
Owned by:
Priority: normal
Component: swresample
Version: git-master
Keywords:
Cc: heleppkes
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

http://thread.gmane.org/gmane.comp.video.ffmpeg.user/56969
A user uploaded a short 5.1 DTS sample that plays fine with a hardware decoder but shows "significant distortion" when downmixed to stereo with floating-point coefficients. No such distortion can be heard if s16 or s32 coefficients are used. The same distortion appears when downmixing and encoding with a float-capable encoder such as ac3 or lame.

$ ffmpeg -loglevel debug -i inter.dts -acodec pcm_f32le -ac 2 out.wav
ffmpeg version N-72193-g14c4b25 Copyright (c) 2000-2015 the FFmpeg developers
  built with gcc 4.7 (SUSE Linux)
  configuration: --enable-gpl
  libavutil      54. 23.101 / 54. 23.101
  libavcodec     56. 39.101 / 56. 39.101
  libavformat    56. 33.101 / 56. 33.101
  libavdevice    56.  4.100 / 56.  4.100
  libavfilter     5. 16.101 /  5. 16.101
  libswscale      3.  1.101 /  3.  1.101
  libswresample   1.  1.100 /  1.  1.100
  libpostproc    53.  3.100 / 53.  3.100
Splitting the commandline.
Reading option '-loglevel' ... matched as option 'loglevel' (set logging level) with argument 'debug'.
Reading option '-i' ... matched as input file with argument 'inter.dts'.
Reading option '-acodec' ... matched as option 'acodec' (force audio codec ('copy' to copy stream)) with argument 'pcm_f32le'.
Reading option '-ac' ... matched as option 'ac' (set number of audio channels) with argument '2'.
Reading option 'out.wav' ... matched as output file.
Finished splitting the commandline.
Parsing a group of options: global .
Applying option loglevel (set logging level) with argument debug.
Successfully parsed a group of options.
Parsing a group of options: input file inter.dts.
Successfully parsed a group of options.
Opening an input file: inter.dts.
[dts @ 0x3508240] Format dts probed with size=16384 and score=51
[dts @ 0x3508240] Before avformat_find_stream_info() pos: 0 bytes read:32768 seeks:0
[dca @ 0x3508c60] Stream with high frequencies VQ coding
[dts @ 0x3508240] All info found
[dts @ 0x3508240] Estimating duration from bitrate, this may be inaccurate
[dts @ 0x3508240] After avformat_find_stream_info() pos: 101376 bytes read:131072 seeks:0 frames:50
Input #0, dts, from 'inter.dts':
  Duration: 00:00:29.47, start: 0.000000, bitrate: 1535 kb/s
    Stream #0:0, 50, 1/90000: Audio: dts (DTS), 48000 Hz, 5.1(side), fltp, 1536 kb/s
Successfully opened the file.
Parsing a group of options: output file out.wav.
Applying option acodec (force audio codec ('copy' to copy stream)) with argument pcm_f32le.
Applying option ac (set number of audio channels) with argument 2.
Successfully parsed a group of options.
Opening an output file: out.wav.
Successfully opened the file.
detected 8 logical cores
[graph 0 input from stream 0:0 @ 0x34fd740] Setting 'time_base' to value '1/48000'
[graph 0 input from stream 0:0 @ 0x34fd740] Setting 'sample_rate' to value '48000'
[graph 0 input from stream 0:0 @ 0x34fd740] Setting 'sample_fmt' to value 'fltp'
[graph 0 input from stream 0:0 @ 0x34fd740] Setting 'channel_layout' to value '0x60f'
[graph 0 input from stream 0:0 @ 0x34fd740] tb:1/48000 samplefmt:fltp samplerate:48000 chlayout:0x60f
[audio format for output stream 0:0 @ 0x34fdf00] Setting 'sample_fmts' to value 'flt'
[audio format for output stream 0:0 @ 0x34fdf00] Setting 'channel_layouts' to value '0x3'
[audio format for output stream 0:0 @ 0x34fdf00] auto-inserting filter 'auto-inserted resampler 0' between the filter 'Parsed_anull_0' and the filter 'audio format for output stream 0:0'
[AVFilterGraph @ 0x3509720] query_formats: 4 queried, 6 merged, 3 already done, 0 delayed
1.000000 0.000000 0.707107 0.000000 0.707107 0.000000
0.000000 1.000000 0.707107 0.000000 0.000000 0.707107
[auto-inserted resampler 0 @ 0x34f4780] ch:6 chl:5.1(side) fmt:fltp r:48000Hz -> ch:2 chl:stereo fmt:flt r:48000Hz
Output #0, wav, to 'out.wav':
  Metadata:
    ISFT            : Lavf56.33.101
    Stream #0:0, 0, 1/48000: Audio: pcm_f32le ([3][0][0][0] / 0x0003), 48000 Hz, stereo, flt, 3072 kb/s
    Metadata:
      encoder         : Lavc56.39.101 pcm_f32le
Stream mapping:
  Stream #0:0 -> #0:0 (dts (dca) -> pcm_f32le (native))
Press [q] to stop, [?] for help
[dca @ 0x35091a0] Stream with high frequencies VQ coding
[output stream 0:0 @ 0x350c180] EOF on sink link output stream 0:0:default.
No more output streams to write to, finishing.
size=   11248kB time=00:00:29.99 bitrate=3072.0kbits/s
video:0kB audio:11248kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000990%
Input file #0 (inter.dts):
  Input stream #0:0 (audio): 2812 packets read (5657744 bytes); 2812 frames decoded (1439744 samples);
  Total: 2812 packets (5657744 bytes) demuxed
Output file #0 (out.wav):
  Output stream #0:0 (audio): 2812 frames encoded (1439744 samples); 2812 packets muxed (11517952 bytes);
  Total: 2812 packets (11517952 bytes) muxed
2812 frames successfully decoded, 0 decoding errors
[AVIOContext @ 0x350c080] Statistics: 6 seeks, 2816 writeouts
[AVIOContext @ 0x3507800] Statistics: 5657744 bytes read, 0 seeks

Change History (5)

comment:2 Changed 4 years ago by heleppkes

You need to set the option "rematrix_maxval=1.0"; otherwise the downmix can exceed 100% and will clip. It's set automatically if you use integer formats, so that's why it's not happening there.

This should really be the default, IMHO. It's very unlikely that someone wants audio that exceeds 100%, and if they do, they can reset this value back to 0.
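
To illustrate why the float downmix can exceed 100%: the worst case for each output channel is the sum of the absolute coefficients in its row of the mixing matrix, reached when every contributing input channel is at full scale with the same sign. A minimal Python sketch using the matrix printed in the debug output of this ticket (the helper name `worst_case_gain` is made up for this example and is not part of libswresample):

```python
# Default 5.1(side) -> stereo matrix, copied from the debug log in this ticket.
MATRIX = [
    [1.000000, 0.000000, 0.707107, 0.000000, 0.707107, 0.000000],
    [0.000000, 1.000000, 0.707107, 0.000000, 0.000000, 0.707107],
]


def worst_case_gain(matrix):
    """Return the largest row sum of absolute coefficients.

    Hypothetical helper for illustration only: it bounds the output peak
    when all input channels are at full scale (1.0) with matching sign.
    """
    return max(sum(abs(c) for c in row) for row in matrix)


gain = worst_case_gain(MATRIX)
print(f"worst-case gain: {gain:.6f}")
print("can exceed full scale:", gain > 1.0)
```

With these coefficients the bound is about 2.41, so loud passages in a float downmix can land well above 1.0 unless the matrix is scaled down.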

comment:3 Changed 4 years ago by cehoyos

  • Cc heleppkes added

Setting rematrix_maxval to 1.0 apparently does not help:
http://thread.gmane.org/gmane.comp.video.ffmpeg.user/56969/focus=57016

comment:4 Changed 4 years ago by heleppkes

I tested with the provided sample, and rematrix_maxval=1.0 fixes the problem. The output gets quieter, and the artifacts vanish.

It's also visible in the debug output, as the mixing matrix changes from:
1.000000 0.000000 0.707107 0.000000 0.707107 0.000000
0.000000 1.000000 0.707107 0.000000 0.000000 0.707107

to:
0.414214 0.000000 0.292893 0.000000 0.292893 0.000000
0.000000 0.414214 0.292893 0.000000 0.000000 0.292893
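
The two matrices above are consistent with a simple uniform scaling: divide every coefficient by the largest row sum so that no output row sums to more than the limit. The Python sketch below reproduces the numbers under two assumptions that are not confirmed in this ticket: that the logged 0.707107 is a rounded 1/sqrt(2), and that the limit is applied as one common scale factor. The function name `limit_matrix` is hypothetical, not the libswresample implementation.

```python
import math

# Assumption: the logged 0.707107 is 1/sqrt(2) rounded to six decimals.
C = math.sqrt(0.5)

# 5.1(side) -> stereo matrix as shown before setting rematrix_maxval.
MATRIX = [
    [1.0, 0.0, C, 0.0, C, 0.0],
    [0.0, 1.0, C, 0.0, 0.0, C],
]


def limit_matrix(matrix, maxval=1.0):
    """Scale all coefficients so the largest row sum does not exceed maxval.

    Hypothetical sketch of the normalization; a matrix that is already
    within the limit is returned unchanged (as a copy).
    """
    peak = max(sum(abs(c) for c in row) for row in matrix)
    if peak <= maxval:
        return [list(row) for row in matrix]
    scale = maxval / peak
    return [[c * scale for c in row] for row in matrix]


limited = limit_matrix(MATRIX, maxval=1.0)
for row in limited:
    print(" ".join(f"{c:.6f}" for c in row))
```

The largest row sum is 1 + 2/sqrt(2), roughly 2.414, so each coefficient is multiplied by about 0.414. That is an attenuation of roughly 7.7 dB, which matches the observation that the output gets quieter.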

comment:5 Changed 3 years ago by cehoyos

  • Resolution set to wontfix
  • Status changed from new to closed