Opened 7 years ago

Closed 3 years ago

#6722 closed defect (fixed)

XMA/WMAPro decoder: gapless problems

Reported by: bnnm
Owned by:
Priority: normal
Component: avcodec
Version: git-master
Keywords: xma, wmapro
Cc: jl@conductive.de
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

The first and last samples of XMA decoder output are slightly incorrect, which notably breaks gapless files. I haven't tested WMAPro files as much, but I believe the same applies.


From tests with Microsoft's XMA encoder (xmaencode.exe):

  • FFmpeg decodes 128(?) samples late, while xmaencode adds 128 samples at the start of its output (1 subframe of "setup_samples"?). As a result, the last subframe in a file comes out incorrect in FFmpeg. Basically, the first and last output is off (samples in the middle look fine).
  • FFmpeg ignores "start skip" (samples to discard at the beginning) and "end skip" (samples to discard at the end); see wmaprodec.c@1443. xmaencode uses both when decoding and applies them after the first 128 "extra" samples. start_skip is usually set to frame_samples and end_skip to less than frame_samples, but if they are changed manually (e.g. to 100 or 0), xmaencode honors the new values. A value larger than the frame size (512 in XMA) is clamped, and end_skip seems to always be included, even if it is 0.

E.g. total samples output for 10 frames: FFmpeg = 10*512; xmaencode = 128 + 1*512 - start_skip + 9*512 - end_skip


Example files: https://mega.nz/#!DBAFGY4C!Jb0Y8gtDpm_V12DSqz5LP63k7xkqq_L9fMNn0Fc0Qv4

test_20.xma (20 PCM samples)

  • xmaencode encodes a file with 1 frame (512 samples)
  • xmaencode decodes 128 + 512 - 512 (start_skip, leaving the last 128 from the frame) - 108 (end_skip) = 20 samples
  • FFmpeg just outputs 512 samples, and the waveform isn't correct (the last 128 are wrong)
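
The sample accounting described in this report can be sketched in Python as follows. This is only an illustration of the observed behaviour, not xmaencode's or FFmpeg's actual code: the function names are made up for this example, the 128 "setup" samples are the value guessed above, and clamping both skips to one frame is an assumption.

```python
FRAME_SAMPLES = 512   # XMA frame size, as stated in this report
SETUP_SAMPLES = 128   # extra leading samples observed from xmaencode (one subframe?)


def xmaencode_samples(num_frames, start_skip, end_skip,
                      frame_samples=FRAME_SAMPLES, setup_samples=SETUP_SAMPLES):
    """Sample count xmaencode appears to output, per the formula above."""
    # Assumption: both skip values are clamped to one frame.
    start_skip = min(start_skip, frame_samples)
    end_skip = min(end_skip, frame_samples)
    return setup_samples + num_frames * frame_samples - start_skip - end_skip


def ffmpeg_samples(num_frames, frame_samples=FRAME_SAMPLES):
    """Sample count FFmpeg currently outputs: whole frames, no skips applied."""
    return num_frames * frame_samples


# test_20.xma: 1 frame, start_skip=512, end_skip=108
print(xmaencode_samples(1, 512, 108))  # 20
print(ffmpeg_samples(1))               # 512
```

With these numbers the expected output matches the 20 PCM samples of the original file, while the whole-frame count gives 512.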

Screenshot (test_20.png): original PCM vs xmaencode vs FFmpeg vs xmaencode after manually removing the skips from the file.

test_322.xma: same with 2 frames. See how FFmpeg starts "late", and how the second frame now decodes the way the first frame should in test_20.xma (screenshot: test_322.png).

Both files are smaller than one packet, so this isn't a bug in the bit reservoir.
I encoded these files myself, but as far as I know the issues are present in all "real" files.


Console output:

% ffmpeg -i test_20.xma test_20.wav -v debug
ffmpeg version N-87353-g183fd30 Copyright (c) 2000-2017 the FFmpeg developers
  built with gcc 7.2.0 (GCC)
  configuration: --enable-gpl --enable-version3 --enable-cuda --enable-cuvid --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-nvenc --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libfreetype --enable-libgme --enable-libgsm --enable-libilbc --enable-libmodplug --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-libopus --enable-librtmp --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs --enable-libxvid --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib
  libavutil      55. 76.100 / 55. 76.100
  libavcodec     57.106.101 / 57.106.101
  libavformat    57. 82.101 / 57. 82.101
  libavdevice    57.  8.101 / 57.  8.101
  libavfilter     6.105.100 /  6.105.100
  libswscale      4.  7.103 /  4.  7.103
  libswresample   2.  8.100 /  2.  8.100
  libpostproc    54.  6.100 / 54.  6.100
Splitting the commandline.
Reading option '-i' ... matched as input url with argument 'test_20.xma'.
Reading option 'test_20.wav' ... matched as output url.
Reading option '-v' ... matched as option 'v' (set logging level) with argument 'debug'.
Finished splitting the commandline.
Parsing a group of options: global .
Applying option v (set logging level) with argument debug.
Successfully parsed a group of options.
Parsing a group of options: input url test_20.xma.
Successfully parsed a group of options.
Opening an input file: test_20.xma.
[NULL @ 00367800] Opening 'test_20.xma' for reading
[file @ 00367f20] Setting default whitelist 'file,crypto'
[wav @ 00367800] Format wav probed with size=2048 and score=99
[wav @ 00367800] Before avformat_find_stream_info() pos: 60 bytes read:2128 seeks:0 nb_streams:1
[wav @ 00367800] parser not found for codec xma1, packets or times may be invalid.
[xma1 @ 03231580] extradata:
[xma1 @ 03231580] [d6] [10] [0] [0] [1] [0] [0] [2] [80] [2d] [7] [0] [44] [ac] [0] [0] [0] [0] [0] [0] [0] [0] [0] [0] [0] [1] [1] [0] 
[wav @ 00367800] parser not found for codec xma1, packets or times may be invalid.
[wav @ 00367800] After avformat_find_stream_info() pos: 2128 bytes read:2128 seeks:0 frames:1
Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from 'test_20.xma':
  Duration: N/A, bitrate: N/A
    Stream #0:0, 1, 1/44100: Audio: xma1 (e[1][0][0] / 0x0165), 44100 Hz, mono, fltp
Successfully opened the file.
Parsing a group of options: output url test_20.wav.
Successfully parsed a group of options.
Opening an output file: test_20.wav.
[file @ 0036df00] Setting default whitelist 'file,crypto'
Successfully opened the file.
[xma1 @ 03231c00] extradata:
[xma1 @ 03231c00] [d6] [10] [0] [0] [1] [0] [0] [2] [80] [2d] [7] [0] [44] [ac] [0] [0] [0] [0] [0] [0] [0] [0] [0] [0] [0] [1] [1] [0] 
Stream mapping:
  Stream #0:0 -> #0:0 (xma1 (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
detected 2 logical cores
[graph_0_in_0_0 @ 032b5260] Setting 'time_base' to value '1/44100'
[graph_0_in_0_0 @ 032b5260] Setting 'sample_rate' to value '44100'
[graph_0_in_0_0 @ 032b5260] Setting 'sample_fmt' to value 'fltp'
[graph_0_in_0_0 @ 032b5260] Setting 'channels' to value '1'
[graph_0_in_0_0 @ 032b5260] tb:1/44100 samplefmt:fltp samplerate:44100 chlayout:(null)
[format_out_0_0 @ 032b56e0] Setting 'sample_fmts' to value 's16'
[format_out_0_0 @ 032b56e0] auto-inserting filter 'auto_resampler_0' between the filter 'Parsed_anull_0' and the filter 'format_out_0_0'
[AVFilterGraph @ 032b4a00] query_formats: 4 queried, 6 merged, 3 already done, 0 delayed
[auto_resampler_0 @ 032b5e20] [SWR @ 032b6ea0] Using fltp internally between filters
[auto_resampler_0 @ 032b5e20] ch:1 chl:1 channels fmt:fltp r:44100Hz -> ch:1 chl:1 channels fmt:s16 r:44100Hz
Output #0, wav, to 'test_20.wav':
  Metadata:
    ISFT            : Lavf57.82.101
    Stream #0:0, 0, 1/44100: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 1 channels, s16, 705 kb/s
    Metadata:
      encoder         : Lavc57.106.101 pcm_s16le
[out_0_0 @ 032b55a0] EOF on sink link out_0_0:default.
No more output streams to write to, finishing.
size=       1kB time=00:00:00.01 bitrate= 759.3kbits/s speed= 5.8x    
video:0kB audio:1kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 7.617188%
Input file #0 (test_20.xma):
  Input stream #0:0 (audio): 1 packets read (2048 bytes); 1 frames decoded (512 samples); 
  Total: 1 packets (2048 bytes) demuxed
Output file #0 (test_20.wav):
  Output stream #0:0 (audio): 1 frames encoded (512 samples); 1 packets muxed (1024 bytes); 
  Total: 1 packets (1024 bytes) muxed
1 frames successfully decoded, 0 decoding errors
[AVIOContext @ 03290060] Statistics: 4 seeks, 4 writeouts
[AVIOContext @ 0036ca40] Statistics: 2128 bytes read, 0 seeks

Attachments (5)

test_20.xma (2.1 KB ) - added by bnnm 7 years ago.
test_20.png (20.2 KB ) - added by bnnm 7 years ago.
test_322.xma (2.1 KB ) - added by bnnm 7 years ago.
test_322.png (18.4 KB ) - added by bnnm 7 years ago.
xwma_6_48000_b48000.xwma (40.7 KB ) - added by bnnm 6 years ago.
example WMAPRO to demonstrate gapless


Change History (7)

by bnnm, 7 years ago

Attachment: test_20.xma added

by bnnm, 7 years ago

Attachment: test_20.png added

by bnnm, 7 years ago

Attachment: test_322.xma added

by bnnm, 7 years ago

Attachment: test_322.png added

by bnnm, 6 years ago

Attachment: xwma_6_48000_b48000.xwma added

example WMAPRO to demonstrate gapless

comment:1 by joellinn, 4 years ago

Cc: jl@conductive.de added

For me, ffmpeg is always 64 samples late, using n3.0 and n4.3. This makes sense, as 64 seems to be the MDCT TDAC overlap size. Can you check again whether you are indeed seeing a 128-sample "delay"?

comment:2 by Elon Musk, 3 years ago

Resolution: fixed
Status: new → closed