Opened 4 years ago

Last modified 3 years ago

#8505 open defect

fast aac encoder produces invalid output for x86 32-bit builds

Reported by: LeadAssimilator Owned by:
Priority: important Component: avcodec
Version: git-master Keywords: regression aac
Cc: Blocked By:
Blocking: Reproduced by developer: yes
Analyzed by developer: no

Description

Summary:
The aac encoder can produce invalid output when using a crosscompiled windows build with certain input audio data. The following errors have been observed:

Number of bands (56) exceeds limit (44).
invalid band type

Reproduction:

% ffmpeg.exe -i input.wav -b:a 192K output.aac
% ffmpeg.exe -i output.aac -f null -v error -

Attachments (2)

ffmpeg-20200131-060709.log (6.0 KB ) - added by LeadAssimilator 4 years ago.
encode report
ffmpeg-20200131-060735.log (5.2 KB ) - added by LeadAssimilator 4 years ago.
verify report

Download all attachments as: .zip

Change History (26)

by LeadAssimilator, 4 years ago

Attachment: ffmpeg-20200131-060709.log added

encode report

by LeadAssimilator, 4 years ago

Attachment: ffmpeg-20200131-060735.log added

verify report

comment:1 by mkver, 4 years ago

This ticket is missing a sample file to reproduce the bug.

comment:2 by LeadAssimilator, 4 years ago

Sample just completed upload.

in reply to:  1 comment:3 by LeadAssimilator, 4 years ago

Replying to mkver:

This ticket is missing a sample file to reproduce the bug.

Sample was recently uploaded as trac_8505_0_input.wav via the VideoLAN File Uploader.

comment:4 by LeadAssimilator, 4 years ago

This issue appears to affect only mingw cross-compiled ffmpeg builds such as those from https://ffmpeg.zeranoe.com/builds/ or made with https://github.com/rdp/ffmpeg-windows-build-helpers on Ubuntu. Native builds on nix with gcc or msvc on win do not appear to have this problem.

I suspect a compiler issue with mingw and/or some related math problem.

comment:5 by Carl Eugen Hoyos, 4 years ago

Priority: criticalnormal
Version: 4.2unspecified

Is the issue you see reproducible with current FFmpeg git head, the only version supported here?

in reply to:  5 comment:6 by LeadAssimilator, 4 years ago

Replying to cehoyos:

Is the issue you see reproducible with current FFmpeg git head, the only version supported here?

I did some quick tests with the zeranoe builds and the last working version was 3.4.2.

All 4.x versions, including master/head are broken.

comment:7 by Carl Eugen Hoyos, 4 years ago

(Since your answer is difficult to parse)
Please provide the command line you tested together with the complete, uncut console output to make this a valid ticket.

comment:8 by LeadAssimilator, 4 years ago

What in my answer is difficult to parse exactly?

3.4.2 success
4.0.0 fail
4.0.1 fail
4.0.2 fail
4.1.0 fail
4.1.1 fail
4.1.3 fail
4.2.0 fail
4.2.1 fail
4.2.2 fail
master/head fail

The commandlines are already in both the description and the report files. And the output is there as well, albeit intermixed with the stuff that -report adds. I'm not sure why you encourage including them if you are just going to ignore them...

The output w/o -report is this for the one sample I uploaded (I don't yet have an input sample to upload for the "invalid band type" error yet):

ffmpeg.exe -i trac_8505_0_input.wav -b:a 192K output.aac
ffmpeg version 4.2.2 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 9.2.1 (GCC) 20200122
  configuration: --disable-static --enable-shared --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt
  libavutil      56. 31.100 / 56. 31.100
  libavcodec     58. 54.100 / 58. 54.100
  libavformat    58. 29.100 / 58. 29.100
  libavdevice    58.  8.100 / 58.  8.100
  libavfilter     7. 57.100 /  7. 57.100
  libswscale      5.  5.100 /  5.  5.100
  libswresample   3.  5.100 /  3.  5.100
  libpostproc    55.  5.100 / 55.  5.100
Guessed Channel Layout for Input Stream #0.0 : stereo
Input #0, wav, from 'trac_8505_0_input.wav':
  Duration: 00:00:42.43, bitrate: 2822 kb/s
    Stream #0:0: Audio: pcm_f32le ([3][0][0][0] / 0x0003), 44100 Hz, stereo, flt, 2822 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (pcm_f32le (native) -> aac (native))
Press [q] to stop, [?] for help
Output #0, adts, to 'output.aac':
  Metadata:
    encoder         : Lavf58.29.100
    Stream #0:0: Audio: aac (LC), 44100 Hz, stereo, fltp, 192 kb/s
    Metadata:
      encoder         : Lavc58.54.100 aac
size=     512kB time=00:00:35.92 bitrate= 116.8kbits/s speed=71.8x    
size=     870kB time=00:00:42.44 bitrate= 167.9kbits/s speed=67.9x    
video:0kB audio:858kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.457950%
[aac @ 01664d80] Qavg: 10880.012
ffmpeg.exe -i output.aac -f null -v error -
[aac @ 02f51bc0] Number of bands (56) exceeds limit (44).
Error while decoding stream #0:0: Invalid data found when processing input

And this would be a hell of a lot easier and less confusing if we could have the edit description grant for our own trac tickets...especially since I have to upload samples separately causing the names to mismatch what I originally specified.

comment:9 by LeadAssimilator, 4 years ago

Priority: normalimportant

Per the manual, changed priority to important given this is a regression.

Of interesting note, there are considerable differences in encoder output between gcc nix compiles vs. crosscompiles for win of ffmpeg. The output matches for the first few kilobytes then diverges. However, msvc win compiles produce the same encoder output as gcc nix compiles, yet their decode output differs.

comment:10 by LeadAssimilator, 4 years ago

Keywords: regression added

in reply to:  9 comment:11 by Carl Eugen Hoyos, 4 years ago

Replying to LeadAssimilator:

Per the manual, changed priority to important given this is a regression.

Which change introduced the regression (this is a necessary information)?

comment:12 by LeadAssimilator, 4 years ago

I'm not sure why that would be necessary - not only is that not indicated in the manual, that isn't even true for many of the tickets already reported and marked as a regression, which is the pattern I'm following.

In practice, stating 3.4.2 is working and 4.0.0 and later are not, is more than sufficient for regression demarcation. If the exact commit was known, then I would have fixed the issue already not wasted my time gathering information, filing a bug and getting frustrated by this whole experience.

in reply to:  12 comment:13 by Carl Eugen Hoyos, 4 years ago

Replying to LeadAssimilator:

I'm not sure why that would be necessary - not only is that not indicated in the manual, that isn't even true for many of the tickets already reported and marked as a regression, which is the pattern I'm following.

Which ticket marked as regression and important did you find that does not contain the minimal requested information - missing from this ticket - and the responsible commit?

In practice, stating 3.4.2 is working and 4.0.0 and later are not, is more than sufficient for regression demarcation. If the exact commit was known, then I would have fixed the issue already not wasted my time gathering information, filing a bug and getting frustrated by this whole experience.

You will be happy to know that the feeling is mutual.

comment:14 by LeadAssimilator, 4 years ago

I don't remember, but they can be easily found if you look. If there is some policy regarding priorities and regression tags, then I suggest pointing me to it and enforcing it better.

No, that makes me sad, but I'm trying to remain patient in light of the apparent language barrier through critical commentary and helpful suggestions.

Any assistance in helping me keep my patience while I narrow down the faulty commit(s) would be much appreciated :)

in reply to:  14 comment:15 by Carl Eugen Hoyos, 4 years ago

Replying to LeadAssimilator:

I don't remember, but they can be easily found if you look.

I did look.

comment:16 by LeadAssimilator, 4 years ago

Summary: aac encoder produces invalid outputfast aac encoder produces invalid output

The regression in 4.x/master is related to this change fcb681ac3edd708f26c3b5014c06e32fd2723887 where the default coder was switched from twoloop to fast.

So it appears the fast coder is broken for these crosscompiled windows builds. Now to find out why...

comment:17 by LeadAssimilator, 4 years ago

Summary: fast aac encoder produces invalid outputfast aac encoder produces invalid output for x86 32-bit builds

Turns out it isn't a crosscompile windows issue, but rather a 32-bit compilation issue with gcc. I've reproduced the issue with master/head on Ubuntu 19.10 x64 with gcc 9.2.1-9ubuntu2 by compiling 32-bit ffmpeg using the following configure:
./configure --disable-static --enable-shared --enable-gpl --enable-version3 --extra-cflags=-m32 --extra-cxxflags=-m32 --extra-ldflags=-m32

So far it seems that any 32-bit x86 gcc based builds of ffmpeg have this problem, but x64 builds are ok and msvc 32-bit builds are ok.

comment:18 by LeadAssimilator, 4 years ago

There appears to be an issue in quantize_and_encode_band_cost_template where it attempts to encode an ESC sequence with too few bits. A minimum coefficient value of 16 is required for an ESC sequence, but the computed coefficient is only 15. Since the value 15 is too low, the required number of bits for the combined ESC sequence prefix bits and separator bit is also calculated too low at 0 (minimum is 1), along with an invalid value of -1. These erroneous values are then used in the call to put_bits which writes 0 bits to the output instead of the minimum required 1 bit. Similarly the number of bits for the sequence word is also under calculated at 3 bits, versus the minimum 4 bits. They again are used in the call to put_sbits with the too small coefficient 15, that writes 3 bits to the output instead of the minimum required 4 bits. As a result, the entire ESC sequence is only coded as 3 bits as opposed to the the shortest viable sequence of 5 bits and this corrupts the output bitstream.

It seems that ff_aac_quantize_bands_sse2 may be partially to blame. When used, it quantizes slightly differently than the non-sse2 version quantize_bands, producing larger values. This might not normally be an issue, but later calculations are done with the same inputs using a different function, quant, which yields different results during the writing of the ESC sequence. In this particular failure case, the input -56421.839844 (-0x1.b8cbaep+15) quantized to 16 via sse2, 15 via the non-sse2 variant quantize_bands (during a working run) and 15 via quant. This discrepancy explains why this issue affects only 32-bit x86 builds, however it can happen any time ff_aac_quantize_bands_sse2|quantize_bands produces values larger than quant for the same input and the resulting quant output is < 16.

I see the following possible solutions:
1) Use the same exact quantization routines for all involved calculations (possible performance impacts)
2) Clamp the lower bound of the escape sequence coefficient to 16

It would be great if someone more familiar with the aac encoder or these kinds of issues weighs in and/or takes over. If no one is interested, I can take a stab at producing a patch.

comment:19 by Carl Eugen Hoyos, 4 years ago

Reproduced by developer: set
Status: newopen
Version: unspecifiedgit-master

Regression since d2ae5f77c61a29c3c63cc4c41c74ccfca4167649, not reproducible on Windows.

comment:20 by Lynne, 4 years ago

I'm not exactly sure how to fix that. Clamping the escape sequence length is more of a hack, and getting rid of the SSE function would slow down the encoder by a non-negligible amount.
What I don't get is how the results of the 2 quant functions diverge. Nothing should use x87 code now, even on 32 bit machines since its just unacceptably slow, so compilers just use SSE but don't use more than 1 element of a vector. I don't mind making the C quant version match the SSE version, though I don't really know how.

comment:21 by LeadAssimilator, 4 years ago

gcc only uses sse/sse2 if it is explicitly told to via -msse or -msse2 and -mfpmath=sse. By default, ffmpeg will not compile with these flags unless they are set via --extra-cflags and -extra-cxxflags.

Introducing an sse2 optimized version of quant would help in the common case, but I also spotted that compiling with -O2 causes the C version of quantize_bands to output 16 for the input in question, but quant still outputs 15. So that means compiling with -O2 and running on a system without sse2 will also have this same problem. While rare these days, it is still technically wrong and a new sse2 optimized version of quant wouldn't help there.

While clamping is sort of a hack, it might be the best option to prevent corrupted output in all cases, even for other cpu architectures where similar bugs may exist. It could still be combined with an sse2 optimized version of quant though to improve correctness in the common case.

comment:22 by Lynne, 4 years ago

Would an SSE quant function fix the problem? It would have to be done in inline asm since the function overhead doesn't seem great.

comment:23 by LeadAssimilator, 4 years ago

Do you mean use sse instead of sse2? quant doesn't need any specific sse2 instructions and can be implemented in sse only. But ff_aac_quantize_bands_sse2 is used when sse2 is available, so a sse version of quant would also have to only be used when sse2 is available.

Using a sse quant when sse2 is available would fix the problem when run on x86 with sse2, but not when run on x86 without sse2 (old cpus or embedded socs like intel quark for example) and compiling with -Os, -O1 or better. The sse quant would also have to be called via a function pointer like other hand optimized functions. The performance impact should be negligible, but isn't correctness to prevent corrupted output more important here?

comment:24 by Balling, 3 years ago

The default coder was switched from fast to twoloop. So bug is gone.

Last edited 3 years ago by Balling (previous) (diff)
Note: See TracTickets for help on using tickets.