Opened 5 years ago
Last modified 2 months ago
#8505 open defect
fast aac encoder produces invalid output for x86 32-bit builds
Reported by: | LeadAssimilator | Owned by: | |
---|---|---|---|
Priority: | important | Component: | avcodec |
Version: | git-master | Keywords: | regression aac |
Cc: | Blocked By: | ||
Blocking: | Reproduced by developer: | yes | |
Analyzed by developer: | no |
Description
Summary:
The aac encoder can produce invalid output when using a crosscompiled windows build with certain input audio data. The following errors have been observed:
Number of bands (56) exceeds limit (44). invalid band type
Reproduction:
% ffmpeg.exe -i input.wav -b:a 192K output.aac % ffmpeg.exe -i output.aac -f null -v error -
Attachments (2)
Change History (28)
by , 5 years ago
Attachment: | ffmpeg-20200131-060709.log added |
---|
comment:3 by , 5 years ago
Replying to mkver:
This ticket is missing a sample file to reproduce the bug.
Sample was recently uploaded as trac_8505_0_input.wav via the VideoLAN File Uploader.
comment:4 by , 5 years ago
This issue appears to affect only mingw cross-compiled ffmpeg builds such as those from https://ffmpeg.zeranoe.com/builds/ or made with https://github.com/rdp/ffmpeg-windows-build-helpers on Ubuntu. Native builds on nix with gcc or msvc on win do not appear to have this problem.
I suspect a compiler issue with mingw and/or some related math problem.
follow-up: 6 comment:5 by , 5 years ago
Priority: | critical → normal |
---|---|
Version: | 4.2 → unspecified |
Is the issue you see reproducible with current FFmpeg git head, the only version supported here?
comment:6 by , 5 years ago
Replying to cehoyos:
Is the issue you see reproducible with current FFmpeg git head, the only version supported here?
I did some quick tests with the zeranoe builds and the last working version was 3.4.2.
All 4.x versions, including master/head are broken.
comment:7 by , 5 years ago
(Since your answer is difficult to parse)
Please provide the command line you tested together with the complete, uncut console output to make this a valid ticket.
comment:8 by , 5 years ago
What in my answer is difficult to parse exactly?
3.4.2 success
4.0.0 fail
4.0.1 fail
4.0.2 fail
4.1.0 fail
4.1.1 fail
4.1.3 fail
4.2.0 fail
4.2.1 fail
4.2.2 fail
master/head fail
The commandlines are already in both the description and the report files. And the output is there as well, albeit intermixed with the stuff that -report adds. I'm not sure why you encourage including them if you are just going to ignore them...
The output w/o -report is this for the one sample I uploaded (I don't yet have an input sample to upload for the "invalid band type" error yet):
ffmpeg.exe -i trac_8505_0_input.wav -b:a 192K output.aac ffmpeg version 4.2.2 Copyright (c) 2000-2019 the FFmpeg developers built with gcc 9.2.1 (GCC) 20200122 configuration: --disable-static --enable-shared --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt libavutil 56. 31.100 / 56. 31.100 libavcodec 58. 54.100 / 58. 54.100 libavformat 58. 29.100 / 58. 29.100 libavdevice 58. 8.100 / 58. 8.100 libavfilter 7. 57.100 / 7. 57.100 libswscale 5. 5.100 / 5. 5.100 libswresample 3. 5.100 / 3. 5.100 libpostproc 55. 5.100 / 55. 5.100 Guessed Channel Layout for Input Stream #0.0 : stereo Input #0, wav, from 'trac_8505_0_input.wav': Duration: 00:00:42.43, bitrate: 2822 kb/s Stream #0:0: Audio: pcm_f32le ([3][0][0][0] / 0x0003), 44100 Hz, stereo, flt, 2822 kb/s Stream mapping: Stream #0:0 -> #0:0 (pcm_f32le (native) -> aac (native)) Press [q] to stop, [?] for help Output #0, adts, to 'output.aac': Metadata: encoder : Lavf58.29.100 Stream #0:0: Audio: aac (LC), 44100 Hz, stereo, fltp, 192 kb/s Metadata: encoder : Lavc58.54.100 aac size= 512kB time=00:00:35.92 bitrate= 116.8kbits/s speed=71.8x size= 870kB time=00:00:42.44 bitrate= 167.9kbits/s speed=67.9x video:0kB audio:858kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.457950% [aac @ 01664d80] Qavg: 10880.012
ffmpeg.exe -i output.aac -f null -v error - [aac @ 02f51bc0] Number of bands (56) exceeds limit (44). Error while decoding stream #0:0: Invalid data found when processing input
And this would be a hell of a lot easier and less confusing if we could have the edit description grant for our own trac tickets...especially since I have to upload samples separately causing the names to mismatch what I originally specified.
follow-up: 11 comment:9 by , 5 years ago
Priority: | normal → important |
---|
Per the manual, changed priority to important given this is a regression.
Of interesting note, there are considerable differences in encoder output between gcc nix compiles vs. crosscompiles for win of ffmpeg. The output matches for the first few kilobytes then diverges. However, msvc win compiles produce the same encoder output as gcc nix compiles, yet their decode output differs.
comment:10 by , 5 years ago
Keywords: | regression added |
---|
comment:11 by , 5 years ago
Replying to LeadAssimilator:
Per the manual, changed priority to important given this is a regression.
Which change introduced the regression (this is a necessary information)?
follow-up: 13 comment:12 by , 5 years ago
I'm not sure why that would be necessary - not only is that not indicated in the manual, that isn't even true for many of the tickets already reported and marked as a regression, which is the pattern I'm following.
In practice, stating 3.4.2 is working and 4.0.0 and later are not, is more than sufficient for regression demarcation. If the exact commit was known, then I would have fixed the issue already not wasted my time gathering information, filing a bug and getting frustrated by this whole experience.
comment:13 by , 5 years ago
Replying to LeadAssimilator:
I'm not sure why that would be necessary - not only is that not indicated in the manual, that isn't even true for many of the tickets already reported and marked as a regression, which is the pattern I'm following.
Which ticket marked as regression and important did you find that does not contain the minimal requested information - missing from this ticket - and the responsible commit?
In practice, stating 3.4.2 is working and 4.0.0 and later are not, is more than sufficient for regression demarcation. If the exact commit was known, then I would have fixed the issue already not wasted my time gathering information, filing a bug and getting frustrated by this whole experience.
You will be happy to know that the feeling is mutual.
follow-up: 15 comment:14 by , 5 years ago
I don't remember, but they can be easily found if you look. If there is some policy regarding priorities and regression tags, then I suggest pointing me to it and enforcing it better.
No, that makes me sad, but I'm trying to remain patient in light of the apparent language barrier through critical commentary and helpful suggestions.
Any assistance in helping me keep my patience while I narrow down the faulty commit(s) would be much appreciated :)
comment:15 by , 5 years ago
Replying to LeadAssimilator:
I don't remember, but they can be easily found if you look.
I did look.
comment:16 by , 5 years ago
Summary: | aac encoder produces invalid output → fast aac encoder produces invalid output |
---|
The regression in 4.x/master is related to this change fcb681ac3edd708f26c3b5014c06e32fd2723887 where the default coder was switched from twoloop to fast.
So it appears the fast coder is broken for these crosscompiled windows builds. Now to find out why...
comment:17 by , 5 years ago
Summary: | fast aac encoder produces invalid output → fast aac encoder produces invalid output for x86 32-bit builds |
---|
Turns out it isn't a crosscompile windows issue, but rather a 32-bit compilation issue with gcc. I've reproduced the issue with master/head on Ubuntu 19.10 x64 with gcc 9.2.1-9ubuntu2 by compiling 32-bit ffmpeg using the following configure:
./configure --disable-static --enable-shared --enable-gpl --enable-version3 --extra-cflags=-m32 --extra-cxxflags=-m32 --extra-ldflags=-m32
So far it seems that any 32-bit x86 gcc based builds of ffmpeg have this problem, but x64 builds are ok and msvc 32-bit builds are ok.
comment:18 by , 5 years ago
There appears to be an issue in quantize_and_encode_band_cost_template where it attempts to encode an ESC sequence with too few bits. A minimum coefficient value of 16 is required for an ESC sequence, but the computed coefficient is only 15. Since the value 15 is too low, the required number of bits for the combined ESC sequence prefix bits and separator bit is also calculated too low at 0 (minimum is 1), along with an invalid value of -1. These erroneous values are then used in the call to put_bits which writes 0 bits to the output instead of the minimum required 1 bit. Similarly the number of bits for the sequence word is also under calculated at 3 bits, versus the minimum 4 bits. They again are used in the call to put_sbits with the too small coefficient 15, that writes 3 bits to the output instead of the minimum required 4 bits. As a result, the entire ESC sequence is only coded as 3 bits as opposed to the the shortest viable sequence of 5 bits and this corrupts the output bitstream.
It seems that ff_aac_quantize_bands_sse2 may be partially to blame. When used, it quantizes slightly differently than the non-sse2 version quantize_bands, producing larger values. This might not normally be an issue, but later calculations are done with the same inputs using a different function, quant, which yields different results during the writing of the ESC sequence. In this particular failure case, the input -56421.839844 (-0x1.b8cbaep+15) quantized to 16 via sse2, 15 via the non-sse2 variant quantize_bands (during a working run) and 15 via quant. This discrepancy explains why this issue affects only 32-bit x86 builds, however it can happen any time ff_aac_quantize_bands_sse2|quantize_bands produces values larger than quant for the same input and the resulting quant output is < 16.
I see the following possible solutions:
1) Use the same exact quantization routines for all involved calculations (possible performance impacts)
2) Clamp the lower bound of the escape sequence coefficient to 16
It would be great if someone more familiar with the aac encoder or these kinds of issues weighs in and/or takes over. If no one is interested, I can take a stab at producing a patch.
comment:19 by , 5 years ago
Reproduced by developer: | set |
---|---|
Status: | new → open |
Version: | unspecified → git-master |
Regression since d2ae5f77c61a29c3c63cc4c41c74ccfca4167649, not reproducible on Windows.
comment:20 by , 5 years ago
I'm not exactly sure how to fix that. Clamping the escape sequence length is more of a hack, and getting rid of the SSE function would slow down the encoder by a non-negligible amount.
What I don't get is how the results of the 2 quant functions diverge. Nothing should use x87 code now, even on 32 bit machines since its just unacceptably slow, so compilers just use SSE but don't use more than 1 element of a vector. I don't mind making the C quant version match the SSE version, though I don't really know how.
comment:21 by , 5 years ago
gcc only uses sse/sse2 if it is explicitly told to via -msse or -msse2 and -mfpmath=sse. By default, ffmpeg will not compile with these flags unless they are set via --extra-cflags and -extra-cxxflags.
Introducing an sse2 optimized version of quant would help in the common case, but I also spotted that compiling with -O2 causes the C version of quantize_bands to output 16 for the input in question, but quant still outputs 15. So that means compiling with -O2 and running on a system without sse2 will also have this same problem. While rare these days, it is still technically wrong and a new sse2 optimized version of quant wouldn't help there.
While clamping is sort of a hack, it might be the best option to prevent corrupted output in all cases, even for other cpu architectures where similar bugs may exist. It could still be combined with an sse2 optimized version of quant though to improve correctness in the common case.
comment:22 by , 5 years ago
Would an SSE quant function fix the problem? It would have to be done in inline asm since the function overhead doesn't seem great.
comment:23 by , 5 years ago
Do you mean use sse instead of sse2? quant doesn't need any specific sse2 instructions and can be implemented in sse only. But ff_aac_quantize_bands_sse2 is used when sse2 is available, so a sse version of quant would also have to only be used when sse2 is available.
Using a sse quant when sse2 is available would fix the problem when run on x86 with sse2, but not when run on x86 without sse2 (old cpus or embedded socs like intel quark for example) and compiling with -Os, -O1 or better. The sse quant would also have to be called via a function pointer like other hand optimized functions. The performance impact should be negligible, but isn't correctness to prevent corrupted output more important here?
comment:25 by , 6 months ago
No longer reproducible (with the provided sample and the fast coder explicitly selected) since commit 8f3e06231450a2edaf933aed57a370e3bb819629. Yet I do not think that the actual bug has been fixed, given LeadAssimilator's analysis.
comment:26 by , 2 months ago
I'm investigating a reproduction with:
ffmpeg version N-66244-g468615f204-static https://johnvansickle.com/ffmpeg/ Copyright (c) 2000-2023 the FFmpeg developers
built with gcc 8 (Debian 8.3.0-6)
configuration: --enable-gpl --enable-version3 --enable-static --disable-debug --disable-ffplay --disable-indev=sndio --disable-outdev=sndio --cc=gcc --enable-fontconfig --enable-frei0r --enable-gnutls --enable-gmp --enable-libgme --enable-gray --enable-libfribidi --enable-libass --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librubberband --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libvorbis --enable-libopus --enable-libtheora --enable-libvidstab --enable-libvo-amrwbenc --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg
libavutil 58. 13.101 / 58. 13.101
libavcodec 60. 21.100 / 60. 21.100
libavformat 60. 9.100 / 60. 9.100
libavdevice 60. 2.100 / 60. 2.100
libavfilter 9. 8.102 / 9. 8.102
libswscale 7. 3.100 / 7. 3.100
libswresample 4. 11.100 / 4. 11.100
libpostproc 57. 2.100 / 57. 2.100
Hyper fast Audio and Video encoder
(is N-66244-g468615f204-static this before or after the mentioned commit?)
On a specific audio input, subsequent decode reliably produces :
[aac @ 0xcba1d80] Number of bands (43) exceeds limit (41).ed=98.4x
[aist#0:1/aac @ 0xcbcb800] Error submitting packet to decoder: Invalid data found when processing input
Encode command:
./ffmpeg -i ./media/aacorg.mp4 -y -dn -vn ./media/aaconly.mp4
Decode command:
./ffmpeg -i ./media/aaconly.mp4 -f wav -vn -y /dev/null
This version seems to not reproduce the issue:
ffmpeg version 6.1-static https://johnvansickle.com/ffmpeg/ Copyright (c) 2000-2023 the FFmpeg developers
built with gcc 8 (Debian 8.3.0-6)
configuration: --enable-gpl --enable-version3 --enable-static --disable-debug --disable-ffplay --disable-indev=sndio --disable-outdev=sndio --cc=gcc --enable-fontconfig --enable-frei0r --enable-gnutls --enable-gmp --enable-libgme --enable-gray --enable-libaom --enable-libfribidi --enable-libass --enable-libvmaf --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librubberband --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libvorbis --enable-libopus --enable-libtheora --enable-libvidstab --enable-libvo-amrwbenc --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libdav1d --enable-libxvid --enable-libzvbi --enable-libzimg
libavutil 58. 29.100 / 58. 29.100
libavcodec 60. 31.102 / 60. 31.102
libavformat 60. 16.100 / 60. 16.100
libavdevice 60. 3.100 / 60. 3.100
libavfilter 9. 12.100 / 9. 12.100
libswscale 7. 5.100 / 7. 5.100
libswresample 4. 12.100 / 4. 12.100
libpostproc 57. 3.100 / 57. 3.100
Hyper fast Audio and Video encoder
I am also unable to reproduce with 7.0.2:
ffmpeg version 7.0.2-static https://johnvansickle.com/ffmpeg/ Copyright (c) 2000-2024 the FFmpeg developers
built with gcc 8 (Debian 8.3.0-6)
configuration: --enable-gpl --enable-version3 --enable-static --disable-debug --disable-ffplay --disable-indev=sndio --disable-outdev=sndio --cc=gcc --enable-fontconfig --enable-frei0r --enable-gnutls --enable-gmp --enable-libgme --enable-gray --enable-libaom --enable-libfribidi --enable-libass --enable-libvmaf --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librubberband --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libvorbis --enable-libopus --enable-libtheora --enable-libvidstab --enable-libvo-amrwbenc --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libdav1d --enable-libxvid --enable-libzvbi --enable-libzimg
libavutil 59. 8.100 / 59. 8.100
libavcodec 61. 3.100 / 61. 3.100
libavformat 61. 1.100 / 61. 1.100
libavdevice 61. 1.100 / 61. 1.100
libavfilter 10. 1.100 / 10. 1.100
libswscale 8. 1.100 / 8. 1.100
libswresample 5. 1.100 / 5. 1.100
libpostproc 58. 1.100 / 58. 1.100
Universal media converter
However, if I use -ss to start the encoding from 1 hour into the audio (the issue occurs at ~01:47:14), the issues does not occur. If I transcode to mp3, and then encode, the issue does not occur. I can only reproduce from the original aac input (with or without video in the source mp4).
So the issue is very hard to reproduce, so it's possible the mods MAY have just changed the parameters enough to hide it for now.
unfortunately I don't have permission to share the complete source audio, and trying to trim it prevents the issue from occurring.
encode report