#10210 closed defect (fixed)

On Windows with x86 build, calling avcodec_send_packet does not free FPU registers

Reported by: Kevin Coulombe Owned by:
Priority: normal Component: avcodec
Version: git-master Keywords: avcodec FPU fld ST0 x87
Cc: Kevin Coulombe, Seb Blocked By:
Blocking: Reproduced by developer: no
Analyzed by developer: no

Description

Summary of the bug:
When using a x86 build on Windows (64 bits), calling some functions from avcodec sometimes does not free the ST0 through 7 registers.

On dotnet (as is our case), these registers are sometimes used for floating point to integer connversions. This causes the fld instruction to fail and some very weird behaviors.

How to reproduce:
I've built the simplest repro I could in pure C++ here : https://github.com/stonkie/FfmpegRegisterExample

It includes the prebuilt assemblies, input jpeg image.

The included dlls use an in-house built version of avcodec (included in the demo repo) based on this February commit 7268323193d55365f914de39fadd5dbdb1f68976

We build it using the following configure script using the msvc toolchain :

./configure --toolchain=msvc --target-os=win32 --arch=x86 --enable-shared --disable-static --prefix=./build32 --enable-libdav1d --extra-ldflags="/SOURCELINK:sourcelink.json" --disable-mediafoundation

Change History (14)

comment:1 by Balling, 15 months ago

How is that not CPU/ Dotnet fault?

comment:2 by Kevin Coulombe, 15 months ago

I reproduce this with both dotnet and C++. The sample repo is in pure C++ : https://github.com/stonkie/FfmpegRegisterExample

I found some references saying those registers should be cleared when exiting a function with cdecl.

https://en.wikipedia.org/wiki/X86_calling_conventions#cdecl

The post that lead me in this direction is here : https://randomascii.wordpress.com/2022/11/21/please-restore-our-registers-when-youre-done-with-them/

It's not exactly the same registers, but it's the same problem.

comment:3 by Balling, 15 months ago

Calling back into avcodec (avcodec_receive_frame for example) doesn't seem to have adverse effects. It even seems to clear the registers sometimes, although I have not identified the pattern for it yet.

What did you use to compile .dlls? They are 32 bit, are you sure we even care about 32 bit?

input jpeg image

It is corrupt...

It's not exactly the same registers, but it's the same problem.

Is that inline assembly bug? Why do not you think is not a bug of what you used to compile .dlls? It appears you used 14.29, which is still Visual Studio 2019.

comment:4 by Seb, 14 months ago

I can reproduce this issue after recompiling FFmpeg 5.1.2 in x86 mode with Visual Studio 2022.

The call to send_packet_proc() incorrectly keeps the FPU registers in the used state if decoding the input JPEG returns an error. After that, any instruction using the FPU fails (For example '4.22/2.11' returns 'NaN').

This happens because ff_mjpeg_decode_sos() doesn't call emms_c() to make the FPU available after mjpeg_decode_scan() returns an error. emms_c() is called only on success.

comment:5 by Seb, 14 months ago

I cannot reproduce the problem with FFmpeg HEAD because the mjpeg decoder has been changed from FF_CODEC_CB_TYPE_RECEIVE_FRAME to FF_CODEC_DECODE_CB.

The decoder code now goes through decode_simple_internal() which calls emms_c() to make the FPU available.

comment:6 by Balling, 14 months ago

That means 60 major version is not affected, we only support git-master here. There were a lot of handle leaks fixed at that moment (the actual fix was MJPEG (and the MJPEG-based decoders)
were switched to the receive_frame API in commit
e9a2a8777317d91af658f774c68442ac4aa726ec), see e.g. https://github.com/IENT/YUView/issues/483#issuecomment-1427110406

Also, I am still not convinced emms_c should not be called by Windows or Linux kernel on exit (or y apps on start). After all most of handles close and so does the heap memory. Though I think handles on files are not autoclosed, there was such a bug inside Chrome...

Last edited 14 months ago by Balling (previous) (diff)

in reply to:  6 comment:7 by Kevin Coulombe, 14 months ago

Replying to Balling:

Also, I am still not convinced emms_c should not be called by Windows or Linux kernel on exit (or y apps on start). After all most of handles close and so does the heap memory. Though I think handles on files are not autoclosed, there was such a bug inside Chrome...

I'm not sure I understand this.

The demo app calls into avcodec directly through cross dll function calls in cdelc calling convention. Then when it recovers control, the registers are used up and there isn't much Windows or Linux can do to clear those registers.

The compiler of the hosting application could add the instructions to do it, but cdecl specifically says it's the callee's responsibility. So both the VC++ compiler and the dotnet runtime assume that those registers have been freed by the callee and don't touch them after a function call.

Is that inline assembly bug? Why do not you think is not a bug of what you used to compile .dlls? It appears you used 14.29, which is still Visual Studio 2019.

I built it with VS 2019 because the build scripts I got break down with VS 2022 now being x64 (different program files dir). But Seb seems to have the same issue with VS2022. I'd be very surprised if we were in front of a Microsoft compiler bug.

It is corrupt...

As for the image being corrupt, the decoder should still respect calling conventions. We receive the payload from physical cameras that can get pretty creative with their encoding...

comment:8 by Balling, 14 months ago

But Seb seems to have the same issue with VS2022

Yes, but issue is gone in major version 60. We only support master branch and HEAD commit of master branch, no HEAD~1 even, okay? There is no way for this to be fixed in older branches, because this was deprecation and API break.

scripts I got break down with VS 2022 now being x64 (different program files dir).

Do you have the same issue: #10233

I'd be very surprised if we were in front of a Microsoft compiler bug.

There are a lot of bugs in MSVC, just recently: https://developercommunity.visualstudio.com/VisualStudio?q=codegen

https://www.reddit.com/r/cpp/search/?q=developercommunity.visualstudio.com&restrict_sr=1&sr_nsfw=

comment:9 by Seb, 14 months ago

Ffmpeg releases the FPU registers after calling codec->cb.decode() (in decode_simple_internal()).

It would seem logical to do the same after calling codec->cb.receive_frame() since there is no guarantee that the codec's implementation will do it. Something along the lines of:

--- a/libavcodec/decode.c
+++ b/libavcodec/decode.c
@@ -547,6 +547,7 @@ static int decode_receive_frame_internal(AVCodecContext *avctx, AVFrame *frame)

     if (codec->cb_type == FF_CODEC_CB_TYPE_RECEIVE_FRAME) {
         ret = codec->cb.receive_frame(avctx, frame);
+        emms_c();
     } else
         ret = decode_simple_receive_frame(avctx, frame);

While I cannot reproduce the issue with the JPEG image since the MJPEG decoder is now based on decode, 10 codecs are still implementing receive_frame.

comment:10 by Seb, 14 months ago

Cc: Seb added

comment:11 by Balling, 14 months ago

While I cannot reproduce the issue with the JPEG image since the MJPEG decoder is now based on decode, 10 codecs are still implementing receive_frame.

If it is just 10 codecs, why not put emms_c() in each? How hard can it be?

comment:14 by mkver, 14 months ago

Resolution: fixed
Status: newclosed
Note: See TracTickets for help on using tickets.