Opened 11 months ago

Last modified 9 months ago

#10374 new defect

Problems to decode certain h264 keyframes with D3D11VA and DXVA2 hw acceleration

Reported by: Florian Grill Owned by:
Priority: normal Component: avcodec
Version: 6.0 Keywords: H264 dxva2 d3d11va
Cc: Blocked By:
Blocking: Reproduced by developer: yes
Analyzed by developer: yes

Description

Summary of the bug:
I'm currently developing a streaming app for Windows, and for my project I use javacv (https://github.com/bytedeco/javacv) to access the native ffmpeg libraries. Developing the first prototype was quite easy, but upon further testing on different machines with different hardware decoders I noticed some strange effects with the D3D11VA and DXVA2 video decoders. They could not decode certain keyframes, while cuvid and the software decoder had no problem doing so. On these problematic keyframes I always received an error when calling

final int err = avcodec_send_packet(this.m_VideoDecoderCtx, this.pkt);

The logs printed "Invalid data found when processing input", but I was 100% sure that the data was correct, so I dug into the source code, did some research on the internet, and found the issue. In h264dec.h there is a constant called MAX_SLICES which is set to 32.

https://github.com/FFmpeg/FFmpeg/blob/release/6.0/libavcodec/h264dec.h#L62

This MAX_SLICES value is used in dxva2_h264.c

https://github.com/FFmpeg/FFmpeg/blob/release/6.0/libavcodec/dxva2_h264.c#L478

which means that frames with more than 32 slices simply can't be processed. This is a strange limitation if you ask me.
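A minimal sketch of the mechanism described above, not FFmpeg's actual code: it assumes the hwaccel stores one entry per slice in an array sized by MAX_SLICES and rejects any further slice once that array is full. The names `picture_context`, `add_slice` and `feed_slices`, as well as the struct layout, are hypothetical.

```c
#define MAX_SLICES 32  /* value from h264dec.h in release/6.0, per this report */

/* Hypothetical stand-in for the per-picture hwaccel state. */
struct picture_context {
    unsigned slice_count;
    unsigned slice_offset[MAX_SLICES];  /* one entry per slice, fixed capacity */
};

/* Records one slice; returns 0 on success, -1 once the fixed array is full. */
static int add_slice(struct picture_context *pic, unsigned offset)
{
    if (pic->slice_count >= MAX_SLICES)
        return -1;  /* a 33rd slice cannot be stored, so the frame is rejected */
    pic->slice_offset[pic->slice_count++] = offset;
    return 0;
}

/* Feeds n slices and returns how many of them were accepted. */
static unsigned feed_slices(struct picture_context *pic, unsigned n)
{
    unsigned accepted = 0;
    for (unsigned i = 0; i < n; i++)
        if (add_slice(pic, i * 100u) == 0)
            accepted++;
    return accepted;
}
```

With a limit of 32, a picture that arrives with more slices than that can never be submitted in full, regardless of what the hardware itself could handle.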

Other popular ffmpeg forks already increased this limit, e.g.

https://git.1f0.de/gitweb/?p=ffmpeg.git;a=commit;h=550cf548b546d386a6c634351ad0c250a3e47f3b;js=1

In my example I used

this.m_VideoDecoderCtx.err_recognition(this.m_VideoDecoderCtx.err_recognition() | AV_EF_EXPLODE);

to get a proper error output. For testing I removed it, and the frame that was decoded and displayed looked like this

https://github.com/grill2010/ExampleDXVA2/blob/main/example_data/test.jpg

This behaviour can be easily reproduced with the latest ffmpeg version. The example h264 keyframe can be downloaded from here

https://github.com/grill2010/ExampleDXVA2/blob/main/example_data/example.h264
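To check how many slices a sample like this contains, one can count the slice NAL units in the raw stream. The sketch below assumes the file is an H.264 Annex B byte stream (NAL units separated by 00 00 01 start codes) and that every NAL unit of type 1 or 5 is one slice of the same picture, which is only reasonable for a single-frame sample. `count_slice_nals` is a hypothetical helper, not an FFmpeg function.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts coded-slice NAL units (nal_unit_type 1 = non-IDR, 5 = IDR)
 * in an H.264 Annex B byte stream. */
static size_t count_slice_nals(const uint8_t *buf, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i + 3 < len; i++) {
        /* Match the 3-byte start code 00 00 01; a 4-byte start code
         * ends with the same three bytes, so it is matched as well. */
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1) {
            unsigned type = buf[i + 3] & 0x1F;  /* low 5 bits of the NAL header */
            if (type == 1 || type == 5)
                count++;
            i += 2;  /* continue scanning after the start code */
        }
    }
    return count;
}
```

Reading example.h264 into memory and passing it to this function gives the number to compare against MAX_SLICES.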

How to reproduce:

% .\ffmpeg.exe -hwaccel dxva2 -i .\example.h264 test.jpg
or
% .\ffmpeg.exe -hwaccel d3d11va -i .\example.h264 test.jpg
ffmpeg version 6.0 and latest SNAPSHOT
built via https://www.gyan.dev/ffmpeg/builds/

The ffmpeg build used by javacv recently received a patch that raises this low slice limit

https://github.com/bytedeco/javacpp-presets/commit/63acf680ef0d95cbdda1b3840450e4333a78bde0

and I can confirm that this fixes the issue completely (at least for D3D11VA). By the way, I found another ticket which might be related to this problem

https://trac.ffmpeg.org/ticket/9771


As I said, a fix was applied to the javacv ffmpeg build, so I compiled that ffmpeg version for Windows x86_64 with the increased MAX_SLICES for further testing. D3D11VA works perfectly fine now, but I noticed another issue with DXVA2: on certain keyframes I still received an error, but this time a different one. When calling

final int err = avcodec_send_packet(this.m_VideoDecoderCtx, this.pkt);

I now receive

[h264 @ 0000027a270a1680] Buffer for type 5 was too small. size: 58752, dxva_size: 55296
[h264 @ 0000027a270a1680] Failed to add bitstream or slice control buffer
[h264 @ 0000027a270a1680] hardware accelerator failed to decode picture
Error while decoding stream #0:0: Operation not permitted  

Upon further investigation it turned out that decoding these keyframes fails because the buffer returned by the IDirectXVideoDecoder_GetBuffer function was too small, and I don't have any explanation why. DXVA2 works perfectly fine for h265, by the way (it is affected neither by the MAX_SLICES problem nor by the too-small buffer problem).

https://github.com/FFmpeg/FFmpeg/blob/9d70e74d255dbe37af52b0efffc0f93fd7cb6103/libavcodec/dxva2.c#L817
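The first log line can be read as a plain capacity check: the data to be copied (size) is larger than the buffer the driver handed out (dxva_size). The sketch below mimics such a check and works through the two numbers from the log. It is not the code in dxva2.c; `copy_to_driver_buffer` and `slice_entries` are hypothetical, and the entry size of 864 bytes is an assumption (a hand calculation of the size of one long-format slice control entry, DXVA_Slice_H264_Long) that should be verified against the DXVA headers.

```c
#include <stddef.h>
#include <string.h>

/* Assumed size in bytes of one long-format slice control entry. */
#define ASSUMED_SLICE_ENTRY_SIZE 864u

/* Copies data into a driver-owned buffer only if it fits.
 * Returns 0 on success, -1 if the driver buffer is too small. */
static int copy_to_driver_buffer(void *dxva_data, size_t dxva_size,
                                 const void *data, size_t size)
{
    if (size > dxva_size)
        return -1;  /* the "Buffer for type 5 was too small" case */
    memcpy(dxva_data, data, size);
    return 0;
}

/* Number of whole slice entries that fit into the given number of bytes. */
static size_t slice_entries(size_t bytes)
{
    return bytes / ASSUMED_SLICE_ENTRY_SIZE;
}
```

Under that assumption, 58752 bytes would correspond to 68 slice entries and 55296 bytes to 64, i.e. the sample would need room for 68 slices while the driver on the NVIDIA machines provides room for 64. That would explain why raising MAX_SLICES to 256 alone does not help there, but it is an inference from the log, not something confirmed in this ticket.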

I have now tested this behaviour on 4 different PCs, and the pattern I found is that it works on PCs with AMD or Intel GPUs, but on PCs with an NVIDIA GPU you get this small buffer problem.

How to reproduce:

% .\ffmpeg.exe -hwaccel dxva2 -i .\example.h264 test.jpg
ffmpeg version with a patch which sets MAX_SLICES in dxva2_h264.c to 256
built via javacv (javacpp-presets build process)

I have provided here a Windows x64 ffmpeg build with that MAX_SLICES patch
https://github.com/grill2010/ExampleDXVA2/tree/main/example_data


I know that these are probably two bug reports in one, but for me these things are related. The first problem, the low slice limit, is probably an easy fix; the question is why it is like that in the first place. Other hw decoders, such as the dxva2 hevc decoder, override this low slice limit, but dxva2 h264 does not. Is there any reason for that? It considerably limits h264 hw acceleration with DXVA2 and D3D11VA.

For the second problem, the too-small buffer on DXVA2, I have absolutely no idea what could cause it and why it (seemingly) only occurs on devices with an NVIDIA GPU. I would like to do some further testing but have no idea where to start.

I tried to provide as many details as possible. If there are any questions, just let me know.


Change History (10)

comment:1 by Balling, 11 months ago

Other popular ffmpeg forks already increased this limit, e.g.

But increase is not needed for software decoder.

Quote from my bug: "I will also point out that there is no difference with SW decoding of sample in #628 between the master and the fork which means the increase in SAMPLES in avc bitstream does not matter on that sample".

That strongly suggests it is a cosmetic issue, since the software decoder creates the needed tables for larger slice counts...

Last edited 11 months ago by Balling

comment:2 by Florian Grill, 11 months ago

Hmm, so what to do? Keep the h264 DXVA2 and D3D11VA decoders broken? The funny thing is that software decoding works and doesn't care about this limitation. I already showed where the limit is used in dxva2_h264.c, so why is dxva2_h264 just not overriding this limit like the dxva2_hevc decoder for example? What actually is the advantage of such a small slice limit and why is this limit throttling the capability of the h264 DXVA2 and D3D11VA decoder? Sorry, I just don't understand the reason behind it. The MAX_SLICES value may be a good default, but there is no way to override it from an application, which means that the h264 DXVA2 and D3D11VA decoders can't be used for a stream that contains keyframes with more than 32 slices.

comment:3 by Balling, 11 months ago

why is dxva2_h264 just not overriding this limit like the dxva2_hevc decoder for example

It should, I agree, that is the bug.

What actually is the advantage of such a small slice limit and why is this limit throttling the capability of the h264 DXVA2 and D3D11VA decoder?

That constant (MAX_SLICES) is used while generating the tables at compile time, which uses memory in the binary and thus in RAM. The problem is that this memory is used many, many times. The SW decoder needs to preparse the bitstream for the HW decoder anyway, but it does not use those tables for decoding.
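To make the memory argument concrete, here is a small sketch of how a table with one row per possible slice grows with the slice limit. The `[limit][2][64]` table of 32-bit integers is a hypothetical layout chosen for illustration; the actual structures in h264dec.h may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder state holding one table row per possible slice. */
struct state_with_32  { int32_t table[32][2][64]; };
struct state_with_256 { int32_t table[256][2][64]; };

/* Size of the per-slice table for each limit, in bytes. */
static size_t table_bytes_32(void)  { return sizeof(struct state_with_32); }
static size_t table_bytes_256(void) { return sizeof(struct state_with_256); }
```

With this layout the table grows from 16 KiB to 128 KiB when the limit goes from 32 to 256, and that cost is paid for every instance of the structure, whether or not a stream ever uses that many slices.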

Last edited 11 months ago by Balling

comment:4 by Florian Grill, 11 months ago

Okay, I see, thanks for the explanation. I'm not sure how much overhead this actually produces (it can't be that much), but I see why the value is that low then. As long as dxva2_h264.c overrides that value it's fine anyway.
Which brings me to the second bug, the small buffer problem with DXVA2 on NVIDIA GPUs (with the slice limit already set to 256). This is something I have no explanation for; it might be a driver issue, but I'm not sure. So far I could only reproduce the small buffer problem on devices with an NVIDIA GPU.

comment:5 by Balling, 11 months ago

Yep, that is the same issue:

Lav\ffmpeg.exe -hwaccel d3d11va -i "example.h264" -f md5 -

gives cb41681f69478a81624b15f11ff96673

and so does ffmpeg master, for

Lav\ffmpeg.exe -hwaccel d3d11va -i "example.h264" -f md5 -

this gives cb41681f69478a81624b15f11ff96673, yet for master ffmpeg it gives 7ff0f074231730af056ed9dc526906b3.

This is indeed my bug.

comment:6 by Florian Grill, 11 months ago

Should I make a patch pull request for this MAX_SLICES issue in the dxva2_h264 file or why isn't there already a pull request for it? It would at least fix the slice limitation for DXVA2 and D3D11VA. It should be a one-line fix. I have to check what the standard procedure for making a pull request is, but it should not be that difficult, I guess?

comment:7 by Balling, 11 months ago

Should I make a patch pull request for this MAX_SLICES issue in the dxva2_h264 file

Verify that changing that line fixes both dxva2 and -hwaccel d3d11va (cb41681f69478a81624b15f11ff96673) and sure. Install git send-email and register it with your gmail using https://gist.github.com/jasonkarns/4354421#gistcomment-4088239

After that just use git send-email -1 to send the commit on top of current branch.

Do not forget to first register on the ffmpeg-devel mailing list and on https://patchwork.ffmpeg.org/project/

Last edited 11 months ago by Balling

comment:8 by Florian Grill, 11 months ago

Well, I already tried that. With MAX_SLICES overridden in the dxva2_h264 file, the following commands

.\ffmpeg.exe -hwaccel dxva2 -i .\example.h264 test.jpg

and

.\ffmpeg.exe -hwaccel d3d11va -i .\example.h264 test.jpg

both produce the same, correct image. Is that what is needed to prove that it is working?

comment:9 by Balling, 11 months ago

Is that what is needed to prove that it is working?

Ideally you check md5. ffmpeg.exe -hwaccel dxva2 -i .\example.h264 -f md5 -

It should be the same as what -hwaccel cuda produces.
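The comparison can be automated by checking the lines printed by the md5 muxer, which have the form MD5=<32 hex digits>. A minimal sketch, assuming each run's output line has been captured as a string; `md5_digest` and `same_md5` are hypothetical helpers.

```c
#include <ctype.h>
#include <string.h>

/* Returns a pointer to the digest inside an "MD5=<hex>" line, or NULL. */
static const char *md5_digest(const char *line)
{
    return strncmp(line, "MD5=", 4) == 0 ? line + 4 : NULL;
}

/* Returns 1 if both lines carry the same 32-digit digest (case-insensitive),
 * 0 if they differ or if either line is malformed. */
static int same_md5(const char *a, const char *b)
{
    const char *da = md5_digest(a), *db = md5_digest(b);
    if (!da || !db)
        return 0;
    for (int i = 0; i < 32; i++) {
        /* A string terminator is not a hex digit, so short digests stop here. */
        if (!isxdigit((unsigned char)da[i]) || !isxdigit((unsigned char)db[i]))
            return 0;
        if (tolower((unsigned char)da[i]) != tolower((unsigned char)db[i]))
            return 0;
    }
    return 1;
}
```

For the two digests reported in comment 5 (cb41681f69478a81624b15f11ff96673 and 7ff0f074231730af056ed9dc526906b3) this check reports a mismatch.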

Last edited 11 months ago by Balling

comment:10 by Balling, 9 months ago

Any progress?
