Opened 3 weeks ago

Last modified 7 days ago

#7507 new enhancement

hwupload: missing device platform

Reported by: msiders Owned by:
Priority: normal Component: avfilter
Version: git-master Keywords: hwupload, hwaccel
Cc: Blocked By:
Blocking: Reproduced by developer: no
Analyzed by developer: no

Description

Summary of the bug:
How to reproduce:

% ffmpeg -hwaccel cuvid -hwaccel qsv -i input ... -filter_complex "hwupload" ... output

Hi,

The use of multiple "-hwaccel" parameters is valid, as they only initialize the GPU libraries.

However, at the moment it's impossible to use two different GPUs because of the lack of a "hwupload" filter parameter for selecting the target platform.

Consider this example:

  • One PC with one NVIDIA card and an Intel i7 CPU.
  • This platform has support for CUDA and QSV, and you can compile FFmpeg to support both.
  • Imagine then that you execute the decoding part using CUDA, and the encoding with QSV.
  • In this case the filter graph can be constructed from both ends: backward from the encoder, and forward from the decoder.
  • And in the middle, you need one "hwdownload" plus one "hwupload" to complete the filter graph.

But, the problem is that the "hwupload" can't distinguish which hardware context to select.

This problem doesn't exist with the "hwdownload" filter, as the target is always system memory, and the source is taken from the previous filter.

So, could you please extend "hwupload" to support selecting the target platform?
Thank you.

Change History (9)

comment:1 Changed 3 weeks ago by oromit

Looking at the code in ffmpeg.c, it looks like it only supports a single -hwaccel per input stream.
http://git.videolan.org/?p=ffmpeg.git;a=blob;f=fftools/ffmpeg_opt.c;h=d4851a2cd8c96cf036607017241dfa18f103c34f;hb=HEAD#l800
They do more than "only initialize the gpu libraries", in fact, that's one thing they don't really do at all.

The hwupload filter chooses what hardware to use based on the type of hw_device_ctx set on its context.

I don't think this is possible without massive rewrites of ffmpeg.c.
In a custom application using the FFmpeg API it should be relatively straightforward.

Last edited 3 weeks ago by oromit (previous) (diff)

comment:2 Changed 3 weeks ago by msiders

Hi oromit,

Thank you for pointing to the initialization code in ffmpeg.c.
Now I see that only one "ist->hwaccel_id" is used.

I feel this could be changed to some list of "active" hardware accelerators (like the list of available hwaccels). Then multiple hw_device_ctx could be active in one execution of FFmpeg.

I hope someone would like to implement this. The rewrite of ffmpeg.c doesn't need to be "massive", as each filter still uses only one "hw_device_ctx". So the change only affects the initialization section, plus passing the correct context to each filter.

Do you agree?

comment:3 Changed 3 weeks ago by oromit

How would ffmpeg.c tell which filter gets which context? Filters/decoders/encoders can only take exactly one hw_device_ctx.

comment:4 follow-up: Changed 3 weeks ago by msiders

Hi oromit,

This is my suggestion:

  • The vars "hw_device_ctx" and "hw_frames_ctx" are both AVBufferRef pointers.
  • And ANY filter/decoder/encoder takes just one "hw_device_ctx".
  • So the pointers don't change!
  • The only difference would be that the new hwupload can point to a different context.

So the basic idea is that "hw_device_ctx" doesn't change for filters.

See this overview example:

INPUT --> DECODER A --> FILTER B --> HWDOWNLOAD --> FILTER C --> HWUPLOAD --> ENCODER D --> OUTPUT

As you can note:

  • GPU platform X: applies to A,B & hwdownload.
  • System memory: C only.
  • GPU platform Y: hwupload & D.

In this case, the "hw_device_ctx" stays unique per filter and opaque to all of them. Only ffmpeg.c needs to initialize the different platforms when more than one is used, and hwupload needs to select between the available contexts in ffmpeg.c's list.

Does this sound good to you?

comment:5 in reply to: ↑ 4 Changed 3 weeks ago by jkqxz

It sounds like you want to look at the options -init_hw_device, -hwaccel_device and -filter_hw_device.

Replying to msiders:

INPUT --> DECODER A --> FILTER B --> HWDOWNLOAD --> FILTER C --> HWUPLOAD --> ENCODER D --> OUTPUT

As you can note:

  • GPU platform X: applies to A,B & hwdownload.
  • System memory: C only .
  • GPU platform Y: hwupload & D.

Something like:

ffmpeg -init_hw_device x_dev:x_args -init_hw_device y_dev:y_args -hwaccel x -hwaccel_device x_dev ... -i ... -filter_hw_device y_dev -vf b,hwdownload,c,hwupload -c:v d ...

comment:6 follow-up: Changed 3 weeks ago by msiders

Hi jkqxz,

Thank you for the tip regarding the init_hw_device, hwaccel_device and filter_hw_device parameters. After reviewing them I have partial success... they work, but only when the decoding is done on one GPU and the frames are output to RAM. In that case I can then upload them to the other GPU and compress them.

But we still NEED a target parameter for hwupload. A target "hw_device" is a must-have for this filter!

Let me show this "simplified" example:

  • You can run it on a computer with an Intel CPU and an NVIDIA video card (simpler for a quick test on Windows than on Linux, as the drivers have direct support for CUVID and QSV and you don't need to install external libraries).
ffmpeg \
 -init_hw_device qsv=qsv -init_hw_device cuda=cuda \
 -hwaccel cuda -hwaccel_device cuda \
 -c:v h264_cuvid -i input.ts \
 -filter_hw_device qsv -filter_complex "[v:0]split=1[out0];[out0]hwdownload,format=nv12[out1];[out1]hwupload,deinterlace_qsv[out2]" \
 -map [out2] \
 -c:v:0 h264_qsv \
 -f mpegts output.ts

This example DOESN'T WORK!! Let me explain the command line-by-line:

  1. Start the ffmpeg process.
  2. Init the libraries and hardware for GPU support for both QSV & CUVID (no errors, all OK).
  3. Select the NVIDIA card as the target DECODER. Without this line the decoder writes the frames to RAM instead of GPU memory, so it's required to keep the frames in GPU memory after decoding them.
  4. Use the CUVID decoder to decode the input file.
  5. Here is the problem: create a filter graph that makes a copy of the GPU frames (I use this filter just as an example of a filter on GPU "A"); then downloads the frames from the GPU to RAM; then uploads them to GPU "B" (because of the -filter_hw_device qsv parameter); and finally runs a filter on GPU "B". This graph is apparently valid.
  6. Map the output of the previous filter graph to route the frames to the encoder.
  7. Encode the frames with the hardware encoder on GPU "B".
  8. Write to the output file.

But it fails with this error:

[format @ 0000020ae7026c80] Setting 'pix_fmts' to value 'nv12|qsv'
[auto_scaler_0 @ 0000020ae7027880] w:iw h:ih flags:'bilinear' interl:0
[Parsed_hwdownload_1 @ 0000020ae7026a80] auto-inserting filter 'auto_scaler_0' between the filter 'Parsed_split_0' and the filter 'Parsed_hwdownload_1'
Impossible to convert between the formats supported by the filter 'Parsed_split_0' and the filter 'auto_scaler_0'

What's the problem?
The problem is that -filter_hw_device qsv is a GLOBAL setting. Then the hwdownload filter uses GPU "B" as the source... and this is wrong.

Solution?
Don't use -filter_hw_device at all in this case, and replace the filter_complex with:

 -filter_complex "[v:0]split=1[out0];[out0]hwdownload,format=nv12[out1];[out1]hwupload=hw_device=qsv,deinterlace_qsv[out2]" \

This should work, as hwdownload can get the hw_device context from the source stream; the only ambiguity is in the target GPU for hwupload.

I hope someone fixes this problem, or helps me fix my example.
Thank you!

comment:7 Changed 3 weeks ago by msiders

  • Version changed from unspecified to git-master

comment:8 in reply to: ↑ 6 ; follow-up: Changed 2 weeks ago by jkqxz

Replying to msiders:

ffmpeg \
 -init_hw_device qsv=qsv -init_hw_device cuda=cuda \
 -hwaccel cuda -hwaccel_device cuda \
 -c:v h264_cuvid -i input.ts \
 -filter_hw_device qsv -filter_complex "[v:0]split=1[out0];[out0]hwdownload,format=nv12[out1];[out1]hwupload,deinterlace_qsv[out2]" \
 -map [out2] \
 -c:v:0 h264_qsv \
 -f mpegts output.ts

...

[format @ 0000020ae7026c80] Setting 'pix_fmts' to value 'nv12|qsv'
[auto_scaler_0 @ 0000020ae7027880] w:iw h:ih flags:'bilinear' interl:0
[Parsed_hwdownload_1 @ 0000020ae7026a80] auto-inserting filter 'auto_scaler_0' between the filter 'Parsed_split_0' and the filter 'Parsed_hwdownload_1'
Impossible to convert between the formats supported by the filter 'Parsed_split_0' and the filter 'auto_scaler_0'

Looks like format autonegotiation didn't manage to match the formats correctly through the split filter? Try putting a format=cuda instance between split and hwdownload.

The problem is that -filter_hw_device qsv is a GLOBAL setting. Then the hwdownload filter uses GPU "B" as the source... and this is wrong.

This is incorrect. hwdownload uses the device connected to whatever frames it gets on its input; it doesn't touch AVFilterContext.hw_device_ctx at all.

comment:9 in reply to: ↑ 8 Changed 7 days ago by msiders

Replying to jkqxz:

Looks like format autonegotiation didn't manage to match the formats correctly through the split filter? Try putting a format=cuda instance between split and hwdownload.

This is incorrect. hwdownload uses the device connected to whatever frames it gets on its input; it doesn't touch AVFilterContext.hw_device_ctx at all.

Hi,

For now, using just 2 different hardware accelerators (qsv and cuda), I have success splitting on the first GPU and downloading/uploading to the second GPU. The basis of the "hack" is to use the first platform as -hwaccel, then add -filter_hw_device with the second platform before the -filter_complex. Then hwdownload reads from the first GPU, and hwupload writes to the second one.

However, I need to do a test with 3 different platforms. I don't know how multiple hwupload instances in a -filter_complex can distinguish between different platforms... as only one -filter_hw_device can be interposed.

For example, how to resolve this?

ffmpeg \
 -init_hw_device qsv=qsv -init_hw_device cuda=cuda -init_hw_device opencl=opencl \
 -hwaccel cuda -hwaccel_device cuda \
 -c:v h264_cuvid -i input.ts \
 -filter_hw_device opencl -filter_complex "[v:0]split=1[out1];[out1]hwdownload,format=nv12[out2];[out2]hwupload,unsharp_opencl[out3];[out3]hwdownload,format=nv12[out4];[out4]hwupload,deinterlace_qsv[target]" \
 -map [target] \
 -c:v:0 h264_qsv \
 -f mpegts output.ts

In this case the path is:

CUDA(decode)-->RAM-->OPENCL(unsharp)-->RAM-->QSV(deinterlace)-->QSV(encode)

So, how does the second hwupload identify the QSV target?
Any ideas?
