Opened 4 years ago

Last modified 4 years ago

#8396 new defect

hwdownload always uses 0th device (hwaccel_device 0)

Reported by: darn Owned by:
Priority: normal Component: undetermined
Version: unspecified Keywords: nvenc
Cc: Blocked By:
Blocking: Reproduced by developer: no
Analyzed by developer: no

Description

Summary of the bug:

I have a system with 3 GeForce GTX 1080 Ti cards.
I am hitting the "smart" libavfilter behaviour described in #5587, so I need to use the hwdownload filter.
When I use hwdownload in filter_complex, ffmpeg always uses hw device 0.

Run without hwdownload and hwaccel_device 2:

ffmpeg-cuda \
-hide_banner \
-probesize 10M \
-analyzeduration 10M \
-nostats \
-hwaccel cuvid \
-hwaccel_device 2 \
-c:v h264_cuvid \
-i "udp://hidden_ip:58631?reuse=1&pkt_size=1316&buffer_size=2621440&fifo_size=2621440" \
-filter_complex " \
[0:v]scale_npp=-1:-1:format=yuv420p:interp_algo=lanczos[v0] " \
-c:v h264_nvenc \
-preset:v llhq \
-rc:v vbr_hq \
-profile:v:0 high \
-level:0 4.1 \
-b:v:0 5600000 \
-forced-idr 1 \
-strict_gop 1 \
-no-scenecut 1 \
-g 125 \
-r 25 \
-keyint_min 125 \
-c:a aac \
-b:a 96k \
-ac 2 \
-ar 48000 \
-map "[v0]" \
-map 0:a:0 \
-f mpegts "udp://233.34.2.219:60041?reuse=1"

ffmpeg starts with PID 11156.

$ nvidia-smi | grep 11156
|    2     11156      C   ffmpeg-cuda                                  313MiB |

Everything works fine.

Run with hwdownload and hwaccel_device 2:

ffmpeg-cuda \
-hide_banner \
-probesize 10M \
-analyzeduration 10M \
-nostats \
-hwaccel cuvid \
-hwaccel_device 2 \
-c:v h264_cuvid \
-i "udp://hidden_ip:58631?reuse=1&pkt_size=1316&buffer_size=2621440&fifo_size=2621440" \
-filter_complex " \
[0:v]scale_npp=-1:-1:format=yuv420p:interp_algo=lanczos,hwdownload,format=yuv420p[v0] " \
-c:v h264_nvenc \
-preset:v llhq \
-rc:v vbr_hq \
-profile:v:0 high \
-level:0 4.1 \
-b:v:0 5600000 \
-forced-idr 1 \
-strict_gop 1 \
-no-scenecut 1 \
-g 125 \
-r 25 \
-keyint_min 125 \
-c:a aac \
-b:a 96k \
-ac 2 \
-ar 48000 \
-map "[v0]" \
-map 0:a:0 \
-f mpegts "udp://233.34.2.219:60041?reuse=1"

ffmpeg starts with PID 936.

$ nvidia-smi | grep 936
|    0       936      C   ffmpeg-cuda                                  196MiB |
|    2       936      C   ffmpeg-cuda                                  259MiB |

This is not working as expected. The stream was copied to device 0.
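
The check above greps for the bare number 936, which could also match another column (for example a memory figure). A minimal sketch of a stricter check follows. It assumes the process-table row layout shown in this ticket (newer drivers print extra columns, which this does not handle), and the helper name gpus_for_pid is hypothetical:

```shell
#!/bin/sh
# gpus_for_pid PID: read nvidia-smi process-table rows on stdin and print
# the GPU index of every row whose PID column equals PID exactly.
# Assumed row layout (as in this ticket): "| GPU PID Type Name Memory |".
gpus_for_pid() {
    awk -v pid="$1" '$1 == "|" && $3 == pid { print $2 }' | sort -un
}

# Sample rows copied from the nvidia-smi output quoted in this report.
sample='|    0       936      C   ffmpeg-cuda                                  196MiB |
|    2       936      C   ffmpeg-cuda                                  259MiB |
|    2     11156      C   ffmpeg-cuda                                  313MiB |'

# Lists every GPU index on which PID 936 holds a context, one per line.
printf '%s\n' "$sample" | gpus_for_pid 936
```

On a live system the input would come from the tool itself, for example: nvidia-smi | gpus_for_pid "$PID".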

Run with hwdownload and hwaccel_device 0:

ffmpeg-cuda \
-hide_banner \
-probesize 10M \
-analyzeduration 10M \
-nostats \
-hwaccel cuvid \
-hwaccel_device 0 \
-c:v h264_cuvid \
-i "udp://hidden_ip:58631?reuse=1&pkt_size=1316&buffer_size=2621440&fifo_size=2621440" \
-filter_complex " \
[0:v]scale_npp=-1:-1:format=yuv420p:interp_algo=lanczos,hwdownload,format=yuv420p[v0] " \
-c:v h264_nvenc \
-preset:v llhq \
-rc:v vbr_hq \
-profile:v:0 high \
-level:0 4.1 \
-b:v:0 5600000 \
-forced-idr 1 \
-strict_gop 1 \
-no-scenecut 1 \
-g 125 \
-r 25 \
-keyint_min 125 \
-c:a aac \
-b:a 96k \
-ac 2 \
-ar 48000 \
-map "[v0]" \
-map 0:a:0 \
-f mpegts "udp://233.34.2.219:60041?reuse=1"

ffmpeg starts with PID 3952.

$ nvidia-smi | grep 3952
|    0      3952      C   ffmpeg-cuda                                  456MiB |

Everything works fine.

As you can see, hwdownload always uses hwaccel_device 0.

Is this the expected behaviour?

How can I make ffmpeg stop copying the stream to the 0th device?

Change History (3)

comment:1 by Carl Eugen Hoyos, 4 years ago

Component: ffmpeg → undetermined
Keywords: nvenc added; hwdownload hwaccel_device removed
Version: 4.2 → unspecified

If this is meant to be a bug report, please test current FFmpeg git head and provide the simplified (!) command line together with the complete, uncut console output to make this a valid ticket.

comment:2 by Timo R., 4 years ago

hwdownload copies frames _from_ the device, and thus uses the context of the frames it gets as input. The outgoing frames are in system RAM and not tied to any device.

The second CUDA context on the default device (0) you are seeing is nvenc being fed non-CUDA frames, which triggers it to create its own CUDA context to re-upload the frames on.
nvenc has its own option (-gpu) controlling on which device it creates that context.

But really, why even download in the first place, just for nvenc to re-upload immediately?
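
A minimal sketch of where that option would sit on the command line, assuming the reporter's filter chain is kept as-is. GPU, IN and OUT are placeholder values, and pinned_args is a hypothetical helper that only prints the arguments. Note that -gpu is given after -i, next to -c:v h264_nvenc, so that it applies to the encoder for the output. Whether this removes the extra context on device 0 is not confirmed here; a later reply in this ticket reports that "-gpu 2" did not change the result.

```shell
#!/bin/sh
# Sketch: drive both the decode device and the nvenc device from one variable.
# GPU, IN and OUT are placeholders, not values from the reporter's setup.
GPU=2
IN="input.ts"
OUT="output.ts"

# Print the argument list one item per line so option placement is easy to audit.
pinned_args() {
    printf '%s\n' \
        -hwaccel cuvid -hwaccel_device "$GPU" -c:v h264_cuvid \
        -i "$IN" \
        -filter_complex "[0:v]scale_npp=-1:-1:format=yuv420p:interp_algo=lanczos,hwdownload,format=yuv420p[v0]" \
        -map "[v0]" -map 0:a:0 \
        -c:v h264_nvenc -gpu "$GPU" \
        -c:a aac -f mpegts "$OUT"
}

pinned_args
```

Printing the arguments instead of running them keeps the sketch independent of any particular ffmpeg build or hardware.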

in reply to:  2 comment:3 by darn, 4 years ago

Replying to oromit:

> hwdownload copies frames _from_ the device, and thus uses the context of the frames it gets as input. The outgoing frames are in system RAM and not tied to any device.

Yes.
hwdownload -- copies from GPU memory to system memory.
hwupload -- copies from system memory to GPU memory.

Is that correct?

> The second CUDA context on the default device (0) you are seeing is nvenc being fed non-CUDA frames, which triggers it to create its own CUDA context to re-upload the frames on.
> nvenc has its own option (-gpu) controlling on which device it creates that context.

I tried the "-gpu 2" setting; the result is the same.

> But really, why even download in the first place, just for nvenc to re-upload immediately?

As far as I understand, my filter_complex configuration

-filter_complex "[0:v]scale_npp=-1:-1:format=yuv420p:interp_algo=lanczos,hwdownload,format=yuv420p[v0]"

works as follows:

  1. The input stream "[0:v]" is copied to GPU memory;
  2. The GPU runs "scale_npp" on the input stream with "-1:-1" and "format=yuv420p:interp_algo=lanczos";
  3. The GPU runs "hwdownload" with "format=yuv420p", copying from GPU memory to system memory under the label [v0].

Is that correct?
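
One detail of the chain can be checked mechanically: in filtergraph syntax a comma separates filters, so "format=yuv420p" after "hwdownload" is a filter of its own, not a parameter of hwdownload. A minimal sketch that lists the stages, assuming the chain contains no escaped commas (list_stages is a hypothetical helper):

```shell
#!/bin/sh
# The filter chain from the ticket, without the [0:v] and [v0] pad labels.
chain='scale_npp=-1:-1:format=yuv420p:interp_algo=lanczos,hwdownload,format=yuv420p'

# list_stages CHAIN: split on commas and print the name of each filter in order.
# Assumes no escaped commas inside filter arguments.
list_stages() {
    printf '%s\n' "$1" | tr ',' '\n' | awk -F= '{ printf "%d. %s\n", NR, $1 }'
}

list_stages "$chain"
```

The chain therefore has three filters: scale_npp, hwdownload, and format. Per comment:2, frames leaving hwdownload are in system RAM, so the final format filter operates on software frames.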
