Opened 2 years ago

Last modified 3 months ago

#6989 open defect

Hwaccel cuvid fails with “Error creating a NVDEC decoder: 1”

Reported by: tkalliom
Owned by:
Priority: normal
Component: avcodec
Version: git-master
Keywords: hwaccel cuda cuvid NVDEC
Cc: vyac.andrejev@gmail.com
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

Summary of the bug:

When I try to use an NVIDIA GPU to decode video, ffmpeg fails with the following output (full trace attached):

[h264 @ 0x558d24633900] NVDEC capabilities:
[h264 @ 0x558d24633900] format supported: yes, max_mb_count: 65536
[h264 @ 0x558d24633900] min_width: 48, max_width: 4096
[h264 @ 0x558d24633900] min_height: 16, max_height: 4096
[h264 @ 0x558d24633900] Error creating a NVDEC decoder: 1
[h264 @ 0x558d24633900] Failed setup for format cuda: hwaccel initialisation returned error.
[h264 @ 0x558d24633900] Format cuda not usable, retrying get_format() without it.

The trace attached is for an AVC video, but the same occurs for HEVC.

The system has two GTX 1080 Tis (1b06 rev a1), and the driver version is 384.111. I have managed to run CUDA programs without problems.

How to reproduce:

% ffmpeg -loglevel trace -hwaccel cuvid -i small.mp4 small2.mp4
ffmpeg version 3.4.git-1
built on gcc 7 (Debian 7.2.0-19)

Attachments (1)

err.txt (205.2 KB) - added by tkalliom 2 years ago.
A trace of a failing run


Change History (20)

Changed 2 years ago by tkalliom

A trace of a failing run

comment:1 follow-up: Changed 2 years ago by philipl

  • Resolution set to invalid
  • Status changed from new to closed

That command line cannot work, so your results are not surprising.

You either want to use the new nvdec hwaccel, or the old cuvid decoder. In either case, you need different arguments.

To use the old style cuvid decoder:

$ ffmpeg -hwaccel cuvid -c:v h264_cuvid -i small.mp4 small2.mp4

To use the new style nvdec hwaccel:

$ ffmpeg -hwaccel nvdec -i small.mp4 small2.mp4

Also note that neither command line will result in nvenc being used on the encode side. For that, you need:

$ ffmpeg -hwaccel cuvid -c:v h264_cuvid -i small.mp4 -c:v h264_nvenc small2.mp4

or

$ ffmpeg -hwaccel nvdec -hwaccel_output_format cuda -i small.mp4 -c:v h264_nvenc small2.mp4

and then various encoder options to make it realistic.

comment:2 Changed 2 years ago by tkalliom

The option -hwaccel nvdec works without errors. However, something is off, as decoding (e.g. throwing rawvideo to /dev/null) is actually ~15% _slower_ than on CPU...

Also, the documentation seems to be lacking: ffmpeg -hwaccels lists cuvid but not nvdec, the manpage mentions only vdpau, and hwaccel_output_format is only mentioned on wiki pages about VAAPI.

comment:3 Changed 2 years ago by philipl

I don't know what command line you're attempting, but usually, you'd expect nvdec to be faster than software even after accounting for read-back penalty (remember that for video playback or full hardware transcoding with nvenc you never read decoded frames back to system memory).

nvdec doesn't appear on the -hwaccels list because it's an alias for 'cuda', which is the official option name.

Documentation can always be improved.

comment:4 Changed 2 years ago by tkalliom

My real use case is turning a video to a bunch of bitmaps on a file system for analysis, so there is actually also some color space conversion involved. I tried to focus on decoding and present the simplest possible command line to exhibit the unexpected behavior, but I guess if you simplify too much the examples become inane...

With regard to the performance difference in decoding: I have now compared CPU and NVDEC performance using the 3840x2160 HEVC sample at http://cloud.ultrahdtv.net/fitness-trailer-8000.mkv.

$ ffmpeg -i fitness-trailer-8000.mkv -f image2pipe -vcodec rawvideo - >/dev/null
Avg. 132FPS on CPU for just decoding and throwing frames away.

$ ffmpeg -hwaccel cuda -i fitness-trailer-8000.mkv -f image2pipe -vcodec rawvideo - >/dev/null
Avg. 107FPS on GPU – 19% slower on just decoding.

$ ffmpeg -i fitness-trailer-8000.mkv -f image2pipe -vcodec rawvideo - > /tmp/frames.dat
Avg. 100FPS on CPU for piping to tmpfs.

$ ffmpeg -hwaccel cuda -i fitness-trailer-8000.mkv -f image2pipe -vcodec rawvideo - >frames.dat
Avg. 62FPS on GPU – 38% slower.

So, nvdec is decoding slower than software...
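The percentages above are relative slowdowns against the CPU run. Below is a minimal C helper for that arithmetic. It assumes that "X% slower" means (CPU fps - GPU fps) / CPU fps. `slowdown_pct` is an illustrative name.

```c
#include <assert.h>

/* Relative slowdown of test_fps versus baseline_fps, in percent:
 * (baseline - test) / baseline * 100. Positive means the test run was slower,
 * negative means it was faster. */
double slowdown_pct(double baseline_fps, double test_fps)
{
    assert(baseline_fps > 0.0); /* a zero baseline has no meaningful ratio */
    return (baseline_fps - test_fps) / baseline_fps * 100.0;
}
```

With the figures above, slowdown_pct(132, 107) is about 18.9 and slowdown_pct(100, 62) is 38.0. A negative result means the second run was the faster one.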

comment:5 Changed 2 years ago by oromit

hwaccels do not claim to be way faster than CPUs. They claim to not use any CPU while doing their thing, while still being fast enough for realtime playback.

comment:6 Changed 2 years ago by philipl

I have a 1080 (non-Ti) here, and I get 219fps with nvdec and 82fps without. Are you building ffmpeg from the latest source?

comment:7 in reply to: ↑ 1 Changed 5 months ago by DrocUf

  • Resolution invalid deleted
  • Status changed from closed to reopened

Replying to philipl:

That command line cannot work, so your results are not surprising.

You either want to use the new nvdec hwaccel, or the old cuvid decoder. In either case, you need different arguments.

To use the old style cuvid decoder:

$ ffmpeg -hwaccel cuvid -c:v h264_cuvid -i small.mp4 small2.mp4

To use the new style nvdec hwaccel:

$ ffmpeg -hwaccel nvdec -i small.mp4 small2.mp4

In the latest git snapshot, both command lines actually use exactly the same NVIDIA API. This is an actual bug in ffmpeg. The first one fails because with -hwaccel cuvid, ffmpeg assigns 0 to CUVIDDECODECREATEINFO::ulNumDecodeSurfaces and CUVIDDECODECREATEINFO::ulNumOutputSurfaces. The subsequent call to cuvidCreateDecoder therefore fails, because NVDEC has no surfaces in which to store decoded pictures.

Here is how it happens. When you specify -hwaccel on the command line, ffmpeg assigns a constant from enum HWAccelID to InputStream::hwaccel_id. For -hwaccel cuvid this constant is HWACCEL_CUVID. On the other hand, -hwaccel nvdec works as if by magic. It does not use some secret-sauce API; it works because InputStream::hwaccel_id will be equal to HWACCEL_GENERIC. Here is how. First, add_input_streams changes nvdec to cuda here:

    if (!strcmp(hwaccel, "nvdec"))
        hwaccel = "cuda";

Next, it compares the result with the strings in the global variable hwaccels (defined in ffmpeg_opt.c) to derive the HWAccelID. This table (surprise, surprise) has only three entries: "videotoolbox", "qsv", and "cuvid". That is, it has neither nvdec nor cuda. Yes, the only correct command-line argument for NVIDIA hardware acceleration is -hwaccel cuvid. When you use -hwaccel nvdec, ffmpeg assigns HWACCEL_GENERIC to InputStream::hwaccel_id:

    if (!ist->hwaccel_id) {
        type = av_hwdevice_find_type_by_name(hwaccel);
        if (type != AV_HWDEVICE_TYPE_NONE) {
            ist->hwaccel_id = HWACCEL_GENERIC;
            ist->hwaccel_device_type = type;
        }
    }
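The name resolution just described can be modeled in a few lines of C. This is a simplified sketch, not ffmpeg's code. The following are all assumptions made for illustration:

- the enum values;
- `hwaccel_table`;
- `resolve_hwaccel`;
- `known_device_type()`, a hypothetical stub standing in for av_hwdevice_find_type_by_name(), which recognizes many more device types.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for ffmpeg's enum HWAccelID (illustrative only). */
enum HWAccelID {
    HWACCEL_NONE = 0,
    HWACCEL_AUTO,
    HWACCEL_GENERIC,
    HWACCEL_VIDEOTOOLBOX,
    HWACCEL_QSV,
    HWACCEL_CUVID,
};

/* The three named entries the comment lists for the hwaccels table. */
static const struct { const char *name; enum HWAccelID id; } hwaccel_table[] = {
    { "videotoolbox", HWACCEL_VIDEOTOOLBOX },
    { "qsv",          HWACCEL_QSV },
    { "cuvid",        HWACCEL_CUVID },
};

/* Hypothetical stub for av_hwdevice_find_type_by_name(): nonzero when the
 * name is a known hwdevice type. Only a few names are listed here. */
static int known_device_type(const char *name)
{
    return !strcmp(name, "cuda") || !strcmp(name, "vaapi") ||
           !strcmp(name, "vdpau");
}

enum HWAccelID resolve_hwaccel(const char *hwaccel)
{
    assert(hwaccel != NULL);
    if (!strcmp(hwaccel, "nvdec"))   /* alias rewritten first */
        hwaccel = "cuda";

    /* Named table first: a match selects a dedicated path such as cuvid_init. */
    for (size_t i = 0; i < sizeof(hwaccel_table) / sizeof(hwaccel_table[0]); i++)
        if (!strcmp(hwaccel, hwaccel_table[i].name))
            return hwaccel_table[i].id;

    /* Otherwise fall back to the generic hwdevice path, if the type exists. */
    if (known_device_type(hwaccel))
        return HWACCEL_GENERIC;
    return HWACCEL_NONE;
}
```

Under this model, "cuvid" resolves to HWACCEL_CUVID, the dedicated path that calls cuvid_init. Both "nvdec" and "cuda" resolve to HWACCEL_GENERIC.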

Here is what happens next. get_format is called. If InputStream::hwaccel_id is set to HWACCEL_CUVID, then get_format will call cuvid_init:

    ret = hwaccel->init(s); // <-- here is a call to cuvid_init
    if (ret < 0) {
        av_log(NULL, AV_LOG_FATAL,
               "%s hwaccel requested for input stream #%d:%d, "
               "but cannot be initialized.\n", hwaccel->name,
               ist->file_index, ist->st->index);
        return AV_PIX_FMT_NONE;
    }

cuvid_init will allocate InputStream::hw_frames_ctx:

    ist->hw_frames_ctx = av_hwframe_ctx_alloc(hw_device_ctx);

Then get_format will assign a reference to this structure to AVCodecContext::hw_frames_ctx:

    if (ist->hw_frames_ctx) {
        s->hw_frames_ctx = av_buffer_ref(ist->hw_frames_ctx);
        if (!s->hw_frames_ctx)
            return AV_PIX_FMT_NONE;
    }

Are you still following? Bear with me, I'm almost finished. Now for the key moment where it all fails: when ff_nvdec_decode_init is called, it executes the following lines:

    if (!avctx->hw_frames_ctx) {
        ret = ff_decode_get_hw_frames_ctx(avctx, AV_HWDEVICE_TYPE_CUDA);
        if (ret < 0)
            return ret;
    }

Remember that we initialized hw_frames_ctx in cuvid_init. Therefore, ff_decode_get_hw_frames_ctx in the code above is not executed. An important consequence is that the following line in ff_nvdec_frame_params is not executed either:

    frames_ctx->initial_pool_size = dpb_size + 2;

AVHWFramesContext::initial_pool_size stays equal to zero. Then ff_nvdec_decode_init continues to here:

    params.ulNumDecodeSurfaces = frames_ctx->initial_pool_size;
    params.ulNumOutputSurfaces = frames_ctx->initial_pool_size;

Hey presto: this assigns zero to CUVIDDECODECREATEINFO::ulNumDecodeSurfaces and CUVIDDECODECREATEINFO::ulNumOutputSurfaces, and cuvidCreateDecoder eventually fails.

Now, why does it work with -hwaccel nvdec? It's funny, the reason is because it doesn't call cuvid_init. Remember that this command line sets InputStream::hwaccel_id to HWACCEL_GENERIC? In this case ffmpeg skips the hwaccel->init(s) call, so hw_frames_ctx is not allocated. The chain then runs as follows:

- ff_nvdec_decode_init calls ff_decode_get_hw_frames_ctx.
- ff_nvdec_frame_params assigns the correct value to AVHWFramesContext::initial_pool_size.
- ff_nvdec_decode_init transfers it to CUVIDDECODECREATEINFO::ulNumDecodeSurfaces and CUVIDDECODECREATEINFO::ulNumOutputSurfaces.

Finally, cuvidCreateDecoder succeeds.
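The two paths differ only in whether the frames context already exists when ff_nvdec_decode_init runs. Below is a toy model of that bookkeeping. It is a sketch, not ffmpeg code: `frames_model` and the `model_*` functions are hypothetical names, and everything except initial_pool_size is left out.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool allocated;          /* hw_frames_ctx already exists */
    int  initial_pool_size;  /* AVHWFramesContext::initial_pool_size */
} frames_model;

/* -hwaccel cuvid: cuvid_init allocates the frames context but never sets
 * initial_pool_size, so it keeps its zero default. */
static void model_cuvid_init(frames_model *fc)
{
    fc->allocated = true;
    fc->initial_pool_size = 0;
}

/* Stand-in for ff_decode_get_hw_frames_ctx + ff_nvdec_frame_params. */
static void model_get_hw_frames_ctx(frames_model *fc, int dpb_size)
{
    fc->allocated = true;
    fc->initial_pool_size = dpb_size + 2;
}

/* Returns the surface count ff_nvdec_decode_init would pass to the decoder. */
int model_decode_init(bool run_cuvid_init, int dpb_size)
{
    frames_model fc = { false, 0 };
    assert(dpb_size > 0);
    if (run_cuvid_init)          /* HWACCEL_CUVID: get_format calls hwaccel->init */
        model_cuvid_init(&fc);
    if (!fc.allocated)           /* the `if (!avctx->hw_frames_ctx)` check */
        model_get_hw_frames_ctx(&fc, dpb_size);
    return fc.initial_pool_size; /* -> ulNumDecodeSurfaces / ulNumOutputSurfaces */
}
```

With an assumed DPB size of 16, model_decode_init(true, 16) yields 0 surfaces (the cuvid path), while model_decode_init(false, 16) yields 18 (the generic path).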

comment:8 follow-up: Changed 5 months ago by DrocUf

  • Cc vyac.andrejev@gmail.com added

comment:9 in reply to: ↑ 8 Changed 3 months ago by Brainiarc7

Replying to DrocUf:

Probably related to https://trac.ffmpeg.org/ticket/7562, which suggests -extra_hw_frames 2 as a workaround.

I've tested this in production and even then, the error does appear from time to time, typically in this form: https://devtalk.nvidia.com/default/topic/1031891/video-codec-and-optical-flow-sdk/minimal-nvdecode-experiment-fails-to-map-errors-with-mapping-of-buffer-object-failed-/

comment:10 Changed 3 months ago by Balling

  • Status changed from reopened to open

Looks like guys just removed cuvid hwaccel (not cuviddec); now it is just an alias for nvdec (which is itself an alias for cuda). Look here:

if (!strcmp(hwaccel, "nvdec") || !strcmp(hwaccel, "cuvid"))

This way it will not break anything (such as scripts).
https://github.com/FFmpeg/FFmpeg/commit/60b1f85b67ccb907e4eba3e2c98839769690ed24
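This sketch assumes the condition quoted above is followed by the same `hwaccel = "cuda";` assignment as in the earlier add_input_streams excerpt. In that case both legacy names now collapse to "cuda" and take the generic path. `normalize_hwaccel_name` is a hypothetical helper, not an ffmpeg function.

```c
#include <assert.h>
#include <string.h>

/* Post-commit alias handling (assumed): "nvdec" and "cuvid" both become
 * "cuda", so neither reaches the old cuvid_init path any more. */
const char *normalize_hwaccel_name(const char *hwaccel)
{
    assert(hwaccel != NULL);
    if (!strcmp(hwaccel, "nvdec") || !strcmp(hwaccel, "cuvid"))
        return "cuda";
    return hwaccel;  /* every other name is passed through unchanged */
}
```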

Anyway, can this ("It's funny, the reason is because it doesn't call cuvid_init") be fixed? Is it even supported by modern drivers?

Wow, so you could have used -hwaccel nvdec -c:v h264_cuvid.

Last edited 3 months ago by Balling

comment:11 follow-up: Changed 3 months ago by DrocUf

Looks like guys just removed cuvid

OMG, what have they done? cuvid was the only correct value for hwaccel on NVIDIA GPUs.

comment:12 in reply to: ↑ 11 ; follow-up: Changed 3 months ago by Balling

Replying to DrocUf:

Looks like guys just removed cuvid

OMG, what have they done? cuvid was the only correct value for hwaccel on NVIDIA GPUs.

It still works, you did not try to use it? And it is the same code under the hood because of this bug.

comment:13 follow-up: Changed 3 months ago by oromit

I still fail to see any bug here. It falling back to the generic hwaccel path is exactly what we want.
The generic bringup code for cuda is working fine and creates all necessary contexts.

The removed code in the ffmpeg CLI tool was only poorly maintained duplication of what is now handled in libavutil.

comment:14 in reply to: ↑ 13 Changed 3 months ago by Balling

Replying to oromit:

I still fail to see any bug here. It falling back to the generic

Then is this fixed? See #7562:
"same command with -bf 3, interlaced input BUT -hwaccel cuvid, succeeds"

Last edited 3 months ago by Balling

comment:15 Changed 3 months ago by Balling

Did you try changing AVCodecContext::hw_frames_ctx before the latest commit? Or, better (obviously), setting AVHWFramesContext::initial_pool_size to two in ff_nvdec_decode_init?
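The second suggestion could look roughly like the guard below. It is a hypothetical sketch, not an ffmpeg patch:

- `nvdec_surface_count` is an illustrative name.
- It falls back to the dpb_size + 2 that ff_nvdec_frame_params would otherwise have chosen, rather than to a literal 2.

```c
#include <assert.h>

/* Hypothetical guard: if a caller-supplied frames context left
 * initial_pool_size at 0 (as cuvid_init did), use dpb_size + 2 instead of
 * passing zero surfaces to the decoder. */
int nvdec_surface_count(int initial_pool_size, int dpb_size)
{
    assert(dpb_size > 0 && initial_pool_size >= 0);
    if (initial_pool_size == 0)
        return dpb_size + 2;
    return initial_pool_size;  /* an explicit pool size is respected as-is */
}
```

A nonzero but too-small pool supplied by the caller would still pass through unchanged here. A real fix would have to decide how to handle that case as well.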

Last edited 3 months ago by Balling

comment:16 in reply to: ↑ 12 ; follow-up: Changed 3 months ago by DrocUf

Replying to Balling:

It still works, you did not try to use it?

What is "it"? hwaccel cuvid didn't work before. Now it works because it relies on a bug.

And it is the same code under the hood because of this bug.

It's the same under the hood because there is only one way to access the video decoding hardware on NVIDIA GPUs: through the CUDA driver. There is no "generic" or "native" way to drive the video decoding hardware. Thus cuvid_init must be used.

comment:17 in reply to: ↑ 16 ; follow-up: Changed 3 months ago by Balling

Replying to DrocUf:

Replying to Balling:

It still works, you did not try to use it?

What is "it"? hwaccel cuvid didn't work before. Now it works because it relies on a bug.

-hwaccel cuvid -c:v h264_cuvid is working for me. This bug does not apply to me. Dunno. -hwaccel cuvid without -c:v never worked, no? The fact that cuvidCreateDecoder fails is disturbing, though.

And it is the same code under the hood because of this bug.

Thus cuvid_init must be used.

No, the removed code in the ffmpeg CLI tool was only poorly maintained duplication of what is now handled in libavutil.

Version 2, edited 3 months ago by Balling

comment:18 in reply to: ↑ 17 ; follow-up: Changed 3 months ago by DrocUf

Replying to Balling:

Dunno

Exactly.

no?

Asking questions is hardly a good way to deliver your point within a constructive discussion.

I am surprised to see so much opposition. I'll let you play within your sandbox, guys.

comment:19 in reply to: ↑ 18 Changed 3 months ago by Balling

Replying to DrocUf:

Replying to Balling:
I am surprised to see so much opposition. I'll let you play within your sandbox, guys.

Do not be; this is natural. Anyway, I added your comment about the + 2 here: https://trac.ffmpeg.org/ticket/7562#comment:6. It really looks plausible.
