Opened 6 years ago

Last modified 2 years ago

#6989 open defect

Hwaccel cuvid fails with “Error creating a NVDEC decoder: 1”

Reported by: tkalliom Owned by:
Priority: normal Component: avcodec
Version: git-master Keywords: hwaccel cuda cuvid NVDEC
Cc: vyac.andrejev@gmail.com Blocked By:
Blocking: Reproduced by developer: no
Analyzed by developer: no

Description

Summary of the bug:

When trying to use an NVIDIA GPU to decode video, it fails with the following output (full trace attached):

[h264 @ 0x558d24633900] NVDEC capabilities:
[h264 @ 0x558d24633900] format supported: yes, max_mb_count: 65536
[h264 @ 0x558d24633900] min_width: 48, max_width: 4096
[h264 @ 0x558d24633900] min_height: 16, max_height: 4096
[h264 @ 0x558d24633900] Error creating a NVDEC decoder: 1
[h264 @ 0x558d24633900] Failed setup for format cuda: hwaccel initialisation returned error.
[h264 @ 0x558d24633900] Format cuda not usable, retrying get_format() without it.

The trace attached is for an AVC video, but the same occurs for HEVC.

The system has two GTX 1080 Tis (1b06 rev a1), and the driver version is 384.111. I have managed to run CUDA programs without problems.

How to reproduce:

% ffmpeg -loglevel trace -hwaccel cuvid -i small.mp4 small2.mp4
ffmpeg version 3.4.git-1
built on gcc 7 (Debian 7.2.0-19)

Attachments (1)

err.txt (205.2 KB ) - added by tkalliom 6 years ago.
A trace of a failing run

Change History (21)

by tkalliom, 6 years ago

Attachment: err.txt added

A trace of a failing run

comment:1 by Philip Langdale, 6 years ago

Resolution: invalid
Status: new → closed

That command line cannot work, so your results are not surprising.

You either want to use the new nvdec hwaccel, or the old cuvid decoder. In either case, you need different arguments.

To use the old style cuvid decoder:

$ ffmpeg -hwaccel cuvid -c:v h264_cuvid -i small.mp4 small2.mp4

To use the new style nvdec hwaccel:

$ ffmpeg -hwaccel nvdec -i small.mp4 small2.mp4

Also note that neither command-line construction will result in the use of nvenc on the encode side. For that, you need:

$ ffmpeg -hwaccel cuvid -c:v h264_cuvid -i small.mp4 -c:v h264_nvenc small2.mp4

or

$ ffmpeg -hwaccel nvdec -hwaccel_output_format cuda -i small.mp4 -c:v h264_nvenc small2.mp4

and then various encoder options to make it realistic.

comment:2 by tkalliom, 6 years ago

The option -hwaccel nvdec works without errors. However, something is off, as decoding (e.g. throwing rawvideo to /dev/null) is actually ~15% _slower_ than on the CPU...

Also, the documentation seems to be lacking: ffmpeg -hwaccels lists cuvid but not nvdec, the manpage mentions only vdpau, and hwaccel_output_format is only mentioned on wiki pages about VAAPI.

comment:3 by Philip Langdale, 6 years ago

I don't know what command line you're attempting, but usually you'd expect nvdec to be faster than software even after accounting for the read-back penalty (remember that for video playback or full hardware transcoding with nvenc you never read decoded frames back to system memory).

nvdec doesn't appear on the -hwaccels list because it's an alias for 'cuda', which is the official option name.

Documentation can always be improved.

comment:4 by tkalliom, 6 years ago

My real use case is turning a video into a bunch of bitmaps on a file system for analysis, so there is actually also some color space conversion involved. I tried to focus on decoding and present the simplest possible command line that exhibits the unexpected behavior, but I guess if you simplify too much the examples become inane...

With regard to the performance difference in decoding: I have now compared CPU and NVDEC performance using the 3840x2160 HEVC sample at http://cloud.ultrahdtv.net/fitness-trailer-8000.mkv.

$ ffmpeg -i fitness-trailer-8000.mkv -f image2pipe -vcodec rawvideo - >/dev/null
Avg. 132FPS on CPU for just decoding and throwing frames away.

$ ffmpeg -hwaccel cuda -i fitness-trailer-8000.mkv -f image2pipe -vcodec rawvideo - >/dev/null
Avg. 107FPS on GPU – 19% slower on just decoding.

$ ffmpeg -i fitness-trailer-8000.mkv -f image2pipe -vcodec rawvideo - > /tmp/frames.dat
Avg. 100FPS on CPU for piping to tmpfs.

$ ffmpeg -hwaccel cuda -i fitness-trailer-8000.mkv -f image2pipe -vcodec rawvideo - >frames.dat
Avg. 62FPS on GPU – 37% slower.

So, nvdec is decoding slower than software...

comment:5 by Timo R., 6 years ago

hwaccels do not claim to be way faster than CPUs. They claim to not use any CPU while doing their thing, while still being fast enough for realtime playback.

comment:6 by Philip Langdale, 6 years ago

I have a 1080 (non-Ti) here and I get 219 fps with nvdec and 82 fps without. Are you building ffmpeg from the latest source?

in reply to:  1 comment:7 by Slava Andrejev, 4 years ago

Resolution: invalid
Status: closed → reopened

Replying to philipl:

That command line cannot work, so your results are not surprising.

You either want to use the new nvdec hwaccel, or the old cuvid decoder. In either case, you need different arguments.

To use the old style cuvid decoder:

$ ffmpeg -hwaccel cuvid -c:v h264_cuvid -i small.mp4 small2.mp4

To use the new style nvdec hwaccel:

$ ffmpeg -hwaccel nvdec -i small.mp4 small2.mp4

In the latest git snapshot both command lines will actually use exactly the same NVIDIA API. This is an actual bug in ffmpeg. The reason the first one doesn't work is that with -hwaccel cuvid, ffmpeg assigns 0 to CUVIDDECODECREATEINFO::ulNumDecodeSurfaces and to CUVIDDECODECREATEINFO::ulNumOutputSurfaces. The subsequent call to cuvidCreateDecoder therefore fails, because NVDEC has no memory in which to store decoded pictures.
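
To make the failure mode concrete, here is a minimal sketch of it (not FFmpeg code; it assumes the NVIDIA Video Codec SDK headers cuda.h and cuviddec.h, and the helper name is made up). With both surface counts left at zero, cuvidCreateDecoder has nowhere to keep decoded pictures and rejects the call; error code 1 would match CUDA_ERROR_INVALID_VALUE.

    #include <cuda.h>
    #include <cuviddec.h>

    /* Sketch only: refuse a decoder-create call whose surface counts are
     * zero, which is exactly the broken state described above. */
    static CUresult create_decoder_checked(CUvideodecoder *dec,
                                           CUVIDDECODECREATEINFO *info)
    {
        if (!info->ulNumDecodeSurfaces || !info->ulNumOutputSurfaces)
            return CUDA_ERROR_INVALID_VALUE;   /* numeric value 1 */
        return cuvidCreateDecoder(dec, info);
    }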

Now follow how it happens. When you specify -hwaccel in the command-line parameters, ffmpeg assigns a constant from enum HWAccelID to InputStream::hwaccel_id. For -hwaccel cuvid this constant will be HWACCEL_CUVID. On the other hand, -hwaccel nvdec magically works not because it uses some secret-sauce API, but because InputStream::hwaccel_id will be equal to HWACCEL_GENERIC. Here is how. First, add_input_streams will change nvdec to cuda here:

                if (!strcmp(hwaccel, "nvdec"))
                    hwaccel = "cuda";

Next, it will compare it with the strings in the global variable hwaccels (defined in ffmpeg_opt.c) to derive an HWAccelID. This table (surprise, surprise) has only three values: "videotoolbox", "qsv", and "cuvid". That is, it has neither nvdec nor cuda. Yes, the only correct command-line argument for NVIDIA HW acceleration is -hwaccel cuvid. When you use -hwaccel nvdec, ffmpeg assigns HWACCEL_GENERIC to InputStream::hwaccel_id (see also the sketch after the snippet below):

                    if (!ist->hwaccel_id) {
                        type = av_hwdevice_find_type_by_name(hwaccel);
                        if (type != AV_HWDEVICE_TYPE_NONE) {
                            ist->hwaccel_id = HWACCEL_GENERIC;
                            ist->hwaccel_device_type = type;
                        }
                    }
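
For illustration, here is a self-contained sketch of that lookup (the enum and table are simplified stand-ins, not the actual ffmpeg_opt.c definitions): only the three listed names are matched, so "cuda" (and therefore "nvdec") misses the table and ends up on the HWACCEL_GENERIC path shown above.

    #include <string.h>

    enum HWAccelID { HWACCEL_NONE, HWACCEL_GENERIC, HWACCEL_VIDEOTOOLBOX,
                     HWACCEL_QSV, HWACCEL_CUVID };

    static const struct { const char *name; enum HWAccelID id; } hwaccel_table[] = {
        { "videotoolbox", HWACCEL_VIDEOTOOLBOX },
        { "qsv",          HWACCEL_QSV          },
        { "cuvid",        HWACCEL_CUVID        },
    };

    /* Returns HWACCEL_NONE for "cuda"/"nvdec"; the caller then falls back to
     * av_hwdevice_find_type_by_name() and assigns HWACCEL_GENERIC. */
    static enum HWAccelID lookup_hwaccel(const char *name)
    {
        for (size_t i = 0; i < sizeof(hwaccel_table) / sizeof(hwaccel_table[0]); i++)
            if (!strcmp(hwaccel_table[i].name, name))
                return hwaccel_table[i].id;
        return HWACCEL_NONE;
    }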

Here is what happens next. get_format is called. If InputStream::hwaccel_id is set to HWACCEL_CUVID, then get_format will call cuvid_init:

            ret = hwaccel->init(s); // <-- here is a call to cuvid_init
            if (ret < 0) {
                av_log(NULL, AV_LOG_FATAL,
                       "%s hwaccel requested for input stream #%d:%d, "
                       "but cannot be initialized.\n", hwaccel->name,
                       ist->file_index, ist->st->index);
                return AV_PIX_FMT_NONE;
            }

cuvid_init will allocate InputStream::hw_frames_ctx:

    ist->hw_frames_ctx = av_hwframe_ctx_alloc(hw_device_ctx);

Then get_format will assign a reference to this structure to AVCodecContext::hw_frames_ctx:

        if (ist->hw_frames_ctx) {
            s->hw_frames_ctx = av_buffer_ref(ist->hw_frames_ctx);
            if (!s->hw_frames_ctx)
                return AV_PIX_FMT_NONE;
        }

Are you still following? Bear with me, I have almost finished. Now comes the key moment where it all fails. When ff_nvdec_decode_init is called, it executes the following lines:

    if (!avctx->hw_frames_ctx) {
        ret = ff_decode_get_hw_frames_ctx(avctx, AV_HWDEVICE_TYPE_CUDA);
        if (ret < 0)
            return ret;
    }

Remember that we initialized hw_frames_ctx in cuvid_init. Therefore, in the code above, ff_decode_get_hw_frames_ctx is not executed. An important consequence is that the following line in ff_nvdec_frame_params is not executed either:

    frames_ctx->initial_pool_size = dpb_size + 2;

AVHWFramesContext::initial_pool_size stays equal to zero. Then ff_nvdec_decode_init continues to here:

    params.ulNumDecodeSurfaces = frames_ctx->initial_pool_size;
    params.ulNumOutputSurfaces = frames_ctx->initial_pool_size;

This, hey presto, assigns zero to CUVIDDECODECREATEINFO::ulNumDecodeSurfaces and to CUVIDDECODECREATEINFO::ulNumOutputSurfaces, and eventually cuvidCreateDecoder fails.

Now, why does it work with -hwaccel nvdec? It's funny: the reason is that it doesn't call cuvid_init. Remember that this command line makes InputStream::hwaccel_id equal to HWACCEL_GENERIC? In that case ffmpeg omits the hwaccel->init(s) call, so hw_frames_ctx is not allocated; ff_nvdec_decode_init then calls ff_decode_get_hw_frames_ctx, ff_nvdec_frame_params assigns the correct value to AVHWFramesContext::initial_pool_size, and ff_nvdec_decode_init transfers it to CUVIDDECODECREATEINFO::ulNumDecodeSurfaces and CUVIDDECODECREATEINFO::ulNumOutputSurfaces... and finally cuvidCreateDecoder succeeds.

comment:8 by Slava Andrejev, 4 years ago

Cc: vyac.andrejev@gmail.com added

in reply to:  8 comment:9 by Dennis E. Mungai, 4 years ago

Replying to DrocUf:

Probably related to https://trac.ffmpeg.org/ticket/7562, which suggests -extra_hw_frames 2 as a workaround.

I've tested this in production and even then, the error does appear from time to time, typically in this form: https://devtalk.nvidia.com/default/topic/1031891/video-codec-and-optical-flow-sdk/minimal-nvdecode-experiment-fails-to-map-errors-with-mapping-of-buffer-object-failed-/

comment:10 by Balling, 4 years ago

Status: reopened → open

Looks like the guys just removed the cuvid hwaccel (not the cuviddec decoder); now it is just an alias for nvdec (which is an alias for cuda). Look here:

!strcmp(hwaccel, "nvdec") || !strcmp(hwaccel, "cuvid"))

so it will not break anything (like scripts).
https://github.com/FFmpeg/FFmpeg/commit/60b1f85b67ccb907e4eba3e2c98839769690ed24

Anyway, can this "It's funny: the reason is that it doesn't call cuvid_init" be fixed? Is it even supported by modern drivers?

Wow, so you could have used -hwaccel nvdec -c:v h264_cuvid

Last edited 4 years ago by Balling (previous) (diff)

comment:11 by Slava Andrejev, 4 years ago

Looks like the guys just removed the cuvid

OMG, what have they done? cuvid was the only correct value for hwaccel on NVIDIA GPUs.

in reply to:  11 ; comment:12 by Balling, 4 years ago

Replying to DrocUf:

Looks like the guys just removed the cuvid

OMG, what have they done? cuvid was the only correct value for hwaccel on NVIDIA GPUs.

It still works; did you not try to use it? And it is the same code under the hood because of this bug.

comment:13 by Timo R., 4 years ago

I still fail to see any bug here. Falling back to the generic hwaccel path is exactly what we want.
The generic bring-up code for cuda is working fine and creates all necessary contexts.

The removed code in the ffmpeg CLI tool was only a poorly maintained duplication of what is now handled in libavutil.

in reply to:  13 comment:14 by Balling, 4 years ago

Replying to oromit:

I still fail to see any bug here. It falling back to the generic

Then this is fixed? #7562
"same command with -bf 3, interlaced input BUT -hwaccel cuvid, succeeds"

Last edited 4 years ago by Balling (previous) (diff)

comment:15 by Balling, 4 years ago

Did you try to change AVCodecContext::hw_frames_ctx before the latest commit? Or, better (obviously), to set AVHWFramesContext::initial_pool_size to two in ff_nvdec_decode_init?
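
Something along the lines of this purely hypothetical sketch is what I mean (the helper name is made up, and whether a pool of two is enough is exactly the open question):

    #include <libavcodec/avcodec.h>
    #include <libavutil/hwcontext.h>

    /* Hypothetical: if a frames context was supplied from outside with no
     * pool size, give it a minimal one before the decoder is created. */
    static void ensure_min_pool_size(AVCodecContext *avctx)
    {
        if (!avctx->hw_frames_ctx)
            return;
        AVHWFramesContext *frames_ctx =
            (AVHWFramesContext *)avctx->hw_frames_ctx->data;
        if (frames_ctx->initial_pool_size == 0)
            frames_ctx->initial_pool_size = 2;
    }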

Last edited 4 years ago by Balling (previous) (diff)

in reply to:  12 ; comment:16 by Slava Andrejev, 4 years ago

Replying to Balling:

It still works; did you not try to use it?

What is "it"? hwaccel cuvid didn't work before. Now it works because it relies on a bug.

And it is the same code under the hood because of this bug.

It's the same under the hood because there is only one way to access the video decoding hardware on an NVIDIA GPU: through the CUDA driver. There is no "generic" or "native" way to drive the video decoding hardware. Thus cuvid_init must be used.

in reply to:  16 ; comment:17 by Balling, 4 years ago

Replying to DrocUf:

Replying to Balling:

It still works; did you not try to use it?

What is "it"? hwaccel cuvid didn't work before. Now it works because it relies on a bug.

-hwaccel cuvid -c:v h264_cuvid is working for me. This bug does not apply to me. Dunno. -hwaccel cuvid without -c:v never worked, no? The fact that cuvidCreateDecoder fails is disturbing, though. But shouldn't it fail because there is no -c:v?

And it is the same code under the hood because of this bug.

Thus cuvid_init must be used.

No, the removed code in the ffmpeg CLI tool was only a poorly maintained duplication of what is now handled in libavutil.

Last edited 4 years ago by Balling (previous) (diff)

in reply to:  17 ; comment:18 by Slava Andrejev, 4 years ago

Replying to Balling:

Dunno

Exactly.

no?

Asking questions is hardly a good way to make your point in a constructive discussion.

I am surprised to see so much opposition. I'll let you play within your sandbox, guys.

in reply to:  18 comment:19 by Balling, 4 years ago

Replying to DrocUf:

Replying to Balling:
I am surprised to see so much opposition. I'll let you play within your sandbox, guys.

Do not be, this is natural. Anyway, I added your comment about that + 2 at https://trac.ffmpeg.org/ticket/7562#comment:6; it really looks plausible.

comment:20 by Balling, 2 years ago

Can we close this?
