Opened 10 months ago

Closed 10 months ago

Last modified 10 months ago

#11655 closed defect (fixed)

Cuda/nvdec hwaccel outputs P016LE instead of P010LE on 10bit video

Reported by: nyanmisaka
Owned by: Timo Rothenpieler <timo@rothenpieler.org>
Priority: normal
Component: avcodec
Version: git-master
Keywords: cuda nvdec nvidia hwaccel
Cc:
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

Summary of the bug: Cuda/nvdec hwaccel outputs P016LE instead of P010LE on 10bit video.

How to reproduce:

ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
-i /path/to/10bit-video -an -sn -dn -vf hwdownload,format=p010le -f null -

...

[hwdownload @ 0000020612888240] Invalid output format p010le for hwframe download.
[Parsed_hwdownload_0 @ 00000206106e7140] Failed to configure output pad on Parsed_hwdownload_0
[vf#0:0 @ 0000020610752240] Error reinitializing filters!

Regression caused by https://git.ffmpeg.org/gitweb/ffmpeg.git/commit/30e6effff94c6f4310aa2db571917bb2952f4d9e

For correctness and consistency with the old behavior, a change like this is required in the above commit:

+        } else {
+            frames_ctx->sw_format = sw_desc->comp[0].depth == 10 ? AV_PIX_FMT_P010LE : AV_PIX_FMT_P016LE;
+        }
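
As a rough illustration of the mapping proposed above, here is a standalone C sketch that picks a 4:2:0 software format from the bit depth. The enum and the function name are hypothetical stand-ins rather than FFmpeg's API, and the 8-bit NV12 branch is an assumption about the surrounding code, which the ticket does not show.

```c
/* Hypothetical stand-ins for the AV_PIX_FMT_* values named in the ticket. */
enum sketch_pix_fmt {
    SKETCH_NV12,
    SKETCH_P010LE,
    SKETCH_P016LE
};

/* Pick the 4:2:0 software format for a given luma bit depth.
 * Depths above 8 follow the ticket's proposal:
 * exactly 10 bits -> P010LE, anything else -> P016LE. */
static enum sketch_pix_fmt sketch_pick_420_format(int depth)
{
    if (depth <= 8)
        return SKETCH_NV12; /* assumption: 8-bit content stays NV12 */
    return depth == 10 ? SKETCH_P010LE : SKETCH_P016LE;
}
```

With this mapping, a 10-bit stream reports P010LE again, so a filter chain such as hwdownload,format=p010le would see the format it asks for.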

Change History (13)

comment:1 by Timo R., 10 months ago

That is because nvdec does not output P010. P010 has the pixel data in the least significant bits, while nvdec outputs it in the most significant bits.
I.e. it outputs P016 with the 6 lowest bits zeroed out. FFmpeg does not have a "P010 but in the MSB" format, hence P016 is used.

Edit: I might be confusing this with the situation of nvenc and AV_PIX_FMT_YUV444P16 being used for 10 bit there. Let me re-check.

Last edited 10 months ago by Timo R.

comment:2 by nyanmisaka, 10 months ago

I thought CUVID and NVDEC differed mainly in the bitstream parser, because I still see P010LE in CUVID.

Update: cuviddec.c only contains a leftover mention of P010LE. It actually already outputs P016 as of the commit that enabled 4:2:2, just like NVDEC.

Last edited 10 months ago by nyanmisaka

comment:3 by Balling, 10 months ago

First of all there are two different types of this weird packed chroma, planar luma format P010LE. Little endian can be of Intel style and can be of Microsoft style.

What you are talking about here happened 10 years ago.
81147b5596ea19f7c5c153f4a534e9314d291fd3

And this was finally fixed just recently: #11369 and #11235

But yes, 420 10 bit video must decode as P010 not P012 or P016.

Last edited 10 months ago by Balling

comment:4 by Timo R., 10 months ago

No, this is actually just a bug in that commit.
The situation I described only exists for AV_PIX_FMT_YUV444P16. There it cannot simply be used as AV_PIX_FMT_YUV444P10, since the layout of the bits is different.
But P010 can totally receive P016 data, and likewise can P210 for P216 for 422 output.
So only 444 is a problem.
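
A minimal C sketch of the layout difference described in this comment, assuming the layouts documented in FFmpeg's pixfmt.h: P010 keeps its 10 bits in the high bits of each 16-bit word with the low bits zero, while YUV444P10 keeps them in the low bits. The function names are hypothetical and only serve the illustration.

```c
#include <stdint.h>

/* Store a 10-bit sample the way P010 does: in the 10 most significant
 * bits of a 16-bit word, with the 6 low bits set to zero. */
static uint16_t pack_p010(uint16_t sample10)
{
    return (uint16_t)((sample10 & 0x3FF) << 6);
}

/* Store a 10-bit sample the way YUV444P10 does: in the 10 least
 * significant bits of a 16-bit word. */
static uint16_t pack_yuv444p10(uint16_t sample10)
{
    return (uint16_t)(sample10 & 0x3FF);
}

/* A 16-bit word whose 6 low bits are zero can be read as P010 unchanged. */
static int is_valid_p010_word(uint16_t word16)
{
    return (word16 & 0x3F) == 0;
}
```

A P016 word with zeroed low bits passes is_valid_p010_word() as-is, which is why relabeling works for 4:2:0 and 4:2:2. The same word read as YUV444P10 would be 64 times too large, so 4:4:4 cannot be relabeled the same way.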

comment:5 by nyanmisaka, 10 months ago

And this was finally fixed just recently: #11369 and #11235

What we are discussing here does not involve swscale conversion, but whether the pixel format output by NVDEC hardware is correctly defined.

comment:6 by Balling, 10 months ago

420 10 bit video must decode as P010 not P012 or P016. 420 12 bit must decode as P012 and 420 16 bit must decode as P016.

You see, 10 bit 420 video works as follows: 10 bits are used for the Y plane, and half as many bits are used for Cb and Cr combined. So together that adds up to 10 + 5 = 15 bits. That is P010, that is 16 bits. So just 1 bit is wasted.

Last edited 10 months ago by Balling
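
For reference, the per-pixel arithmetic can be sketched in C as below. This assumes the usual 4:2:0 sampling (one luma sample per pixel, plus Cb and Cr planes at quarter resolution each) and that P010 stores every sample in a full 16-bit word; the function name is hypothetical.

```c
/* Average bits per pixel for a 4:2:0 layout: one luma sample per pixel
 * plus two chroma planes at quarter resolution (2 * 1/4 = 0.5 samples). */
static int bits_per_pixel_420(int bits_per_sample)
{
    return bits_per_sample * 3 / 2;
}
```

Under those assumptions, bits_per_pixel_420(10) gives 15 significant bits per pixel, while bits_per_pixel_420(16) gives 24 bits of storage per pixel, since each 10-bit sample occupies its own 16-bit word.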

comment:7 by Timo Rothenpieler <timo@rothenpieler.org>, 10 months ago

Owner: set to Timo Rothenpieler <timo@rothenpieler.org>
Resolution: fixed
Status: new → closed

In bf5f3f1f/ffmpeg:

avcodec/nvdec: fix 10bit output pixel formats

Fixes #11655

comment:8 by Timo R., 10 months ago

nvenc/nvdec use P016/P216 for historical reasons.
When they were implemented, P012 and P212 didn't exist.
So now migration to those is a bit complicated, since changing the output format is a breaking change.

And there still is a total lack of support for the 10 and 12 bit formats that nvdec outputs (and nvenc accepts for input) for 4:4:4, so that's always mapped to AV_PIX_FMT_YUV444P16, even though the LSB are always 0 (or ignored in case of nvenc).
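
To illustrate what the 4:4:4 mismatch means for a consumer of the downloaded frames, here is a hedged C sketch that moves MSB-aligned samples into the LSB-aligned layout used by formats such as YUV444P10. The function name is hypothetical, the buffer is assumed to hold native-endian 16-bit samples, and line strides are ignored for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert one plane of MSB-aligned samples (16-bit words whose low bits
 * are zero or ignored) into LSB-aligned samples of the given depth,
 * in place. For 10-bit content this shifts each word right by 6. */
static void msb_to_lsb(uint16_t *plane, size_t count, int depth)
{
    const int shift = 16 - depth;

    for (size_t i = 0; i < count; i++)
        plane[i] = (uint16_t)(plane[i] >> shift);
}
```

Calling msb_to_lsb(plane, count, 10) on each of the three planes would yield data that matches the YUV444P10 bit layout. This is a per-sample pass over the whole frame, which is roughly the kind of extra work the ticket discusses keeping out of the decoder.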

comment:9 by nyanmisaka, 10 months ago

Thanks for the quick update. I think the CUVID decoder needs a similar update, as it still outputs P016 for 10-bit video when you specify "-c:v hevc_cuvid". See around line 200 of "libavcodec/cuviddec.c".

And there still is a total lack of support for the 10 and 12 bit formats that nvdec outputs (and nvenc accepts for input) for 4:4:4, so that's always mapped to AV_PIX_FMT_YUV444P16, even though the LSB are always 0 (or ignored in case of nvenc).

I'm aware of these issues as well. For 4:4:4 formats in particular, NVDEC/NVENC don't use the packed Y410/Y416 (AV_PIX_FMT_{XV30,XV36}) formats that D3D11VA/VAAPI use. But since NVIDIA's Windows driver recently added support for them, I guess it shouldn't be too hard for them to add corresponding formats in CUDA?

edit: If they continue to use YUV444P16 as a container for easy pixel access, it would be nice to add some settings in the decoder/driver options to allow some bit shifting inside the decoder/driver to align with YUV420{P10,P12} in FFmpeg.

Last edited 10 months ago by nyanmisaka

comment:10 by Timo R., 10 months ago

I won't be adding a whole scale filter into the decoder, that's a horrible hack that simply won't happen.
Last time we attempted to add those special pix_fmts it was rejected for being an Nvidia special.
Will try again though, but no promises.

comment:11 by Timo R., 10 months ago

And I can't see anything wrong in cuviddec.c, it correctly outputs P010 for 10 bit 420.

comment:12 by Balling, 10 months ago

I won't be adding a whole scale filter into the decoder, that's a horrible hack that simply won't happen.

That would just slow it down with bitexact result and no benefits? Right?

comment:13 by nyanmisaka, 10 months ago (in reply to: comment:11)

Replying to Timo R.:

And I can't see anything wrong in cuviddec.c, it correctly outputs P010 for 10 bit 420.

Hi Timo, you can use this command line to reproduce the problem in CUVID. And here is the log:

ffmpeg -v quiet -f lavfi -i nullsrc=s=1920x1080,format=p010le \
-c:v hevc_nvenc -vframes 1 -f nut - | ffmpeg -c:v hevc_cuvid -i - -f null -


ffmpeg version N-120169-g0fe9f25e76-20250704 Copyright (c) 2000-2025 the FFmpeg developers
  built with gcc 15.1.0 (crosstool-NG 1.27.0.42_35c1e72)
  configuration: --prefix=/ffbuild/prefix --pkg-config-flags=--static --pkg-config=pkg-config --cross-prefix=x86_64-w64-mingw32- --arch=x86_64 --target-os=mingw32 --enable-gpl --enable-version3 --disable-debug --disable-w32threads --enable-pthreads --enable-iconv --enable-zlib --enable-libfribidi --enable-gmp --enable-libxml2 --enable-lzma --enable-fontconfig --enable-libharfbuzz --enable-libfreetype --enable-libvorbis --enable-opencl --disable-libpulse --enable-libvmaf --disable-libxcb --disable-xlib --enable-amf --enable-libaom --enable-libaribb24 --enable-avisynth --enable-chromaprint --enable-libdav1d --enable-libdavs2 --enable-libdvdread --enable-libdvdnav --disable-libfdk-aac --enable-ffnvcodec --enable-cuda-llvm --enable-frei0r --enable-libgme --enable-libkvazaar --enable-libaribcaption --enable-libass --enable-libbluray --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librist --enable-libssh --enable-libtheora --enable-libvpx --enable-libwebp --enable-libzmq --enable-lv2 --enable-libvpl --enable-openal --enable-liboapv --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-libopenmpt --enable-librav1e --enable-librubberband --enable-schannel --enable-sdl2 --enable-libsnappy --enable-libsoxr --enable-libsrt --enable-libsvtav1 --enable-libtwolame --enable-libuavs3d --disable-libdrm --enable-vaapi --enable-libvidstab --enable-vulkan --enable-libshaderc --enable-libplacebo --enable-libvvenc --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxvid --enable-libzimg --enable-libzvbi --extra-cflags=-DLIBTWOLAME_STATIC --extra-cxxflags= --extra-libs=-lgomp --extra-ldflags=-pthread --extra-ldexeflags= --cc=x86_64-w64-mingw32-gcc --cxx=x86_64-w64-mingw32-g++ --ar=x86_64-w64-mingw32-gcc-ar --ranlib=x86_64-w64-mingw32-gcc-ranlib --nm=x86_64-w64-mingw32-gcc-nm --extra-version=20250704
  libavutil      60.  4.101 / 60.  4.101
  libavcodec     62.  5.100 / 62.  5.100
  libavformat    62.  1.101 / 62.  1.101
  libavdevice    62.  0.100 / 62.  0.100
  libavfilter    11.  1.100 / 11.  1.100
  libswscale      9.  0.100 /  9.  0.100
  libswresample   6.  0.100 /  6.  0.100
Input #0, nut, from 'fd:':
  Metadata:
    encoder         : Lavf62.1.101
  Duration: N/A, bitrate: N/A
  Stream #0:0: Video: hevc (Main 10) (HEVC / 0x43564548), yuv420p10le(tv), 1920x1080 [SAR 1:1 DAR 16:9], 25 tbr, 51200 tbn
    Metadata:
      encoder         : Lavc62.5.100 hevc_nvenc
Stream mapping:
  Stream #0:0 -> #0:0 (hevc (hevc_cuvid) -> wrapped_avframe (native))
Output #0, null, to 'pipe:':
  Metadata:
    encoder         : Lavf62.1.101
  Stream #0:0: Video: wrapped_avframe, p016le(tv, progressive), 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 25 fps, 25 tbn
    Metadata:
      encoder         : Lavc62.5.100 wrapped_avframe
[out#0/null @ 000001474eee9ec0] video:0KiB audio:0KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: unknown
frame=    1 fps=0.0 q=-0.0 Lsize=N/A time=00:00:00.16 bitrate=N/A speed=7.97x elapsed=0:00:00.02

As you can see, the CUVID decoder is outputting P016 format.

Stream #0:0: Video: wrapped_avframe, p016le(tv, progressive), 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 25 fps, 25 tbn