Opened 6 weeks ago

Last modified 3 weeks ago

#11245 new defect

Slow HEIC decoding with "hevc_cuvid"

Reported by: dkode
Owned by:
Priority: important
Component: avformat
Version: 7.1
Keywords: heif hevc_cuvid
Cc: dkode, MasterQuestionable
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description (last modified by dkode)

Summary of the bug:
How to reproduce:

Version Info: FFMPEG build with hevc_cuvid and no software decoder for hevc

ffmpeg -i
ffmpeg version 7.1 Copyright (c) 2000-2024 the FFmpeg developers
  built with gcc 11 (Ubuntu 11.4.0-1ubuntu1~22.04)
  configuration: --disable-decoders --disable-encoders --disable-decoder=vp9 --disable-decoder=hevc --enable-decoder=av1 --enable-decoder=cfhd --enable-decoder=dnxhd --enable-decoder=dvvideo --enable-decoder=h264 --enable-decoder=hevc_cuvid --enable-decoder=mjpeg --enable-decoder=jpeg2000 --enable-decoder=mpeg2video --enable-decoder=mpeg4 --enable-decoder=vp6 --enable-decoder=vp7 --enable-decoder=vp8 --enable-decoder=h263 --enable-decoder=dpx --enable-decoder=mjpeg --enable-decoder=mpeg1video --enable-decoder=msrle --enable-decoder=qtrle --enable-decoder=wmv1 --enable-decoder=wmv2 --enable-decoder=wmv3 --enable-decoder=msmpeg4v1 --enable-decoder=msmpeg4v2 --enable-decoder=msmpeg4v3 --enable-decoder=wmav1 --enable-decoder=wmav2 --enable-decoder=wmapro --enable-decoder=mp2 --enable-decoder=opus --enable-decoder=png --enable-decoder=mjpegb --enable-decoder=svq3 --enable-decoder=cinepak --enable-decoder=vp6f --enable-decoder=aic --enable-decoder=hqx --enable-decoder=hq_hqa --enable-decoder=flv --enable-decoder=vc1 --enable-decoder=libdav1d --enable-decoder=aac --enable-decoder=mp3 --enable-decoder=vorbis --enable-decoder=speex --enable-decoder=flac --enable-decoder=gsm --enable-decoder=mp1 --enable-decoder=alac --enable-decoder=pcm_alaw --enable-decoder=pcm_bluray --enable-decoder=pcm_dvd --enable-decoder=pcm_f16le --enable-decoder=pcm_f24le --enable-decoder=pcm_f32be --enable-decoder=pcm_f32le --enable-decoder=pcm_f64be --enable-decoder=pcm_f64le --enable-decoder=pcm_lxf --enable-decoder=pcm_mulaw --enable-decoder=pcm_s16be --enable-decoder=pcm_s16be_planar --enable-decoder=pcm_s16le --enable-decoder=pcm_s16le_planar --enable-decoder=pcm_s24be --enable-decoder=pcm_s24daud --enable-decoder=pcm_s24le --enable-decoder=pcm_s24le_planar --enable-decoder=pcm_s32be --enable-decoder=pcm_s32le --enable-decoder=pcm_s32le_planar --enable-decoder=pcm_s64be --enable-decoder=pcm_s64le --enable-decoder=pcm_s8 --enable-decoder=pcm_s8_planar --enable-decoder=pcm_sga 
--enable-decoder=pcm_u16le --enable-decoder=pcm_u24be --enable-decoder=pcm_u24le --enable-decoder=pcm_u32be --enable-decoder=pcm_u32le --enable-decoder=pcm_u8 --enable-decoder=pcm_vidc --enable-encoder=gif --enable-encoder=mjpeg --enable-encoder=png --enable-encoder=rawvideo --enable-encoder=h264_nvenc --enable-encoder=hevc_nvenc --enable-encoder=libx264 --enable-encoder=aac --enable-encoder=libmp3lame --enable-encoder=pcm_alaw --enable-encoder=pcm_bluray --enable-encoder=pcm_dvd --enable-encoder=pcm_f16le --enable-encoder=pcm_f24le --enable-encoder=pcm_f32be --enable-encoder=pcm_f32le --enable-encoder=pcm_f64be --enable-encoder=pcm_f64le --enable-encoder=pcm_lxf --enable-encoder=pcm_mulaw --enable-encoder=pcm_s16be --enable-encoder=pcm_s16be_planar --enable-encoder=pcm_s16le --enable-encoder=pcm_s16le_planar --enable-encoder=pcm_s24be --enable-encoder=pcm_s24daud --enable-encoder=pcm_s24le --enable-encoder=pcm_s24le_planar --enable-encoder=pcm_s32be --enable-encoder=pcm_s32le --enable-encoder=pcm_s32le_planar --enable-encoder=pcm_s64be --enable-encoder=pcm_s64le --enable-encoder=pcm_s8 --enable-encoder=pcm_s8_planar --enable-encoder=pcm_sga --enable-encoder=pcm_u16le --enable-encoder=pcm_u24be --enable-encoder=pcm_u24le --enable-encoder=pcm_u32be --enable-encoder=pcm_u32le --enable-encoder=pcm_u8 --enable-encoder=pcm_vidc --enable-cuda-nvcc --enable-libnpp --enable-nvenc --enable-cuvid --enable-zlib --enable-gpl --enable-libx264 --enable-libdav1d --enable-libmp3lame --enable-librubberband --enable-libzimg --prefix=/av-processor/dist/ffmpeg --enable-shared --disable-static --enable-pic --enable-nonfree --enable-openssl --extra-cflags='-DHAVE_THREADS=1 -DHAVE_THREADS=1 -I/usr/local/cuda/include -I/av-processor/dist/extra_libs/include -I/usr/local/include -I/av-processor/dist/extra_libs/include' --extra-ldflags=' -L/usr/local/cuda/lib64 -L/av-processor/dist/extra_libs/lib -L/usr/local/lib -L/av-processor/dist/extra_libs/lib'
  libavutil      59. 39.100 / 59. 39.100
  libavcodec     61. 19.100 / 61. 19.100
  libavformat    61.  7.100 / 61.  7.100
  libavdevice    61.  3.100 / 61.  3.100
  libavfilter    10.  4.100 / 10.  4.100
  libswscale      8.  3.100 /  8.  3.100
  libswresample   5.  3.100 /  5.  3.100
  libpostproc    58.  3.100 / 58.  3.100
ffmpeg  -i ~/ffmpeg_test/testfiles/LiveOff.HEIC -map 0:v:0 LiveOff_%d.png 

This is a grid-type HEIC image, and ffmpeg takes 40-60 seconds to process it.


Attachments (2)

ffmpeg_hevc.logs (598.3 KB ) - added by dkode 6 weeks ago.
trace logs
LiveOff.HEIC (980.4 KB ) - added by dkode 6 weeks ago.
image file


Change History (13)

by dkode, 6 weeks ago

Attachment: ffmpeg_hevc.logs added

trace logs

by dkode, 6 weeks ago

Attachment: LiveOff.HEIC added

image file

comment:1 by dkode, 6 weeks ago

Description: modified (diff)

comment:2 by Balling, 6 weeks ago

and no software decoder for hevc

What? How? Software decoder is needed to decode with hevc_cuvid anyway?

comment:3 by dkode, 6 weeks ago

@Balling

I don't think it is required (mandatory), and functionally the process works fine. I have a requirement that software decoding cannot be used (the software decoder is explicitly disabled with the --disable-decoder flag at build time).

Although functionally correct, it is very slow for tile-based HEIC images. I could see in the logs that CUDA is being opened and closed many times, which may be the reason. Please find the trace logs attached above.

[AVHWDeviceContext @ 0x555d4e3fb540] Loaded lib: libcuda.so.1
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuInit
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDriverGetVersion
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDeviceGetCount
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDeviceGet
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDeviceGetAttribute
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDeviceGetName
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDeviceComputeCapability
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuCtxCreate_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuCtxGetCurrent
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuCtxSetLimit
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuCtxPushCurrent_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuCtxPopCurrent_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuCtxDestroy_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuMemAlloc_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuMemAllocPitch_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuMemAllocManaged
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuMemsetD8Async
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuMemFree_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuMemcpy
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuMemcpyAsync
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuMemcpy2D_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuMemcpy2DAsync_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuMemcpyHtoD_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuMemcpyHtoDAsync_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuMemcpyDtoH_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuMemcpyDtoHAsync_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuMemcpyDtoD_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuMemcpyDtoDAsync_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuGetErrorName
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuGetErrorString
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuCtxGetDevice
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDevicePrimaryCtxRetain
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDevicePrimaryCtxRelease
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDevicePrimaryCtxSetFlags
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDevicePrimaryCtxGetState
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDevicePrimaryCtxReset
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuStreamCreate
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuStreamQuery
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuStreamSynchronize
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuStreamDestroy_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuStreamAddCallback
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuStreamWaitEvent
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuEventCreate
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuEventDestroy_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuEventSynchronize
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuEventQuery
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuEventRecord
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuLaunchKernel
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuLinkCreate
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuLinkAddData
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuLinkComplete
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuLinkDestroy
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuModuleLoadData
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuModuleUnload
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuModuleGetFunction
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuModuleGetGlobal
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuTexObjectCreate
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuTexObjectDestroy
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuGLGetDevices_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuGraphicsGLRegisterImage
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuGraphicsUnregisterResource
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuGraphicsMapResources
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuGraphicsUnmapResources
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuGraphicsSubResourceGetMappedArray
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuGraphicsResourceGetMappedPointer_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDeviceGetUuid
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDeviceGetUuid_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDeviceGetLuid
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDeviceGetByPCIBusId
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDeviceGetPCIBusId
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuImportExternalMemory
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDestroyExternalMemory
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuExternalMemoryGetMappedBuffer
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuExternalMemoryGetMappedMipmappedArray
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuMipmappedArrayGetLevel
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuMipmappedArrayDestroy
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuImportExternalSemaphore
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDestroyExternalSemaphore
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuSignalExternalSemaphoresAsync
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuWaitExternalSemaphoresAsync
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuArrayCreate_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuArray3DCreate_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuArrayDestroy
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuEGLStreamProducerConnect
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuEGLStreamProducerDisconnect
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuEGLStreamConsumerDisconnect
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuEGLStreamProducerPresentFrame
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuEGLStreamProducerReturnFrame
[AVHWDeviceContext @ 0x555d4e3fb540] Calling cu->cuInit(0)
[AVHWDeviceContext @ 0x555d4e3fb540] Calling cu->cuDeviceGet(&hwctx->internal->cuda_device, device_idx)
[AVHWDeviceContext @ 0x555d4e3fb540] Calling cu->cuCtxCreate(&hwctx->cuda_ctx, desired_flags, hwctx->internal->cuda_device)
[AVHWDeviceContext @ 0x555d4e3fb540] Calling cu->cuCtxPopCurrent(&dummy)
[hevc_cuvid @ 0x555d4e38a2c0] Calling ctx->cudl->cuCtxPushCurrent(cuda_ctx)
[hevc_cuvid @ 0x555d4e38a2c0] Calling ctx->cvdl->cuvidGetDecoderCaps(&ctx->caps8)
[hevc_cuvid @ 0x555d4e38a2c0] Calling ctx->cvdl->cuvidGetDecoderCaps(&ctx->caps10)
[hevc_cuvid @ 0x555d4e38a2c0] Calling ctx->cvdl->cuvidGetDecoderCaps(&ctx->caps12)
[hevc_cuvid @ 0x555d4e38a2c0] CUVID capabilities for hevc_cuvid:
[hevc_cuvid @ 0x555d4e38a2c0] 8 bit: supported: 1, min_width: 144, max_width: 8192, min_height: 144, max_height: 8192
[hevc_cuvid @ 0x555d4e38a2c0] 10 bit: supported: 1, min_width: 144, max_width: 8192, min_height: 144, max_height: 8192
[hevc_cuvid @ 0x555d4e38a2c0] 12 bit: supported: 1, min_width: 144, max_width: 8192, min_height: 144, max_height: 8192
[hevc_cuvid @ 0x555d4e38a2c0] Calling ctx->cvdl->cuvidCreateVideoParser(&ctx->cuparser, &ctx->cuparseinfo)
[hevc_cuvid @ 0x555d4e38a2c0] Calling ctx->cvdl->cuvidParseVideoData(ctx->cuparser, &seq_pkt)
[hevc_cuvid @ 0x555d4e38a2c0] Calling ctx->cudl->cuCtxPopCurrent(&dummy)
[hevc_cuvid @ 0x555d4e38b580] Format nv12 chosen by get_format().
[hevc_cuvid @ 0x555d4e38b580] Loaded lib: libnvcuvid.so.1
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidGetDecoderCaps
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidCreateDecoder
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidDestroyDecoder
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidDecodePicture
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidGetDecodeStatus
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidReconfigureDecoder
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidMapVideoFrame64
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidUnmapVideoFrame64
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidCtxLockCreate
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidCtxLockDestroy
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidCtxLock
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidCtxUnlock
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidCreateVideoSource
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidCreateVideoSourceW
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidDestroyVideoSource
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidSetVideoSourceState
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidGetVideoSourceState
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidGetSourceVideoFormat
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidGetSourceAudioFormat
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidCreateVideoParser
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidParseVideoData
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidDestroyVideoParser

This log appears 50 times for a simple command.
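
To check the repetition count against a saved trace, a small script can count how often the driver library load line shows up. This is only a sketch: the marker string is copied from the excerpt above, the function name `count_cuda_loads` is made up for illustration, and it assumes each "Loaded lib: libcuda.so.1" line corresponds to one new CUDA device context being set up.

```python
from typing import Iterable

# Marker copied from the trace excerpt; assumed (not verified here) to be
# logged once per CUDA device context creation.
CUDA_LOAD_MARKER = "Loaded lib: libcuda.so.1"


def count_cuda_loads(lines: Iterable[str], marker: str = CUDA_LOAD_MARKER) -> int:
    """Return how many log lines contain the marker string."""
    return sum(1 for line in lines if marker in line)
```

It can be fed an open file object, for example `count_cuda_loads(open("ffmpeg_hevc.logs", errors="replace"))`, since iterating a text file yields its lines.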


comment:4 by Balling, 6 weeks ago

The software decoder is used to preparse the files and to convert them to Annex B.

comment:5 by dkode, 6 weeks ago

@Balling

I am not sure about your comment, but the same build is very fast when transcoding HEVC videos.

Did you find anything unusual in the attached logs?

comment:6 by MasterQuestionable, 6 weeks ago

Cc: MasterQuestionable added
Component: undetermined → avformat
Keywords: hevc_cuvid added; Hevc cuvid removed
Summary: FFMPEG SLOW FOR HEIC TO PNG WITH HEVC_CUVID → Slow HEIC decoding with "hevc_cuvid"

comment:7 by dkode, 6 weeks ago

I set up debugging and found that avformat_find_stream_info (libavformat/demux.c) consumes most of the time.


comment:8 by James, 5 weeks ago

This is a grid-type HEIC image, and ffmpeg takes 40-60 seconds to process it.

I set up debugging and found that avformat_find_stream_info (libavformat/demux.c) consumes most of the time.

This grid-based image has 49 separate HEVC streams: 48 of them are the actual grid tiles and one is a thumbnail, and each of them will fire its own decoder. The slowness may come from hevc_cuvid being initialized that many times just for probing.
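
As a rough back-of-the-envelope check, and assuming the reported 40-60 seconds is spread evenly over the 49 streams (an assumption; the ticket gives no per-stream timings), the average cost per stream can be estimated like this. The helper name `per_stream_cost` is made up for illustration.

```python
def per_stream_cost(total_seconds: float, stream_count: int) -> float:
    """Average seconds per stream if the total is split evenly (an assumption)."""
    if stream_count <= 0:
        raise ValueError("stream_count must be positive")
    return total_seconds / stream_count


STREAMS = 49  # 48 grid tiles + 1 thumbnail, as stated in the comment above

low = per_stream_cost(40.0, STREAMS)
high = per_stream_cost(60.0, STREAMS)
print(f"{low:.2f} s to {high:.2f} s per stream")
```

Under that assumption, each stream would account for somewhere around one second, which would be consistent with a per-stream setup cost dominating the run rather than the decoding of a single small tile.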

comment:9 by Balling, 5 weeks ago

I do not even think we support proper CUVID init to do it. https://trac.ffmpeg.org/ticket/6989

It is supposed to use the h26x decoder to preparse the data.

comment:10 by dkode, 5 weeks ago

@James
Yes, this is in fact the case: a new decoder is created for each grid tile. But I am not sure whether it is possible to initialize it once and then reuse it for all the packets.
I think the same is done for videos.
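
The difference between the two approaches can be shown with a toy model. This is not FFmpeg or CUVID code: `ToyDecoder`, `decode_per_stream` and `decode_shared` are hypothetical names, and whether hevc_cuvid can actually be shared across the tile streams is not established in this ticket. The sketch only counts how many times an expensive object gets constructed.

```python
from typing import List, Sequence


class ToyDecoder:
    """Stand-in for a decoder that is expensive to create (not an FFmpeg API)."""

    init_count = 0  # class-wide counter of constructions

    def __init__(self) -> None:
        ToyDecoder.init_count += 1

    def decode(self, packet: bytes) -> int:
        # Placeholder "decode": just report the packet size.
        return len(packet)


def decode_per_stream(packets: Sequence[bytes]) -> List[int]:
    """Create a fresh decoder for every packet/stream."""
    return [ToyDecoder().decode(p) for p in packets]


def decode_shared(packets: Sequence[bytes]) -> List[int]:
    """Create one decoder and reuse it for all packets."""
    decoder = ToyDecoder()
    return [decoder.decode(p) for p in packets]
```

With 49 inputs, the first function constructs 49 decoders and the second constructs one, while both return the same results.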

comment:11 by Balling, 3 weeks ago

Apparently there is a mistake in the CUVID init: it goes against the NVIDIA decoder guidelines.

https://forums.developer.nvidia.com/t/expected-performance-gain/311816

"Another difference while using the FFMPEG approach is that ‘cuvidParseVideoData’ and ‘cuvidMapVideoFrame’ are executed in the same thread (opposite to what is recommended in the documentation, since ‘cuvidMapVideoFrame’ will block the execution)."
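
The threading layout the quote recommends, with parsing and frame mapping on separate threads, can be sketched generically with a queue. This is a toy illustration only: it does not call any CUVID function, and `run_pipeline`, `parse_worker` and `map_worker` are hypothetical names standing in for the parse and map stages.

```python
import queue
import threading
from typing import List, Sequence

_DONE = object()  # sentinel telling the consumer that no more items follow


def run_pipeline(packets: Sequence[str]) -> List[str]:
    """Run a parse stage and a map stage on two separate threads."""
    frames: "queue.Queue[object]" = queue.Queue()
    results: List[str] = []

    def parse_worker() -> None:
        # Stand-in for the parsing stage: produce one item per packet.
        for packet in packets:
            frames.put(packet.upper())
        frames.put(_DONE)

    def map_worker() -> None:
        # Stand-in for the (potentially blocking) map stage.
        while True:
            item = frames.get()
            if item is _DONE:
                break
            results.append(str(item))

    threads = [
        threading.Thread(target=parse_worker),
        threading.Thread(target=map_worker),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Because there is a single producer and a single consumer on a FIFO queue, results come out in input order; a blocking map stage then stalls only its own thread while parsing continues to fill the queue.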
