Opened 7 weeks ago
Last modified 4 weeks ago
#11245 new defect
Slow HEIC decoding with "hevc_cuvid"
Reported by: | dkode | Owned by: | |
---|---|---|---|
Priority: | important | Component: | avformat |
Version: | 7.1 | Keywords: | heif hevc_cuvid |
Cc: | dkode, MasterQuestionable | Blocked By: | |
Blocking: | | Reproduced by developer: | no |
Analyzed by developer: | no | | |
Description (last modified by )
Summary of the bug:
How to reproduce:
Version Info: FFmpeg build with hevc_cuvid and no software decoder for hevc
ffmpeg -i
ffmpeg version 7.1 Copyright (c) 2000-2024 the FFmpeg developers
built with gcc 11 (Ubuntu 11.4.0-1ubuntu1~22.04)
configuration: --disable-decoders --disable-encoders --disable-decoder=vp9 --disable-decoder=hevc --enable-decoder=av1 --enable-decoder=cfhd --enable-decoder=dnxhd --enable-decoder=dvvideo --enable-decoder=h264 --enable-decoder=hevc_cuvid --enable-decoder=mjpeg --enable-decoder=jpeg2000 --enable-decoder=mpeg2video --enable-decoder=mpeg4 --enable-decoder=vp6 --enable-decoder=vp7 --enable-decoder=vp8 --enable-decoder=h263 --enable-decoder=dpx --enable-decoder=mjpeg --enable-decoder=mpeg1video --enable-decoder=msrle --enable-decoder=qtrle --enable-decoder=wmv1 --enable-decoder=wmv2 --enable-decoder=wmv3 --enable-decoder=msmpeg4v1 --enable-decoder=msmpeg4v2 --enable-decoder=msmpeg4v3 --enable-decoder=wmav1 --enable-decoder=wmav2 --enable-decoder=wmapro --enable-decoder=mp2 --enable-decoder=opus --enable-decoder=png --enable-decoder=mjpegb --enable-decoder=svq3 --enable-decoder=cinepak --enable-decoder=vp6f --enable-decoder=aic --enable-decoder=hqx --enable-decoder=hq_hqa --enable-decoder=flv --enable-decoder=vc1 --enable-decoder=libdav1d --enable-decoder=aac --enable-decoder=mp3 --enable-decoder=vorbis --enable-decoder=speex --enable-decoder=flac --enable-decoder=gsm --enable-decoder=mp1 --enable-decoder=alac --enable-decoder=pcm_alaw --enable-decoder=pcm_bluray --enable-decoder=pcm_dvd --enable-decoder=pcm_f16le --enable-decoder=pcm_f24le --enable-decoder=pcm_f32be --enable-decoder=pcm_f32le --enable-decoder=pcm_f64be --enable-decoder=pcm_f64le --enable-decoder=pcm_lxf --enable-decoder=pcm_mulaw --enable-decoder=pcm_s16be --enable-decoder=pcm_s16be_planar --enable-decoder=pcm_s16le --enable-decoder=pcm_s16le_planar --enable-decoder=pcm_s24be --enable-decoder=pcm_s24daud --enable-decoder=pcm_s24le --enable-decoder=pcm_s24le_planar --enable-decoder=pcm_s32be --enable-decoder=pcm_s32le --enable-decoder=pcm_s32le_planar --enable-decoder=pcm_s64be --enable-decoder=pcm_s64le --enable-decoder=pcm_s8 --enable-decoder=pcm_s8_planar --enable-decoder=pcm_sga --enable-decoder=pcm_u16le --enable-decoder=pcm_u24be --enable-decoder=pcm_u24le --enable-decoder=pcm_u32be --enable-decoder=pcm_u32le --enable-decoder=pcm_u8 --enable-decoder=pcm_vidc --enable-encoder=gif --enable-encoder=mjpeg --enable-encoder=png --enable-encoder=rawvideo --enable-encoder=h264_nvenc --enable-encoder=hevc_nvenc --enable-encoder=libx264 --enable-encoder=aac --enable-encoder=libmp3lame --enable-encoder=pcm_alaw --enable-encoder=pcm_bluray --enable-encoder=pcm_dvd --enable-encoder=pcm_f16le --enable-encoder=pcm_f24le --enable-encoder=pcm_f32be --enable-encoder=pcm_f32le --enable-encoder=pcm_f64be --enable-encoder=pcm_f64le --enable-encoder=pcm_lxf --enable-encoder=pcm_mulaw --enable-encoder=pcm_s16be --enable-encoder=pcm_s16be_planar --enable-encoder=pcm_s16le --enable-encoder=pcm_s16le_planar --enable-encoder=pcm_s24be --enable-encoder=pcm_s24daud --enable-encoder=pcm_s24le --enable-encoder=pcm_s24le_planar --enable-encoder=pcm_s32be --enable-encoder=pcm_s32le --enable-encoder=pcm_s32le_planar --enable-encoder=pcm_s64be --enable-encoder=pcm_s64le --enable-encoder=pcm_s8 --enable-encoder=pcm_s8_planar --enable-encoder=pcm_sga --enable-encoder=pcm_u16le --enable-encoder=pcm_u24be --enable-encoder=pcm_u24le --enable-encoder=pcm_u32be --enable-encoder=pcm_u32le --enable-encoder=pcm_u8 --enable-encoder=pcm_vidc --enable-cuda-nvcc --enable-libnpp --enable-nvenc --enable-cuvid --enable-zlib --enable-gpl --enable-libx264 --enable-libdav1d --enable-libmp3lame --enable-librubberband --enable-libzimg --prefix=/av-processor/dist/ffmpeg --enable-shared --disable-static --enable-pic --enable-nonfree --enable-openssl --extra-cflags='-DHAVE_THREADS=1 -DHAVE_THREADS=1 -I/usr/local/cuda/include -I/av-processor/dist/extra_libs/include -I/usr/local/include -I/av-processor/dist/extra_libs/include' --extra-ldflags=' -L/usr/local/cuda/lib64 -L/av-processor/dist/extra_libs/lib -L/usr/local/lib -L/av-processor/dist/extra_libs/lib'
libavutil 59. 39.100 / 59. 39.100
libavcodec 61. 19.100 / 61. 19.100
libavformat 61. 7.100 / 61. 7.100
libavdevice 61. 3.100 / 61. 3.100
libavfilter 10. 4.100 / 10. 4.100
libswscale 8. 3.100 / 8. 3.100
libswresample 5. 3.100 / 5. 3.100
libpostproc 58. 3.100 / 58. 3.100
ffmpeg -i ~/ffmpeg_test/testfiles/LiveOff.HEIC -map 0:v:0 LiveOff_%d.png
This is a grid-type HEIC image, and ffmpeg is taking 40-60 seconds to process it.
Attachments (2)
Change History (13)
by , 7 weeks ago
Attachment: | ffmpeg_hevc.logs added |
---|
comment:1 by , 7 weeks ago
Description: | modified |
---|
comment:2 by , 7 weeks ago
> and no software decoder for hevc
What? How? Isn't the software decoder needed to decode with hevc_cuvid anyway?
comment:3 by , 7 weeks ago
@Balling
I don't think it is required (mandatory), and functionally the process works fine. I have a requirement that software decoding cannot be used (the software decoder is explicitly disabled with a --disable-decoder flag at compile time).
Although it is functionally correct, it is very slow for tile-based HEIC images. I can see in the logs that CUDA is opened and closed many times; this may be the reason. Please find the trace logs attached above.
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded lib: libcuda.so.1
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuInit
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDriverGetVersion
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDeviceGetCount
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDeviceGet
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDeviceGetAttribute
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDeviceGetName
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDeviceComputeCapability
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuCtxCreate_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuCtxGetCurrent
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuCtxSetLimit
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuCtxPushCurrent_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuCtxPopCurrent_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuCtxDestroy_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuMemAlloc_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuMemAllocPitch_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuMemAllocManaged
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuMemsetD8Async
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuMemFree_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuMemcpy
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuMemcpyAsync
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuMemcpy2D_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuMemcpy2DAsync_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuMemcpyHtoD_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuMemcpyHtoDAsync_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuMemcpyDtoH_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuMemcpyDtoHAsync_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuMemcpyDtoD_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuMemcpyDtoDAsync_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuGetErrorName
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuGetErrorString
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuCtxGetDevice
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDevicePrimaryCtxRetain
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDevicePrimaryCtxRelease
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDevicePrimaryCtxSetFlags
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDevicePrimaryCtxGetState
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDevicePrimaryCtxReset
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuStreamCreate
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuStreamQuery
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuStreamSynchronize
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuStreamDestroy_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuStreamAddCallback
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuStreamWaitEvent
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuEventCreate
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuEventDestroy_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuEventSynchronize
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuEventQuery
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuEventRecord
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuLaunchKernel
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuLinkCreate
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuLinkAddData
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuLinkComplete
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuLinkDestroy
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuModuleLoadData
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuModuleUnload
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuModuleGetFunction
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuModuleGetGlobal
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuTexObjectCreate
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuTexObjectDestroy
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuGLGetDevices_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuGraphicsGLRegisterImage
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuGraphicsUnregisterResource
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuGraphicsMapResources
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuGraphicsUnmapResources
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuGraphicsSubResourceGetMappedArray
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuGraphicsResourceGetMappedPointer_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDeviceGetUuid
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDeviceGetUuid_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDeviceGetLuid
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDeviceGetByPCIBusId
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDeviceGetPCIBusId
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuImportExternalMemory
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDestroyExternalMemory
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuExternalMemoryGetMappedBuffer
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuExternalMemoryGetMappedMipmappedArray
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuMipmappedArrayGetLevel
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuMipmappedArrayDestroy
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuImportExternalSemaphore
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuDestroyExternalSemaphore
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuSignalExternalSemaphoresAsync
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuWaitExternalSemaphoresAsync
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuArrayCreate_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuArray3DCreate_v2
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuArrayDestroy
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuEGLStreamProducerConnect
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuEGLStreamProducerDisconnect
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuEGLStreamConsumerDisconnect
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuEGLStreamProducerPresentFrame
[AVHWDeviceContext @ 0x555d4e3fb540] Loaded sym: cuEGLStreamProducerReturnFrame
[AVHWDeviceContext @ 0x555d4e3fb540] Calling cu->cuInit(0)
[AVHWDeviceContext @ 0x555d4e3fb540] Calling cu->cuDeviceGet(&hwctx->internal->cuda_device, device_idx)
[AVHWDeviceContext @ 0x555d4e3fb540] Calling cu->cuCtxCreate(&hwctx->cuda_ctx, desired_flags, hwctx->internal->cuda_device)
[AVHWDeviceContext @ 0x555d4e3fb540] Calling cu->cuCtxPopCurrent(&dummy)
[hevc_cuvid @ 0x555d4e38a2c0] Calling ctx->cudl->cuCtxPushCurrent(cuda_ctx)
[hevc_cuvid @ 0x555d4e38a2c0] Calling ctx->cvdl->cuvidGetDecoderCaps(&ctx->caps8)
[hevc_cuvid @ 0x555d4e38a2c0] Calling ctx->cvdl->cuvidGetDecoderCaps(&ctx->caps10)
[hevc_cuvid @ 0x555d4e38a2c0] Calling ctx->cvdl->cuvidGetDecoderCaps(&ctx->caps12)
[hevc_cuvid @ 0x555d4e38a2c0] CUVID capabilities for hevc_cuvid:
[hevc_cuvid @ 0x555d4e38a2c0] 8 bit: supported: 1, min_width: 144, max_width: 8192, min_height: 144, max_height: 8192
[hevc_cuvid @ 0x555d4e38a2c0] 10 bit: supported: 1, min_width: 144, max_width: 8192, min_height: 144, max_height: 8192
[hevc_cuvid @ 0x555d4e38a2c0] 12 bit: supported: 1, min_width: 144, max_width: 8192, min_height: 144, max_height: 8192
[hevc_cuvid @ 0x555d4e38a2c0] Calling ctx->cvdl->cuvidCreateVideoParser(&ctx->cuparser, &ctx->cuparseinfo)
[hevc_cuvid @ 0x555d4e38a2c0] Calling ctx->cvdl->cuvidParseVideoData(ctx->cuparser, &seq_pkt)
[hevc_cuvid @ 0x555d4e38a2c0] Calling ctx->cudl->cuCtxPopCurrent(&dummy)
[hevc_cuvid @ 0x555d4e38b580] Format nv12 chosen by get_format().
[hevc_cuvid @ 0x555d4e38b580] Loaded lib: libnvcuvid.so.1
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidGetDecoderCaps
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidCreateDecoder
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidDestroyDecoder
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidDecodePicture
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidGetDecodeStatus
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidReconfigureDecoder
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidMapVideoFrame64
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidUnmapVideoFrame64
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidCtxLockCreate
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidCtxLockDestroy
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidCtxLock
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidCtxUnlock
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidCreateVideoSource
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidCreateVideoSourceW
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidDestroyVideoSource
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidSetVideoSourceState
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidGetVideoSourceState
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidGetSourceVideoFormat
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidGetSourceAudioFormat
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidCreateVideoParser
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidParseVideoData
[hevc_cuvid @ 0x555d4e38b580] Loaded sym: cuvidDestroyVideoParser
This log appears 50 times for a simple command.
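To check how often the decoder is set up in a trace log like the attached one, it is enough to count a marker line. The marker used here, `Loaded lib: libnvcuvid.so.1`, is taken from the excerpt above. This is a minimal C sketch that assumes the log has already been read into a string. `count_occurrences` is a hypothetical helper, not part of FFmpeg.

```c
#include <assert.h>
#include <string.h>

/* Count non-overlapping occurrences of `needle` in `text`.
 * Hypothetical helper for inspecting a trace log held in memory. */
int count_occurrences(const char *text, const char *needle) {
    assert(text != NULL && needle != NULL && needle[0] != '\0');
    size_t step = strlen(needle);
    int count = 0;
    /* strstr finds the next match; skip past it so matches never overlap */
    for (const char *p = strstr(text, needle); p != NULL; p = strstr(p + step, needle))
        count++;
    return count;
}
```

Counting `cuvidCreateVideoParser` instead of the library load would measure parser set-ups rather than library loads. For this ticket, both counts should track the number of decoder instances.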
comment:4 by , 7 weeks ago
The software decoder is used to preparse the files and to convert them to Annex B.
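The Annex B conversion itself is mechanical. In an MP4/HEIF sample, each NAL unit is preceded by a big-endian length field, while Annex B separates NAL units with start codes (`00 00 00 01`). Below is a minimal C sketch of the idea under two assumptions:

- Length fields are 4 bytes wide. In a real file the width comes from the `lengthSizeMinusOne` field of the hvcC configuration.
- The VPS/SPS/PPS insertion that a complete converter also performs is left out.

`to_annexb` is a hypothetical name, not an FFmpeg function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Rewrite length-prefixed NAL units (4-byte big-endian sizes assumed)
 * into Annex B form with 4-byte start codes. Returns the number of
 * bytes written, or -1 on malformed input or a too-small output buffer. */
long to_annexb(const uint8_t *in, size_t in_len, uint8_t *out, size_t out_cap) {
    static const uint8_t start_code[4] = {0, 0, 0, 1};
    size_t pos = 0, written = 0;
    assert(in != NULL && out != NULL);
    while (pos < in_len) {
        if (in_len - pos < 4)
            return -1;                          /* truncated length field */
        uint32_t nal_len = ((uint32_t)in[pos] << 24) | ((uint32_t)in[pos + 1] << 16) |
                           ((uint32_t)in[pos + 2] << 8) | (uint32_t)in[pos + 3];
        pos += 4;
        if (nal_len > in_len - pos)
            return -1;                          /* NAL unit runs past the input */
        if (out_cap - written < 4 + (size_t)nal_len)
            return -1;                          /* output buffer too small */
        memcpy(out + written, start_code, 4);   /* start code replaces length */
        memcpy(out + written + 4, in + pos, nal_len);
        written += 4 + nal_len;
        pos += nal_len;
    }
    return (long)written;
}
```

With 4-byte length fields and 4-byte start codes, the output is exactly as long as the input. Shorter length fields would make the output grow.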
comment:5 by , 7 weeks ago
@Balling
I am not sure about your comment, but the same build is very fast when transcoding HEVC videos.
Did you find anything unusual in the attached logs?
comment:6 by , 7 weeks ago
Cc: | added |
---|---|
Component: | undetermined → avformat |
Keywords: | hevc_cuvid added; Hevc cuvid removed |
Summary: | FFMPEG SLOW FOR HEIC TO PNG WITH HEVC_CUVID → Slow HEIC decoding with "hevc_cuvid" |
comment:7 by , 7 weeks ago
I set up debugging and found that libavformat/demux.c:avformat_find_stream_info is consuming most of the time.
comment:8 by , 6 weeks ago
> This is grid type of heic image and ffmpeg is taking 40-60 seconds
> I tried setup debugging and found libavformat/demux.c:avformat_find_stream_info is consuming most of the time.
This grid-based image has 49 separate HEVC streams: 48 form the actual grid and one is a thumbnail. Each of them opens its own decoder. The slowness may come from hevc_cuvid being initialized that many times just for probing.
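The following back-of-the-envelope bound assumes that the entire reported runtime goes into per-stream decoder setup during probing. This is an assumption, since decoding the tiles also takes time. Under it, spreading the 40-60 seconds from the description over 49 streams gives an upper bound on the cost per stream:

```latex
\frac{40\ \text{s}}{49} \approx 0.82\ \text{s}
\quad\text{to}\quad
\frac{60\ \text{s}}{49} \approx 1.22\ \text{s per stream}
```

If opening hevc_cuvid once on the same machine takes a comparable fraction of a second, repeated initialization alone would account for the observed runtime.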
comment:9 by , 6 weeks ago
I do not even think we support proper CUVID initialization for this. https://trac.ffmpeg.org/ticket/6989
It is supposed to use the h26x decoder to preparse the data.
comment:10 by , 6 weeks ago
@James
Yes, this is in fact the case: a new decoder is created for each grid tile. But I am not sure whether it is possible to initialize it once and then reuse it for all the packets.
I think the same is done for videos.
comment:11 by , 4 weeks ago
Apparently the CUVID initialization in FFmpeg does not follow NVIDIA's decoder guidelines:
https://forums.developer.nvidia.com/t/expected-performance-gain/311816
"Another difference while using the FFMPEG approach is that ‘cuvidParseVideoData’ and ‘cuvidMapVideoFrame’ are executed in the same thread (opposite to what is recommended in the documentation, since ‘cuvidMapVideoFrame’ will block the execution)."
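The recommendation quoted above, calling `cuvidMapVideoFrame` from a different thread than `cuvidParseVideoData`, amounts to a producer/consumer split:

- One thread feeds the parser and pushes the indices of pictures that are ready for output into a queue.
- A second thread pops those indices and performs the blocking map and copy.

The following is a minimal sketch with POSIX threads. The real CUVID calls are replaced by comments, and `PicQueue`, `parser_thread` and `run_pipeline` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>

#define QUEUE_CAP 4 /* small on purpose: the parser blocks when the mapper lags */

/* Bounded FIFO of decoded-picture indices shared by the two threads. */
typedef struct {
    int items[QUEUE_CAP];
    int head, tail, size;
    int done; /* set by the producer after its last push */
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} PicQueue;

static void queue_push(PicQueue *q, int pic) {
    pthread_mutex_lock(&q->lock);
    while (q->size == QUEUE_CAP)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[q->tail] = pic;
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->size++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Returns 0 once the producer is done and the queue is drained. */
static int queue_pop(PicQueue *q, int *pic) {
    pthread_mutex_lock(&q->lock);
    while (q->size == 0 && !q->done)
        pthread_cond_wait(&q->not_empty, &q->lock);
    if (q->size == 0) { /* done and empty */
        pthread_mutex_unlock(&q->lock);
        return 0;
    }
    *pic = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->size--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return 1;
}

static void queue_finish(PicQueue *q) {
    pthread_mutex_lock(&q->lock);
    q->done = 1;
    pthread_cond_broadcast(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

typedef struct {
    PicQueue *q;
    int n_pictures;
} ParserArgs;

/* Producer: stands in for the thread that feeds packets to the parser
 * and learns which picture index is ready for output. */
static void *parser_thread(void *arg) {
    ParserArgs *a = arg;
    for (int i = 0; i < a->n_pictures; i++)
        queue_push(a->q, i); /* real code: parse data, get a ready index */
    queue_finish(a->q);
    return NULL;
}

/* The calling thread acts as the mapper and records the order in which
 * pictures were "mapped". Returns how many pictures it handled. */
int run_pipeline(int n_pictures, int *mapped_order) {
    assert(n_pictures >= 0 && (n_pictures == 0 || mapped_order != NULL));
    PicQueue q = {.size = 0};
    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.not_empty, NULL);
    pthread_cond_init(&q.not_full, NULL);

    ParserArgs args = {&q, n_pictures};
    pthread_t parser;
    int rc = pthread_create(&parser, NULL, parser_thread, &args);
    assert(rc == 0);
    (void)rc;

    int pic, count = 0;
    while (queue_pop(&q, &pic))
        mapped_order[count++] = pic; /* real code: map, copy, unmap */

    pthread_join(parser, NULL);
    pthread_cond_destroy(&q.not_full);
    pthread_cond_destroy(&q.not_empty);
    pthread_mutex_destroy(&q.lock);
    return count;
}
```

The bounded queue keeps the parser from running arbitrarily far ahead of the mapper. A real implementation would presumably size it to the number of decode surfaces it allocated.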
trace logs