Opened 2 months ago
Last modified 2 months ago
#11273 new defect
"tpad" speed regression (~ 40x) caused by threading change?
Reported by: | Alex | Owned by: | |
---|---|---|---|
Priority: | normal | Component: | avfilter |
Version: | unspecified | Keywords: | tpad filter_complex |
Cc: | MasterQuestionable | Blocked By: | |
Blocking: | | Reproduced by developer: | no |
Analyzed by developer: | no | | |
Description
Summary of the bug:
There are three ways to loop a static image: -loop, -stream_loop, and the tpad filter, which can repeat the last frame indefinitely. In 2023 builds, tpad was the fastest option. With current builds, the same command runs about 40 times slower. It seems that adding tpad now triggers the entire processing chain for every repeated frame, whereas in the 2023 builds the frames were treated as duplicates without any processing.
How to reproduce:
ffmpeg -y -r 0.3 -i 3.png -i 1.wav -shortest -filter_complex "[0:v]^
scale=w=if(gt(iw*ih\,1920*1080)\,iw*min(1920/iw\,1920/ih)\,iw):h=-2,^
pad=ceil(iw/2)*2:ceil(ih/2)*2,^
tpad=stop_mode=clone:stop_duration=3600100[v]" ^
-filter_complex "[1:a]anull[a]" -map "[v]" -map "[a]" ^
-c:a pcm_s16le -c:v libx264 -pix_fmt yuv420p ^
-benchmark -report bench2024.mp4
ffmpeg -y -r 0.3 -i 3.png -i 1.wav -shortest -filter_complex "[0:v]scale=w=if(gt(iw*ih\,1920*1080)\,iw*min(1920/iw\,1920/ih)\,iw):h=-2,pad=ceil(iw/2)*2:ceil(ih/2)*2,tpad=stop_mode=clone:stop_duration=3600100[v]" -filter_complex "[1:a]anull[a]" -map "[v]" -map "[a]" -c:a pcm_s16le -c:v libx264 -pix_fmt yuv420p -benchmark -report bench2024.mp4
ffmpeg started on 2024-10-30 at 11:36:51
Report written to "ffmpeg-20241030-113651.log"
Log level: 48
ffmpeg version 2024-10-27-git-bb57b78013-essentials_build-www.gyan.dev Copyright (c) 2000-2024 the FFmpeg developers
  built with gcc 14.2.0 (Rev1, Built by MSYS2 project)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-zlib --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-sdl2 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-mediafoundation --enable-libass --enable-libfreetype --enable-libfribidi --enable-libharfbuzz --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-dxva2 --enable-d3d11va --enable-d3d12va --enable-ffnvcodec --enable-libvpl --enable-nvdec --enable-nvenc --enable-vaapi --enable-libgme --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libtheora --enable-libvo-amrwbenc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-librubberband
  libavutil 59. 46.100 / 59. 46.100
  libavcodec 61. 22.100 / 61. 22.100
  libavformat 61. 9.100 / 61. 9.100
  libavdevice 61. 4.100 / 61. 4.100
  libavfilter 10. 6.100 / 10. 6.100
  libswscale 8. 9.101 / 8. 9.101
  libswresample 5. 4.100 / 5. 4.100
  libpostproc 58. 4.100 / 58. 4.100
Input #0, png_pipe, from '3.png':
  Duration: N/A, bitrate: N/A
  Stream #0:0: Video: png, rgb24(pc, gbr/unknown/unknown), 1920x1440 [SAR 2834:2834 DAR 4:3], 25 fps, 25 tbr, 25 tbn
[aist#1:0/pcm_s16le @ 00000000004e8e40] Guessed Channel Layout: stereo
Input #1, wav, from '1.wav':
  Duration: 00:00:21.69, bitrate: 1413 kb/s
  Stream #1:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s
Stream mapping:
  Stream #0:0 (png) -> scale:default (graph 0)
  Stream #1:0 (pcm_s16le) -> anull:default (graph 1)
  tpad:default (graph 0) -> Stream #0:0 (libx264)
  anull:default (graph 1) -> Stream #0:1 (pcm_s16le)
Press [q] to stop, [?] for help
[libx264 @ 0000000000530500] using SAR=1/1
[libx264 @ 0000000000530500] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
[libx264 @ 0000000000530500] profile High, level 5.0, 4:2:0, 8-bit
[libx264 @ 0000000000530500] 264 - core 164 r3198 da14df5 - H.264/MPEG-4 AVC codec - Copyleft 2003-2024 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=12 lookahead_threads=2 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=1 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'bench2024.mp4':
  Metadata:
    encoder : Lavf61.9.100
  Stream #0:0: Video: h264 (avc1 / 0x31637661), yuv420p(tv, progressive), 1920x1440 [SAR 1:1 DAR 4:3], q=2-31, 0.30 fps, 12288 tbn
    Metadata:
      encoder : Lavc61.22.100 libx264
    Side data:
      cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A
  Stream #0:1: Audio: pcm_s16le (ipcm / 0x6D637069), 44100 Hz, stereo, s16, 1411 kb/s
    Metadata:
      encoder : Lavc61.22.100 pcm_s16le
[out#0/mp4 @ 0000000000579440] video:905KiB audio:3736KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 0.075687%
frame= 151 fps= 50 q=-1.0 Lsize= 4645KiB time=00:00:21.68 bitrate=1754.5kbits/s speed=7.24x
bench: utime=18.408s stime=0.234s rtime=2.999s
bench: maxrss=969164KiB
[libx264 @ 0000000000530500] frame I:1 Avg QP: 4.59 size:832750
[libx264 @ 0000000000530500] frame P:38 Avg QP: 5.62 size: 2129
[libx264 @ 0000000000530500] frame B:112 Avg QP:11.34 size: 114
[libx264 @ 0000000000530500] consecutive B-frames: 0.7% 1.3% 0.0% 98.0%
[libx264 @ 0000000000530500] mb I I16..4: 11.3% 45.9% 42.8%
[libx264 @ 0000000000530500] mb P I16..4: 0.0% 0.0% 0.0% P16..4: 5.6% 0.0% 0.1% 0.0% 0.0% skip:94.3%
[libx264 @ 0000000000530500] mb B I16..4: 0.0% 0.0% 0.0% B16..8: 0.4% 0.0% 0.0% direct: 0.0% skip:99.6% L0: 1.0% L1:99.0% BI: 0.0%
[libx264 @ 0000000000530500] 8x8 transform intra:45.9% inter:85.7%
[libx264 @ 0000000000530500] coded y,uvDC,uvAC intra: 100.0% 99.7% 99.6% inter: 0.5% 0.1% 0.1%
[libx264 @ 0000000000530500] i16 v,h,dc,p: 14% 22% 32% 32%
[libx264 @ 0000000000530500] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 22% 15% 10% 6% 8% 9% 7% 14% 10%
[libx264 @ 0000000000530500] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 26% 20% 9% 6% 9% 8% 7% 8% 7%
[libx264 @ 0000000000530500] i8c dc,h,v,p: 39% 17% 22% 22%
[libx264 @ 0000000000530500] Weighted P-Frames: Y:0.0% UV:0.0%
[libx264 @ 0000000000530500] ref P L0: 99.4% 0.0% 0.4% 0.2%
[libx264 @ 0000000000530500] ref B L0: 78.4% 21.6%
[libx264 @ 0000000000530500] kb/s:14.72
ffmpeg2023 -y -r 0.3 -i 3.png -i 1.wav -shortest -filter_complex "[0:v]scale=w=if(gt(iw*ih\,1920*1080)\,iw*min(1920/iw\,1920/ih)\,iw):h=-2,pad=ceil(iw/2)*2:ceil(ih/2)*2,tpad=stop_mode=clone:stop_duration=3600100[v]" -filter_complex "[1:a]anull[a]" -map "[v]" -map "[a]" -c:a pcm_s16le -c:v libx264 -pix_fmt yuv420p -benchmark -report bench2023.mp4
ffmpeg started on 2024-10-30 at 11:52:58
Report written to "ffmpeg-20241030-115258.log"
Log level: 48
ffmpeg version 2023-04-17-git-65e537b833-essentials_build-www.gyan.dev Copyright (c) 2000-2023 the FFmpeg developers
  built with gcc 12.2.0 (Rev10, Built by MSYS2 project)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-zlib --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-sdl2 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-mediafoundation --enable-libass --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libvpl --enable-libgme --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libtheora --enable-libvo-amrwbenc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-librubberband
  libavutil 58. 6.100 / 58. 6.100
  libavcodec 60. 9.100 / 60. 9.100
  libavformat 60. 4.101 / 60. 4.101
  libavdevice 60. 2.100 / 60. 2.100
  libavfilter 9. 5.100 / 9. 5.100
  libswscale 7. 2.100 / 7. 2.100
  libswresample 4. 11.100 / 4. 11.100
  libpostproc 57. 2.100 / 57. 2.100
Input #0, png_pipe, from '3.png':
  Duration: N/A, bitrate: N/A
  Stream #0:0: Video: png, rgb24(pc, gbr/unknown/unknown), 1920x1440 [SAR 2834:2834 DAR 4:3], 25 fps, 25 tbr, 25 tbn
[aist#1:0/pcm_s16le @ 00000000005e8d00] Guessed Channel Layout: stereo
Input #1, wav, from '1.wav':
  Duration: 00:00:21.69, bitrate: 1413 kb/s
  Stream #1:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 2 channels, s16, 1411 kb/s
Stream mapping:
  Stream #0:0 (png) -> scale:default (graph 0)
  Stream #1:0 (pcm_s16le) -> anull:default (graph 1)
  tpad:default (graph 0) -> Stream #0:0 (libx264)
  anull:default (graph 1) -> Stream #0:1 (pcm_s16le)
Press [q] to stop, [?] for help
[libx264 @ 00000000005f42c0] using SAR=1/1
[libx264 @ 00000000005f42c0] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
[libx264 @ 00000000005f42c0] profile High, level 5.0, 4:2:0, 8-bit
[libx264 @ 00000000005f42c0] 264 - core 164 r3106 eaa68fa - H.264/MPEG-4 AVC codec - Copyleft 2003-2023 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=12 lookahead_threads=2 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=1 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'bench2023.mp4':
  Metadata:
    encoder : Lavf60.4.101
  Stream #0:0: Video: h264 (avc1 / 0x31637661), yuv420p(tv, progressive), 1920x1440 [SAR 1:1 DAR 4:3], q=2-31, 0.30 fps, 12288 tbn
    Metadata:
      encoder : Lavc60.9.100 libx264
    Side data:
      cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A
  Stream #0:1: Audio: pcm_s16le (ipcm / 0x6D637069), 44100 Hz, stereo, s16, 1411 kb/s
    Metadata:
      encoder : Lavc60.9.100 pcm_s16le
[out#0/mp4 @ 0000000000529700] video:454kB audio:3736kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.034613%
frame= 6 fps=0.0 q=-1.0 Lsize= 4191kB time=00:00:21.66 bitrate=1584.8kbits/s speed=65.5x
bench: utime=0.515s stime=0.109s rtime=0.334s
bench: maxrss=433352kB
[libx264 @ 00000000005f42c0] frame I:1 Avg QP: 9.83 size:462555
[libx264 @ 00000000005f42c0] frame P:2 Avg QP:14.57 size: 424
[libx264 @ 00000000005f42c0] frame B:3 Avg QP:12.68 size: 264
[libx264 @ 00000000005f42c0] consecutive B-frames: 33.3% 0.0% 0.0% 66.7%
[libx264 @ 00000000005f42c0] mb I I16..4: 4.6% 82.7% 12.7%
[libx264 @ 00000000005f42c0] mb P I16..4: 0.0% 0.0% 0.0% P16..4: 1.3% 0.1% 0.0% 0.0% 0.0% skip:98.5%
[libx264 @ 00000000005f42c0] mb B I16..4: 0.0% 0.0% 0.0% B16..8: 0.9% 0.0% 0.1% direct: 0.2% skip:98.8% L0:17.7% L1:82.3% BI: 0.0%
[libx264 @ 00000000005f42c0] 8x8 transform intra:82.7% inter:100.0%
[libx264 @ 00000000005f42c0] coded y,uvDC,uvAC intra: 99.8% 98.9% 96.6% inter: 0.0% 0.3% 0.0%
[libx264 @ 00000000005f42c0] i16 v,h,dc,p: 7% 18% 18% 57%
[libx264 @ 00000000005f42c0] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 20% 16% 13% 6% 7% 8% 7% 12% 9%
[libx264 @ 00000000005f42c0] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 24% 18% 10% 7% 11% 9% 7% 8% 6%
[libx264 @ 00000000005f42c0] i8c dc,h,v,p: 38% 18% 21% 23%
[libx264 @ 00000000005f42c0] Weighted P-Frames: Y:0.0% UV:0.0%
[libx264 @ 00000000005f42c0] ref P L0: 97.5% 0.0% 2.5%
[libx264 @ 00000000005f42c0] ref B L1: 99.7% 0.3%
[libx264 @ 00000000005f42c0] kb/s:185.68
ffmpeg 2024: 18.5 s and 100% CPU
ffmpeg 2023: 0.5 s and 5% CPU
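The slowdown can be recomputed directly from the two `bench:` lines in the reports. The following is a minimal Python sketch; the helper names `parse_bench` and `slowdown` are made up for illustration and are not part of ffmpeg.

```python
import re

# "bench:" lines copied verbatim from the 2024 and 2023 reports.
BENCH_2024 = "bench: utime=18.408s stime=0.234s rtime=2.999s"
BENCH_2023 = "bench: utime=0.515s stime=0.109s rtime=0.334s"


def parse_bench(line):
    """Return the utime/stime/rtime values of a bench line, in seconds."""
    pairs = re.findall(r"(utime|stime|rtime)=([\d.]+)s", line)
    return {name: float(value) for name, value in pairs}


def slowdown(new, old):
    """Ratio new/old for every timer present in `new`."""
    return {name: new[name] / old[name] for name in new}


ratios = slowdown(parse_bench(BENCH_2024), parse_bench(BENCH_2023))
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x slower")
```

On these numbers the user CPU time (utime) differs by roughly 36x and the wall-clock time (rtime) by roughly 9x. The reports also show frame= 151 for the 2024 build against frame= 6 for the 2023 build.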
Attachments (3)
Change History (9)
comment:2 by , 2 months ago
Versions tested:
N-100906-gad2cc0e2f4-2021-02-31 is okay
BtbN N-111869-g7aa71ab5c0-2023-08-31 is okay
BtbN N-112565-g55f28eb627-2023-10-31 is okay
BtbN N-112876-ga30adf9f96-2023-11-30 is okay
2023-12-11-git-1439784ff0-essentials_build-www.gyan.dev is okay
6.1.1-full_build-www.gyan.dev (marked as 2023-12-31) is also okay

2023-12-14-git-5256b2fbe6-essentials_build-www.gyan.dev buggy
2023-12-23-git-f5f414d9c4-essentials_build-www.gyan.dev buggy
BtbN N-113150-ge063c1d079-2023-12-31 buggy
BtbN N-114545-g199c479b9a-2024-03-31 buggy
7.0.2-essentials_build-www.gyan.dev buggy
2024-10-27-git-bb57b78013-essentials_build-www-gyan-dev buggy
So the offending commit landed in the last days of December 2023.
Well, it seems New Year's Eve commits are even more disastrous than Friday afternoon ones.
comment:3 by , 2 months ago
Cc: | added |
---|---|
Component: | undetermined → avfilter |
Keywords: | filtering removed |
Summary: | Processing is x40 times slower in latest builds (tpad filter) → "tpad" speed regression (40x?) |
Could you narrow down further which commit is the cause?
https://github.com/FFmpeg/FFmpeg/commits?since=2023-12-12&until=2023-12-12
comment:4 by , 2 months ago
The offending commit is likely d119ae2fd82a494d9430ff4d4fc262961a68c598 or one of the few just before it.
You can find my builds from around those dates, for narrowing it down, on this page: https://github.com/GyanD/codexffmpeg/releases?page=11
comment:5 by , 2 months ago
Thanks, Gyan. The regression is indeed between 2023-12-11-git-1439784ff0 (render speed = 200x) and 2023-12-14-git-5256b2fbe6 (render speed = 6x).
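The same window falls out mechanically from the version list in comment:2. Below is a small Python sketch that sorts the dated git-master snapshots and reports the adjacent okay/buggy pair. The function name `regression_window` is hypothetical, and leaving out the 6.1.1 and 7.0.2 release builds is an assumption (they are taken to come from release branches rather than master).

```python
# (snapshot date, short hash, verdict) for the dated git snapshots in comment:2.
# Assumption: the 6.1.1 and 7.0.2 release builds are excluded because they are
# presumed to be cut from release branches, not from master.
BUILDS = [
    ("2021-02-31", "ad2cc0e2f4", "okay"),   # date as written in the ticket
    ("2023-08-31", "7aa71ab5c0", "okay"),
    ("2023-10-31", "55f28eb627", "okay"),
    ("2023-11-30", "a30adf9f96", "okay"),
    ("2023-12-11", "1439784ff0", "okay"),
    ("2023-12-14", "5256b2fbe6", "buggy"),
    ("2023-12-23", "f5f414d9c4", "buggy"),
    ("2023-12-31", "e063c1d079", "buggy"),
    ("2024-03-31", "199c479b9a", "buggy"),
    ("2024-10-27", "bb57b78013", "buggy"),
]


def regression_window(builds):
    """Return ((date, hash) of last okay, (date, hash) of first buggy), or None."""
    ordered = sorted(builds)  # YYYY-MM-DD strings sort chronologically
    for (d0, h0, s0), (d1, h1, s1) in zip(ordered, ordered[1:]):
        if s0 == "okay" and s1 == "buggy":
            return (d0, h0), (d1, h1)
    return None


last_good, first_bad = regression_window(BUILDS)
print("last okay :", *last_good)
print("first bad :", *first_bad)
```

With a source checkout, the two hashes could then serve as the good and bad endpoints for `git bisect`.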
comment:6 by , 2 months ago
Summary: | "tpad" speed regression (40x?) → "tpad" speed regression (~ 40x) caused by threading change? |
---|
You might wonder why the audio chain sits in a second, separate -filter_complex. Without it, ffmpeg is unable to detect the shortest stream for -shortest and simply hangs:
ffmpeg -y -i 1.wav -r 2 -i 3.png -c:a libopus -b:a 64K -c:v libx264 -pix_fmt yuv420p -preset faster -x264-params "rc-lookahead=1:min-keyint=29:ref=0:subme=0:bframes=1:qp=20" -filter_complex "[1:v]pad=ceil(iw/2)*2:ceil(ih/2)*2,tpad=stop_mode=clone:stop_duration=3600100[v]; [0:a]anull[a]" -map "[v]" -map "[a]" -shortest -benchmark -report hang.mp4
Adding this second filter is a workaround suggested by mrskman in #3789 four years ago. Again, the 2023 build runs the command above without any issues.
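When checking the hang across several builds, it helps to run the command under a timeout so a stuck process does not block the whole test. A minimal Python sketch using only the standard library follows; `run_with_timeout` is a hypothetical helper, the 120-second limit is an arbitrary assumption, and `HANG_CMD` is simply the command above split into arguments.

```python
import subprocess
import sys

# The hanging command from this comment, split into an argument list.
HANG_CMD = [
    "ffmpeg", "-y", "-i", "1.wav", "-r", "2", "-i", "3.png",
    "-c:a", "libopus", "-b:a", "64K",
    "-c:v", "libx264", "-pix_fmt", "yuv420p", "-preset", "faster",
    "-x264-params", "rc-lookahead=1:min-keyint=29:ref=0:subme=0:bframes=1:qp=20",
    "-filter_complex",
    "[1:v]pad=ceil(iw/2)*2:ceil(ih/2)*2,"
    "tpad=stop_mode=clone:stop_duration=3600100[v]; [0:a]anull[a]",
    "-map", "[v]", "-map", "[a]", "-shortest", "-benchmark", "-report",
    "hang.mp4",
]


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (finished, returncode).

    finished is False (and returncode None) if the process was still
    running after timeout_s seconds; subprocess.run kills it in that case.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False, None
    return True, proc.returncode


# Stand-in demo that needs no ffmpeg: a Python child that sleeps too long.
print(run_with_timeout([sys.executable, "-c", "import time; time.sleep(30)"], 1))

# With ffmpeg and the two input files present (timeout value is an assumption):
# finished, code = run_with_timeout(HANG_CMD, 120)
```

A result of `finished == False` for the 21-second input would indicate the hang described above.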