Opened 2 months ago

Last modified 2 months ago

#11273 new defect

"tpad" speed regression (~ 40x) caused by threading change?

Reported by: Alex Owned by:
Priority: normal Component: avfilter
Version: unspecified Keywords: tpad filter_complex
Cc: MasterQuestionable Blocked By:
Blocking: Reproduced by developer: no
Analyzed by developer: no

Description

Summary of the bug:

There are three ways to loop a static image: -loop, -stream_loop, and the tpad filter, which can repeat the last frame forever. In 2023, tpad was the fastest option. However, now the same command works 40 times slower. It seems that adding tpad triggers the entire processing chain, whereas in the 2023 builds, frames were treated as duplicates without any processing.

How to reproduce:

ffmpeg -y -r 0.3 -i 3.png -i 1.wav -shortest -filter_complex "[0:v]^
scale=w=if(gt(iw*ih\,1920*1080)\,iw*min(1920/iw\,1920/ih)\,iw):h=-2,^
pad=ceil(iw/2)*2:ceil(ih/2)*2,^
tpad=stop_mode=clone:stop_duration=3600100[v]" ^
-filter_complex "[1:a]anull[a]" -map "[v]" -map "[a]" ^
-c:a pcm_s16le -c:v libx264 -pix_fmt yuv420p ^
-benchmark -report bench2024.mp4

ffmpeg -y -r 0.3 -i 3.png -i 1.wav -shortest -filter_complex "[0:v]scale=w=if(gt(iw*ih\,1920*1080)\,iw*min(1920/iw\,1920/ih)\,iw):h=-2,pad=ceil(iw/2)*2:ceil(ih/2)*2,tpad=stop_mode=clone:stop_duration=3600100[v]" -filter_complex "[1:a]anull[a]" -map "[v]" -map "[a]" -c:a pcm_s16le -c:v libx264 -pix_fmt yuv420p -benchmark -report bench2024.mp4

ffmpeg started on 2024-10-30 at 11:36:51
Report written to "ffmpeg-20241030-113651.log"
Log level: 48
ffmpeg version 2024-10-27-git-bb57b78013-essentials_build-www.gyan.dev Copyright (c) 2000-2024 the FFmpeg developers
  built with gcc 14.2.0 (Rev1, Built by MSYS2 project)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig
 --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-zlib --enable-libsrt --en
able-libssh --enable-libzmq --enable-avisynth --enable-sdl2 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxv
id --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-mediafoundation --enable-libass --enable-libfreetype --enabl
e-libfribidi --enable-libharfbuzz --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --ena
ble-cuvid --enable-dxva2 --enable-d3d11va --enable-d3d12va --enable-ffnvcodec --enable-libvpl --enable-nvdec --enable-nvenc -
-enable-vaapi --enable-libgme --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libtheora --enable-
libvo-amrwbenc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-libr
ubberband
  libavutil      59. 46.100 / 59. 46.100
  libavcodec     61. 22.100 / 61. 22.100
  libavformat    61.  9.100 / 61.  9.100
  libavdevice    61.  4.100 / 61.  4.100
  libavfilter    10.  6.100 / 10.  6.100
  libswscale      8.  9.101 /  8.  9.101
  libswresample   5.  4.100 /  5.  4.100
  libpostproc    58.  4.100 / 58.  4.100
Input #0, png_pipe, from '3.png':
  Duration: N/A, bitrate: N/A
  Stream #0:0: Video: png, rgb24(pc, gbr/unknown/unknown), 1920x1440 [SAR 2834:2834 DAR 4:3], 25 fps, 25 tbr, 25 tbn
[aist#1:0/pcm_s16le @ 00000000004e8e40] Guessed Channel Layout: stereo
Input #1, wav, from '1.wav':
  Duration: 00:00:21.69, bitrate: 1413 kb/s
  Stream #1:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s
Stream mapping:
  Stream #0:0 (png) -> scale:default (graph 0)
  Stream #1:0 (pcm_s16le) -> anull:default (graph 1)
  tpad:default (graph 0) -> Stream #0:0 (libx264)
  anull:default (graph 1) -> Stream #0:1 (pcm_s16le)
Press [q] to stop, [?] for help
[libx264 @ 0000000000530500] using SAR=1/1
[libx264 @ 0000000000530500] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
[libx264 @ 0000000000530500] profile High, level 5.0, 4:2:0, 8-bit
[libx264 @ 0000000000530500] 264 - core 164 r3198 da14df5 - H.264/MPEG-4 AVC codec - Copyleft 2003-2024 - http://www.videolan
.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_
range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=12 lookahead_threads=2
sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 di
rect=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=1 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf
=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'bench2024.mp4':
  Metadata:
    encoder         : Lavf61.9.100
  Stream #0:0: Video: h264 (avc1 / 0x31637661), yuv420p(tv, progressive), 1920x1440 [SAR 1:1 DAR 4:3], q=2-31, 0.30 fps, 1228
8 tbn
    Metadata:
      encoder         : Lavc61.22.100 libx264
    Side data:
      cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A
  Stream #0:1: Audio: pcm_s16le (ipcm / 0x6D637069), 44100 Hz, stereo, s16, 1411 kb/s
    Metadata:
      encoder         : Lavc61.22.100 pcm_s16le
[out#0/mp4 @ 0000000000579440] video:905KiB audio:3736KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhea
d: 0.075687%
frame=  151 fps= 50 q=-1.0 Lsize=    4645KiB time=00:00:21.68 bitrate=1754.5kbits/s speed=7.24x
bench: utime=18.408s stime=0.234s rtime=2.999s
bench: maxrss=969164KiB
[libx264 @ 0000000000530500] frame I:1     Avg QP: 4.59  size:832750
[libx264 @ 0000000000530500] frame P:38    Avg QP: 5.62  size:  2129
[libx264 @ 0000000000530500] frame B:112   Avg QP:11.34  size:   114
[libx264 @ 0000000000530500] consecutive B-frames:  0.7%  1.3%  0.0% 98.0%
[libx264 @ 0000000000530500] mb I  I16..4: 11.3% 45.9% 42.8%
[libx264 @ 0000000000530500] mb P  I16..4:  0.0%  0.0%  0.0%  P16..4:  5.6%  0.0%  0.1%  0.0%  0.0%    skip:94.3%
[libx264 @ 0000000000530500] mb B  I16..4:  0.0%  0.0%  0.0%  B16..8:  0.4%  0.0%  0.0%  direct: 0.0%  skip:99.6%  L0: 1.0% L
1:99.0% BI: 0.0%
[libx264 @ 0000000000530500] 8x8 transform intra:45.9% inter:85.7%
[libx264 @ 0000000000530500] coded y,uvDC,uvAC intra: 100.0% 99.7% 99.6% inter: 0.5% 0.1% 0.1%
[libx264 @ 0000000000530500] i16 v,h,dc,p: 14% 22% 32% 32%
[libx264 @ 0000000000530500] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 22% 15% 10%  6%  8%  9%  7% 14% 10%
[libx264 @ 0000000000530500] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 26% 20%  9%  6%  9%  8%  7%  8%  7%
[libx264 @ 0000000000530500] i8c dc,h,v,p: 39% 17% 22% 22%
[libx264 @ 0000000000530500] Weighted P-Frames: Y:0.0% UV:0.0%
[libx264 @ 0000000000530500] ref P L0: 99.4%  0.0%  0.4%  0.2%
[libx264 @ 0000000000530500] ref B L0: 78.4% 21.6%
[libx264 @ 0000000000530500] kb/s:14.72

ffmpeg2023 -y -r 0.3 -i 3.png -i 1.wav -shortest -filter_complex "[0:v]scale=w=if(gt(iw*ih\,1920*
1080)\,iw*min(1920/iw\,1920/ih)\,iw):h=-2,pad=ceil(iw/2)*2:ceil(ih/2)*2,tpad=stop_mode=clone:stop_duration=3600100[v]" -filte
r_complex "[1:a]anull[a]" -map "[v]" -map "[a]" -c:a pcm_s16le -c:v libx264 -pix_fmt yuv420p -benchmark -report bench2023.mp4

ffmpeg started on 2024-10-30 at 11:52:58
Report written to "ffmpeg-20241030-115258.log"
Log level: 48
ffmpeg version 2023-04-17-git-65e537b833-essentials_build-www.gyan.dev Copyright (c) 2000-2023 the FFmpeg developers
  built with gcc 12.2.0 (Rev10, Built by MSYS2 project)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig
 --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-zlib --enable-libsrt --en
able-libssh --enable-libzmq --enable-avisynth --enable-sdl2 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxv
id --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-mediafoundation --enable-libass --enable-libfreetype --enabl
e-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ff
nvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libvpl --enable-libgme --enable-libopenmpt --e
nable-libopencore-amrwb --enable-libmp3lame --enable-libtheora --enable-libvo-amrwbenc --enable-libgsm --enable-libopencore-a
mrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-librubberband
  libavutil      58.  6.100 / 58.  6.100
  libavcodec     60.  9.100 / 60.  9.100
  libavformat    60.  4.101 / 60.  4.101
  libavdevice    60.  2.100 / 60.  2.100
  libavfilter     9.  5.100 /  9.  5.100
  libswscale      7.  2.100 /  7.  2.100
  libswresample   4. 11.100 /  4. 11.100
  libpostproc    57.  2.100 / 57.  2.100
Input #0, png_pipe, from '3.png':
  Duration: N/A, bitrate: N/A
  Stream #0:0: Video: png, rgb24(pc, gbr/unknown/unknown), 1920x1440 [SAR 2834:2834 DAR 4:3], 25 fps, 25 tbr, 25 tbn
[aist#1:0/pcm_s16le @ 00000000005e8d00] Guessed Channel Layout: stereo
Input #1, wav, from '1.wav':
  Duration: 00:00:21.69, bitrate: 1413 kb/s
  Stream #1:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 2 channels, s16, 1411 kb/s
Stream mapping:
  Stream #0:0 (png) -> scale:default (graph 0)
  Stream #1:0 (pcm_s16le) -> anull:default (graph 1)
  tpad:default (graph 0) -> Stream #0:0 (libx264)
  anull:default (graph 1) -> Stream #0:1 (pcm_s16le)
Press [q] to stop, [?] for help
[libx264 @ 00000000005f42c0] using SAR=1/1
[libx264 @ 00000000005f42c0] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
[libx264 @ 00000000005f42c0] profile High, level 5.0, 4:2:0, 8-bit
[libx264 @ 00000000005f42c0] 264 - core 164 r3106 eaa68fa - H.264/MPEG-4 AVC codec - Copyleft 2003-2023 - http://www.videolan
.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_
range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=12 lookahead_threads=2
sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 di
rect=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=1 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf
=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'bench2023.mp4':
  Metadata:
    encoder         : Lavf60.4.101
  Stream #0:0: Video: h264 (avc1 / 0x31637661), yuv420p(tv, progressive), 1920x1440 [SAR 1:1 DAR 4:3], q=2-31, 0.30 fps, 1228
8 tbn
    Metadata:
      encoder         : Lavc60.9.100 libx264
    Side data:
      cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A
  Stream #0:1: Audio: pcm_s16le (ipcm / 0x6D637069), 44100 Hz, stereo, s16, 1411 kb/s
    Metadata:
      encoder         : Lavc60.9.100 pcm_s16le
[out#0/mp4 @ 0000000000529700] video:454kB audio:3736kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.
034613%
frame=    6 fps=0.0 q=-1.0 Lsize=    4191kB time=00:00:21.66 bitrate=1584.8kbits/s speed=65.5x
bench: utime=0.515s stime=0.109s rtime=0.334s
bench: maxrss=433352kB
[libx264 @ 00000000005f42c0] frame I:1     Avg QP: 9.83  size:462555
[libx264 @ 00000000005f42c0] frame P:2     Avg QP:14.57  size:   424
[libx264 @ 00000000005f42c0] frame B:3     Avg QP:12.68  size:   264
[libx264 @ 00000000005f42c0] consecutive B-frames: 33.3%  0.0%  0.0% 66.7%
[libx264 @ 00000000005f42c0] mb I  I16..4:  4.6% 82.7% 12.7%
[libx264 @ 00000000005f42c0] mb P  I16..4:  0.0%  0.0%  0.0%  P16..4:  1.3%  0.1%  0.0%  0.0%  0.0%    skip:98.5%
[libx264 @ 00000000005f42c0] mb B  I16..4:  0.0%  0.0%  0.0%  B16..8:  0.9%  0.0%  0.1%  direct: 0.2%  skip:98.8%  L0:17.7% L
1:82.3% BI: 0.0%
[libx264 @ 00000000005f42c0] 8x8 transform intra:82.7% inter:100.0%
[libx264 @ 00000000005f42c0] coded y,uvDC,uvAC intra: 99.8% 98.9% 96.6% inter: 0.0% 0.3% 0.0%
[libx264 @ 00000000005f42c0] i16 v,h,dc,p:  7% 18% 18% 57%
[libx264 @ 00000000005f42c0] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 20% 16% 13%  6%  7%  8%  7% 12%  9%
[libx264 @ 00000000005f42c0] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 24% 18% 10%  7% 11%  9%  7%  8%  6%
[libx264 @ 00000000005f42c0] i8c dc,h,v,p: 38% 18% 21% 23%
[libx264 @ 00000000005f42c0] Weighted P-Frames: Y:0.0% UV:0.0%
[libx264 @ 00000000005f42c0] ref P L0: 97.5%  0.0%  2.5%
[libx264 @ 00000000005f42c0] ref B L1: 99.7%  0.3%
[libx264 @ 00000000005f42c0] kb/s:185.68

ffmpeg 2024: 18.5 s and 100% CPU
ffmpeg 2023: 0.5 s and 5% CPU

Attachments (3)

logs.zip (17.1 KB ) - added by Alex 2 months ago.
2023.log (82.1 KB ) - added by Alex 2 months ago.
2024.log (112.3 KB ) - added by Alex 2 months ago.

Download all attachments as: .zip

Change History (9)

by Alex, 2 months ago

Attachment: logs.zip added

by Alex, 2 months ago

Attachment: 2023.log added

by Alex, 2 months ago

Attachment: 2024.log added

comment:1 by Alex, 2 months ago

You might wonder, why the second complex audio filter is there. Without it ffmpeg is unable to detect the -shortest media and simply hangs:

ffmpeg -y -i 1.wav -r 2 -i 3.png -c:a libopus -b:a 64K -c:v libx264 -pix_fmt yuv420p -preset faster -x264-params "rc-lookahead=1:min-keyint=29:ref=0:subme=0:bframes=1:qp=20" -filter_complex "[1:v]pad=ceil(iw/2)*2:ceil(ih/2)*2,tpad=stop_mode=clone:stop_duration=3600100[v]; [0:a]anull[a]" -map "[v]" -map "[a]" -shortest -benchmark -report hang.mp4

Adding this second filter is a workaround suggested by mrskman in #3789 four years ago. Again, version 2023 renders the code above without any issues.

Last edited 2 months ago by Alex (previous) (diff)

comment:2 by Alex, 2 months ago

Versions tested:

N-100906-gad2cc0e2f4-2021-02-31 is okay
BtbN N-111869-g7aa71ab5c0-2023-08-31 is okay
BtbN N-112565-g55f28eb627-2023-10-31 is okay
BtbN N-112876-ga30adf9f96-2023-11-30 is okay
2023-12-11-git-1439784ff0-essentials_build-www.gyan.dev is okay
6.1.1-full_build-www.gyan.dev (marked as 2023-12-31) is also okay

that last okay version taken from here

2023-12-14-git-5256b2fbe6-essentials_build-www.gyan.dev buggy
2023-12-23-git-f5f414d9c4-essentials_build-www.gyan.dev buggy
BtbN N-113150-ge063c1d079-2023-12-31 buggy
BtbN N-114545-g199c479b9a-2024-03-31 buggy
7.0.2-essentials_build-www.gyan.dev buggy
2024-10-27-git-bb57b78013-essentials_build-www-gyan-dev buggy

So the blame commit happened in last days of December 2023.

Well, seems New Year’s Eve commits are even more disastrous than Friday afternoon ones..

Version 3, edited 2 months ago by Alex (previous) (next) (diff)

comment:3 by MasterQuestionable, 2 months ago

Cc: MasterQuestionable added
Component: undeterminedavfilter
Keywords: filtering removed
Summary: Processing is x40 times slower in latest builds (tpad filter)"tpad" speed regression (40x?)

͏    Would you further guess which commit is the cause..?
͏    https://github.com/FFmpeg/FFmpeg/commits?since=2023-12-12&until=2023-12-12

Last edited 2 months ago by MasterQuestionable (previous) (diff)

comment:4 by Gyan, 2 months ago

The offending commit is likely d119ae2fd82a494d9430ff4d4fc262961a68c598 or from the few just before it.

You can find my proximate builds for triangulating on this page: https://github.com/GyanD/codexffmpeg/releases?page=11

comment:5 by Alex, 2 months ago

Thanks, Gyan. It is indeed between 2023-12-11-git-1439784ff0 (render speed = x200) and 2023-12-14-git-5256b2fbe6 (render speed = x6)

comment:6 by MasterQuestionable, 2 months ago

Summary: "tpad" speed regression (40x?)"tpad" speed regression (~ 40x) caused by threading change?
Note: See TracTickets for help on using tickets.