Opened 5 years ago
Closed 2 years ago
#8694 closed enhancement (wontfix)
FFV1 decoding needs a huge number of threads for optimal performance
Reported by: | zorr | Owned by: | |
---|---|---|---|
Priority: | normal | Component: | avcodec |
Version: | git-master | Keywords: | ffv1 |
Cc: | Blocked By: | ||
Blocking: | Reproduced by developer: | no | |
Analyzed by developer: | no |
Description
I noticed that when decoding FFV1 (especially version 1) you can get much higher performance by raising the number of decoding threads far above the default (and recommended) number. On my system (Ryzen 3900X, 24 logical cores) the default is 16 threads. With an 8-bit ffv1 v1 SD (720x576) source, 16 threads gives 163 fps but 384 threads gives 1181 fps (a 7.2x speed-up). Other lossless codecs (huffyuv, magicyuv, utvideo) don't behave this way: their best performance is achieved with 48 threads, but 24 threads is almost as good, which makes sense because that's the number of logical cores on the test machine.
I ran the tests using the null encoder and without audio. The test source is over 30 minutes long (44058 frames). I measured the wall-clock time, took the best of three runs, and calculated the fps from that. The test command was the following, varying only the -threads parameter:
ffmpeg -threads 384 -i src.avi -an -f null -
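The measurement procedure described above could be scripted roughly as follows. This is only a sketch, not the script actually used for this report: the function names, the `src.avi` default and the use of Python are assumptions, while the ffmpeg arguments and the frame count are taken from the text.

```python
import subprocess
import time

FRAME_COUNT = 44058  # frame count of the test source, as stated in the report


def build_command(threads, source="src.avi"):
    """Return the ffmpeg command from the report for a given -threads value."""
    return ["ffmpeg", "-threads", str(threads), "-i", source,
            "-an", "-f", "null", "-"]


def fps_from_ms(frames, elapsed_ms):
    """Convert a wall-clock time in milliseconds to frames per second."""
    return frames / (elapsed_ms / 1000.0)


def time_run_ms(threads, source="src.avi"):
    """Run one decode and return its wall-clock time in milliseconds."""
    start = time.perf_counter()
    subprocess.run(build_command(threads, source), check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return (time.perf_counter() - start) * 1000.0


def best_of(threads, runs=3, source="src.avi"):
    """Return the fastest of several runs, in milliseconds."""
    return min(time_run_ms(threads, source) for _ in range(runs))


def main():
    """Print threads, best time (ms) and fps for each thread count."""
    for n in (16, 24, 48, 96, 128, 192, 256, 384, 512, 768):
        ms = best_of(n)
        print(f"{n}\t{ms:.0f}\t{fps_from_ms(FRAME_COUNT, ms):.0f}")
```

Calling `main()` on a machine with ffmpeg on the PATH and the source file in the working directory would print one row per thread count in the same threads / time (ms) / fps layout used for the results.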
More detailed results below:
ffv1 v1, null encoder

threads | time (ms) | fps |
---|---|---|
16 | 269650 | 163 |
24 | 216301 | 204 |
48 | 130619 | 337 |
96 | 72483 | 608 |
128 | 57245 | 770 |
192 | 46769 | 942 |
256 | 38337 | 1149 |
384 | 37304 | 1181 |
512 | 37352 | 1180 |
768 | 37458 | 1176 |
I also ran a more real-world scenario of converting the source to huffyuv. In this case best performance was achieved with 512 threads but 256 is almost as good. Detailed results below.
ffv1 v1 -> huffyuv

threads | time (ms) | fps |
---|---|---|
16 | 279524 | 158 |
24 | 224079 | 197 |
48 | 133244 | 331 |
96 | 75631 | 583 |
128 | 60817 | 724 |
192 | 49113 | 897 |
256 | 41690 | 1057 |
384 | 41644 | 1058 |
512 | 41628 | 1058 |
768 | 41722 | 1056 |
FFV1 v3 doesn't need quite as many threads, the optimal was 128 threads (and even 96 is almost as good).
ffv1 v3, null encoder

threads | time (ms) | fps |
---|---|---|
16 | 91734 | 480 |
24 | 72105 | 611 |
48 | 50835 | 867 |
64 | 40670 | 1083 |
80 | 39819 | 1106 |
96 | 37766 | 1167 |
128 | 37621 | 1171 |
192 | 37661 | 1170 |
And here are the results for utvideo, magicyuv and huffyuv.
utvideo, null encoder

threads | time (ms) | fps |
---|---|---|
6 | 19033 | 2315 |
8 | 14329 | 3075 |
12 | 9785 | 4503 |
16 | 7703 | 5720 |
24 | 5463 | 8065 |
48 | 5436 | 8105 |
96 | 5497 | 8015 |

magicyuv, null encoder

threads | time (ms) | fps |
---|---|---|
6 | 30525 | 1443 |
8 | 22947 | 1920 |
12 | 15902 | 2771 |
16 | 12687 | 3473 |
24 | 8956 | 4919 |
48 | 8923 | 4938 |
96 | 8944 | 4926 |

huffyuv, null encoder

threads | time (ms) | fps |
---|---|---|
6 | 22630 | 1947 |
8 | 17048 | 2584 |
12 | 12210 | 3608 |
16 | 10034 | 4391 |
24 | 7214 | 6107 |
48 | 7189 | 6129 |
96 | 7263 | 6066 |
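To make "almost as good" concrete, the null-encoder figures can be reduced to one number per codec: the smallest thread count whose fps is within some tolerance of the best fps measured. The sketch below uses the fps values reported in this ticket. The 1% tolerance and the name `saturation_point` are assumptions chosen for illustration, not part of the original measurements.

```python
# (threads, fps) pairs copied from the null-encoder results in this report.
RESULTS = {
    "ffv1 v1": [(16, 163), (24, 204), (48, 337), (96, 608), (128, 770),
                (192, 942), (256, 1149), (384, 1181), (512, 1180), (768, 1176)],
    "ffv1 v3": [(16, 480), (24, 611), (48, 867), (64, 1083), (80, 1106),
                (96, 1167), (128, 1171), (192, 1170)],
    "utvideo": [(6, 2315), (8, 3075), (12, 4503), (16, 5720), (24, 8065),
                (48, 8105), (96, 8015)],
    "magicyuv": [(6, 1443), (8, 1920), (12, 2771), (16, 3473), (24, 4919),
                 (48, 4938), (96, 4926)],
    "huffyuv": [(6, 1947), (8, 2584), (12, 3608), (16, 4391), (24, 6107),
                (48, 6129), (96, 6066)],
}


def saturation_point(rows, tolerance=0.01):
    """Smallest thread count whose fps is within `tolerance` of the best fps.

    The 1% default is an arbitrary, assumed threshold.
    """
    best = max(fps for _, fps in rows)
    for threads, fps in sorted(rows):
        if fps >= best * (1.0 - tolerance):
            return threads


for codec, rows in RESULTS.items():
    print(codec, saturation_point(rows))
```

With this threshold, utvideo, magicyuv and huffyuv all saturate at 24 threads (the number of logical cores), ffv1 v3 at 96 and ffv1 v1 at 384.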
These benchmarks were run with the git build 20200525-6268034 (May 25, 2020 10:44). I have also tested versions 4.2.2 and 3.4.2, and the performance is very similar in all of them. User furq on the #ffmpeg channel also confirmed that on his Ryzen 2600 (6 cores, 12 logical cores) the best performance was with 128 threads.
I made a couple of charts to better visualize the scaling behaviour of the codecs, see here: https://i.postimg.cc/VNTxgWdw/ffv1-performance.png.
Whenever more than 16 threads are requested, ffmpeg displays the warning "Using a thread count greater than 16 is not recommended." When I asked about this on the #ffmpeg IRC channel, users furq and Compn found that the warning is probably related to H.264 slice threading, which seems to be buggy with more than 16 threads: https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/pthread_internal.h#L24-L26. The code that emits the warning is here: https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/pthread.c#L64-L67. I have also confirmed that the decoded video contains no errors even when using 512 threads to decode ffv1; the hashes are equal.
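One way such a hash comparison could be done is with ffmpeg's `md5` muxer, which prints a single `MD5=` line for the decoded output. The sketch below is an assumption about a possible method, not necessarily the one used for this report. The function names and the `src.avi` default are hypothetical.

```python
import subprocess


def build_hash_command(threads, source="src.avi"):
    """ffmpeg command that decodes the video and writes its MD5 to stdout."""
    return ["ffmpeg", "-threads", str(threads), "-i", source,
            "-an", "-f", "md5", "-"]


def parse_md5(output):
    """Extract the digest from the 'MD5=<hex>' line printed by the md5 muxer."""
    for line in output.splitlines():
        if line.startswith("MD5="):
            return line[len("MD5="):].strip()
    raise ValueError("no MD5 line found in ffmpeg output")


def decoded_hash(threads, source="src.avi"):
    """Decode the source with the given thread count and return its MD5."""
    result = subprocess.run(build_hash_command(threads, source),
                            check=True, capture_output=True, text=True)
    return parse_md5(result.stdout)


def hashes_match(thread_counts, source="src.avi"):
    """True if every thread count yields the same decoded-video hash."""
    return len({decoded_hash(n, source) for n in thread_counts}) == 1
```

For example, `hashes_match([16, 512])` would decode the source twice and report whether both runs produced identical output.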
So I think one way to improve things would be to customize the warning message based on the codec in use, and perhaps even to adjust the default number of threads based on the codec and the number of available cores. Users are probably not aware that a 7-fold speed-up is possible just by adjusting the number of threads.
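As a rough illustration of what a codec-dependent default could look like, the sketch below scales the number of logical cores by a per-codec factor. Everything in it is hypothetical. The factors are simply the saturating thread counts from this report divided by the 24 logical cores of the one test machine, and nothing here reflects how ffmpeg actually chooses its thread count.

```python
# Hypothetical multipliers derived from a single machine (Ryzen 3900X,
# 24 logical cores); they are illustrative, not tuned or validated values.
THREADS_PER_LOGICAL_CORE = {
    "ffv1-v1": 16,  # 384 threads / 24 logical cores
    "ffv1-v3": 4,   # 96 threads / 24 logical cores
}


def suggested_threads(codec, logical_cores):
    """Suggest a decoding thread count for a codec on a given machine.

    Codecs without an entry fall back to one thread per logical core,
    which matches what huffyuv, magicyuv and utvideo needed in the tests.
    """
    return logical_cores * THREADS_PER_LOGICAL_CORE.get(codec, 1)


print(suggested_threads("ffv1-v1", 24))
print(suggested_threads("huffyuv", 24))
```

Whether a fixed multiplier generalizes to other CPUs is an open question; a single data point from another machine (128 threads on 12 logical cores) is not enough to settle it.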
And I think it's worth taking a look at why ffv1 needs so many threads in the first place. Perhaps it is by design but it could also be a symptom of a hidden design flaw or a simple coding error.
Change History (3)
comment:1 by , 4 years ago
Keywords: | decoding performance removed |
---|
comment:2 by , 3 years ago
comment:3 by , 2 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
Is this still reproducible?