Changes between Version 31 and Version 32 of CompilationGuide

Dec 30, 2015, 3:00:02 AM (5 years ago)

add performance tips (initial entry)


  • CompilationGuide

    v31 v32  
    2828* [[CompilationGuide/Haiku|How to compile FFmpeg for Haiku]]
     30== Performance Tips ==
     32There are numerous avenues to extract maximum performance out of FFmpeg when it is built from source. The following list describes some of them:
     34* If using {{{GCC}}}, consider adding {{{-ftree-vectorize}}} to {{{--extra-cflags}}}. Most recent versions of {{{GCC}}} do not miscompile FFmpeg with the auto-vectorizer enabled, and typically yield an overall performance increase of 1-2% as measured over a FATE run ({{{make fate-rsync; make fate}}}), with gains varying across the codebase and compiler version. However, the project does not currently maintain a list of compiler versions under which the vectorizer works correctly, since even relatively recent releases such as {{{4.8.0}}} had problems. Therefore, {{{configure}}} disables the auto-vectorizer on {{{GCC}}} by default, and the user must enable it explicitly if desired, for example via the method outlined above. It is highly advisable to complete at least one FATE run to ensure that things work correctly when the auto-vectorizer is turned on.
     36* If using {{{GCC}}} or {{{Clang}}}, consider adding {{{-march=native}}} to {{{--extra-cflags}}} to make slightly better use of your hardware. Alternatively, for a more general solution, examine the {{{--arch}}} and {{{--cpu}}} options. Gains are variable, and usually quite small. However, this is usually even safer than enabling the auto-vectorizer, and is thus listed here.
     38* Depending on your use case, {{{--enable-hardcoded-tables}}} may be a useful option. Hardcoding the tables means they do not need to be computed at runtime, which saves table generation time, a cost otherwise paid once at codec initialization. In exchange, the size of {{{libavcodec}}}, the main library affected by this change, increases by approximately 15%. The savings are often negligible (~100k cycles is a typical number), especially when amortized over the entire encoding/decoding operation. This option is not enabled by default. Improvements are being made to the runtime initialization, so over time this option will have an impact on fewer and fewer codecs.
     40* Other options may be found by examining {{{./configure --help}}}.
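As a rough illustration, the tips above can be combined into a single {{{configure}}} invocation. The following is only a sketch: the variable names {{{EXTRA_CFLAGS}}} and {{{CONFIGURE_CMD}}} are made up for this example, and it merely assembles and prints the command line rather than running it, so that the flags can be reviewed first. Whether each flag is appropriate depends on your compiler and use case, as described in the list.

```shell
# Sketch only: EXTRA_CFLAGS and CONFIGURE_CMD are illustrative names,
# not variables that FFmpeg's build system reads.

# Compiler flags from the first two tips (GCC auto-vectorizer, native tuning).
EXTRA_CFLAGS="-ftree-vectorize -march=native"

# Assemble the configure command line, adding hardcoded tables from the third tip.
CONFIGURE_CMD="./configure --extra-cflags='$EXTRA_CFLAGS' --enable-hardcoded-tables"

# Print the command for review instead of executing it.
echo "$CONFIGURE_CMD"
```

To use it, run the printed command from the top of the FFmpeg source tree, build, and then validate the result with a FATE run ({{{make fate-rsync; make fate}}}) before relying on the binaries.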
    3042= Guides for developers =