ICL and HAVE_INLINE_ASM
Reported by: Andrey
Owned by:
Blocking:
Reproduced by developer: no
Analyzed by developer: no
Description
I have an old version of ffmpeg (0.8) which was compiled manually with the ICL compiler. "Manually" means that only the required C files (for the 2-3 codecs I need) were included in my icproj. Now I see that ffmpeg has native support for icl (through --toolchain=icl), so I compiled the latest ffmpeg (2.0.1) following the standard instructions. I then noticed that encoding to MPEG4/AV_CODEC_ID_MSMPEG4V2 with the new ffmpeg takes too much time. I ran some tests: the old ffmpeg encodes the test video in 0:55 and the new ffmpeg encodes the same video in 1:19, which is about 43% slower. This is unacceptable in real applications.
I did some investigation and found that some optimizations are disabled for icl by default. The main cause of the slowdown is that some highly optimized asm code is not built with icl, because HAVE_INLINE_ASM is 0 by default. I defined HAVE_INLINE_ASM to 1, made the code compile and link with some minor changes, and now the test video is encoded in 0:55 as expected.
I believe that an icl build of ffmpeg should be fully optimized by default. I have attached an example of how to change the code so that it compiles with ICL and HAVE_INLINE_ASM. Please note that I have disabled some asm code, as I don't need it for my set of codecs.
Typically, there are 2 kinds of changes:
1) icl doesn't know the cltd instruction (the AT&T mnemonic for cdq). It has to be replaced with cdq.
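For readers unfamiliar with this kind of switch, here is a minimal sketch of how a 0/1 configuration macro such as HAVE_INLINE_ASM can select between an inline-asm path and a plain C fallback. The function names `example_bswap32` and `example_bswap32_c` are hypothetical and are not taken from ffmpeg; the sketch also defaults the macro to 0 when it is not defined, which is an assumption made only so the snippet compiles on its own.

```c
#include <stdint.h>

/* Assumption for this sketch: default to the C path when the build
 * system has not defined the macro. */
#ifndef HAVE_INLINE_ASM
#define HAVE_INLINE_ASM 0
#endif

/* Portable C fallback (hypothetical name): reverse the four bytes. */
static uint32_t example_bswap32_c(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Hypothetical entry point: uses the x86 bswap instruction only when
 * HAVE_INLINE_ASM is 1, otherwise falls back to the C version. */
static uint32_t example_bswap32(uint32_t x)
{
#if HAVE_INLINE_ASM
    __asm__ ("bswap %0" : "+r"(x));
    return x;
#else
    return example_bswap32_c(x);
#endif
}
```

With the macro at 0, every function written this way silently takes the slower C path, which would be consistent with the slowdown described above.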
#ifdef __ICL
#define MASK_ABS(mask, level)                   \
    __asm__ ("cdq                    \n\t"      \
             "xorl %1, %0            \n\t"      \
             "subl %1, %0            \n\t"      \
             : "+a"(level), "=&d"(mask))
#else
#define MASK_ABS(mask, level)                   \
    __asm__ ("cltd                   \n\t"      \
             "xorl %1, %0            \n\t"      \
             "subl %1, %0            \n\t"      \
             : "+a"(level), "=&d"(mask))
#endif
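As far as I know, the GNU assembler accepts cdq as well as cltd, so the #ifdef may not be needed at all. The following is a sketch of a single definition, not a tested patch: the helper `abs_with_mask`, the architecture check, and the plain C fallback are additions made for illustration so that the snippet compiles on any target.

```c
/* x86 with GNU-style inline asm (gcc, or icl as described above). */
#if (defined(__GNUC__) || defined(__ICL)) && \
    (defined(__i386__) || defined(__x86_64__) || \
     defined(_M_IX86) || defined(_M_X64))
/* cdq sign-extends eax into edx, so mask becomes 0 or -1;
 * (level ^ mask) - mask is then the absolute value of level. */
#define MASK_ABS(mask, level)                   \
    __asm__ ("cdq                    \n\t"      \
             "xorl %1, %0            \n\t"      \
             "subl %1, %0            \n\t"      \
             : "+a"(level), "=&d"(mask))
#else
/* Illustrative C fallback; assumes >> on a negative int is an
 * arithmetic shift, which is implementation-defined in C. */
#define MASK_ABS(mask, level)                   \
    do {                                        \
        mask  = level >> 31;                    \
        level = (level ^ mask) - mask;          \
    } while (0)
#endif

/* Hypothetical helper: returns |level| and stores the sign mask. */
static int abs_with_mask(int level, int *mask_out)
{
    int mask;
    MASK_ABS(mask, level);
    *mask_out = mask;
    return level;
}
```

For example, `abs_with_mask(-5, &m)` returns 5 and leaves m at -1, while a non-negative input is returned unchanged with m set to 0.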
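2) With icl, MANGLE(var) cannot be used inside the asm string. The variable has to be passed as an "m" operand and referenced by its operand number (%N below).
#ifdef __ICL
    __asm__ volatile(
        ...
        "movq %N, %%mm5 \n\t"
        ...
        : "m"(var)
    );
#else
    __asm__ volatile(
        ...
        "movq "MANGLE(var)", %%mm5 \n\t"
        ...
    );
#endif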
I believe that the icl syntax will work with gcc as well. So maybe the best solution would be to rewrite the code so that it does not use MANGLE in movq.
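To illustrate the operand-based form, here is a minimal sketch that reads a static variable through an "m" input operand instead of pasting its symbol name into the asm string. To keep it small it uses a plain movl into a general-purpose register rather than the MMX movq from the real code. The names `example_const` and `load_via_m_operand` are hypothetical, and the C branch exists only so the snippet builds on non-x86 targets.

```c
/* Hypothetical constant standing in for a table the asm reads. */
static const int example_const = 0x01020304;

/* Hypothetical helper: loads example_const via an "m" operand. */
static int load_via_m_operand(void)
{
    int out;
#if (defined(__GNUC__) || defined(__ICL)) && \
    (defined(__i386__) || defined(__x86_64__) || \
     defined(_M_IX86) || defined(_M_X64))
    /* %1 is replaced by whatever memory reference the compiler picks
     * for example_const, so no symbol name appears in the string. */
    __asm__ volatile(
        "movl %1, %0 \n\t"
        : "=r"(out)
        : "m"(example_const)
    );
#else
    out = example_const; /* fallback for targets without x86 asm */
#endif
    return out;
}
```

Because the compiler generates the memory reference itself, the same source should build with either compiler, at the cost of one operand slot per variable.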